One document matched: draft-sabatini-tcp-sack-00.txt
Network Working Group A. Sabatini
Internet-Draft Broker Communications Inc.
Intended Status: Standards Track .
Expires: September 1, 2012 February 28, 2012
Highly Efficient Selective Acknowledgement (SACK) for TCP
draft-sabatini-tcp-sack-00
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on September 1, 2012.
Comments are solicited and should be addressed to the author at
draft-sack@tsabatini.com.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Abstract
Sabatini, Anthony Expires September 1, 2012 [Page 1]
Internet Draft High Efficiency SACK for TCP February 28, 2012
This memo expands on the Selective Acknowledgement Protocol described
in RFC2018 to improve its performance and efficiency while reducing
the delay involved in recovering lost segments. This leads to very
reliable and efficient communications regardless of transit delay or
high levels of lost segments due to noise or congestion. It
introduces a fundamentally new way of looking at Selective
Acknowledgement and uses this concept to improve the performance of
the RFC2018 protocol. This memo proposes an implementation of the
improved SACK and discusses its performance and related issues.
Acknowledgements
Much of the text in this document is taken directly from RFC2018 "TCP
Selective Acknowledgement Options" by M. Mathis, J. Mahdavi, S.floyd
and A. Romanow and RFC1072 "TCP Extensions for Long-Delay Paths" by
B. Braden and V. Jacobson.
1. Introduction
This revision to the SACK protocol has its roots in a similar, HDLC
based protocol I designed and implemented for secure financial
transactions. That protocol, being designed for use on a worldwide
basis, was born out of the need for a protocol that would handle any
communications environment no matter how noisy or how much delay
(including multiple satellite hops) was in the path. In later years
its properties were found valuable in congestion situations where
packets were dropped.
Multiple packet losses from a window of data can have a catastrophic
effect on TCP throughput. TCP [Postel81] uses a cumulative
acknowledgment scheme in which received segments that are not at the
left edge of the receive window are not acknowledged. This forces
the sender to either wait a roundtrip time to find out about each
lost packet, or to unnecessarily retransmit segments which have been
correctly received [Fall95]. With the cumulative acknowledgment
scheme, multiple dropped segments generally cause TCP to lose its
ACK-based clock, reducing overall throughput.
Selective Acknowledgment (SACK) is a strategy which corrects this
behavior in the face of multiple dropped segments. With selective
acknowledgments, the data receiver can inform the sender about all
segments that have arrived successfully, so the sender need
retransmit only the segments that have actually been lost.
I propose modifications to the SACK options as proposed in RFC2018.
Specifically, I add a transmit state to each transmitted message and
return that transmit state when each acknowledgement is sent. By
using the returned transmit state I can tell what messages have been
Sabatini, Anthony Expires September 1, 2012 [Page 2]
Internet Draft High Efficiency SACK for TCP February 28, 2012
transmitted after the information in the acknowledgement and thus
rebuild the state of the receiver at the transmitter. I also propose
changes to the way SACK blocks are reported to insure that the
oldest, and thus the most critical, are transmitted expeditiously.
Additionally since the space to store acknowledgements in IPV4 is
limited and may not be able to accommodate all of the acknowledgement
pairs, I propose a method of sending the complete receiver state by
sending multiple acknowledgements.
The RFC2018 selective acknowledgment extension uses two TCP options.
The first is an enabling option, "SACK-permitted", which may be sent
in a SYN segment to indicate that the SACK option can be used once
the connection is established. This option is extended to both
indicate that this newer version of the protocol is being used and to
establish an initial value for transmit state. The other is the SACK
option itself, which may be sent over an established connection once
permission has been given by SACK-permitted. This has also been
extended to add both the transmit state implicit in the message and
the transmit state that was received at the far end (now called
"Received State").
The SACK option is to be included in a segment sent from a TCP that
is receiving data to the TCP that is sending that data; we will refer
to these TCP's as the data receiver and the data sender,
respectively. We will consider a particular simplex data flow; any
data flowing in the reverse direction over the same connection can be
treated independently.
2. Underlying concepts
In order for a sender to know how to optimally transmit messages to a
receiver the sender must recreate the state of the receiver as of the
last acknowledgement received (which segments have been recieved and
acknowledged, which segments have not) and then "age" or modify that
state by updating it based upon the messages transmitted since the
state implicit in the acknowledgement was current. In order to do
this the sender must maintain a transmission order buffer which lists
the segment ranges of each message as it is sent. We called the
index into the transmission order buffer "Send state" and transmitted
this state variable with each message. The receiver, after correctly
receiving the message, saves this value and returns it (now called
"receive state") and the list of selectively acknowledged segments
with each acknowledgement. When the sender receives this information
it is then capable of constructing a list of missing segments by
taking its unacknowledged segment range list and modifying it on the
basis of the received selective acknowledgements and then removing
from that list all segments that have been transmitted since the
message which caused the acknowledgement which is all segments sent
Sabatini, Anthony Expires September 1, 2012 [Page 3]
Internet Draft High Efficiency SACK for TCP February 28, 2012
with indexes between the current "send state" and the "receive state"
in the acknowledgement message.
To accommodate the issue of receiving segments out of order at the
receiver, or those packets delayed by alternate routing, the reciever
does not instantly update the received state value (which could
trigger a false retransmission) but rather puts it on a timer queue
for a length of time appropriate to the delay randomness in the
arrival path (typically 40 to 200 ms based on media, speed and
distance), which when the timer entry expires, causes the update of
the recieved state value. If the recieved state value when returned
to the sender and processed shows blocks that remain unacknowledged
after this time out they are assumed to be lost and they are queued
for retransmission.
Thus by transmitting the complete acknowledgement information (SACK
blocks) from the receiver along with an indicator to the sender as to
its state current at the time of the acknowledgement the sender can
accurately recreate the current status of the receiver assuming all
"in flight" messages were received and thus only send the
unacknowledged messages starting with the oldest along with any new
messages whose retransmission is requested.
3. Sack-Permitted Option
This six-byte option may be sent in a SYN by a TCP that has been
extended to receive (and presumably process) the Improved SACK
option.
The presence of the additional four bytes differentiates the Improved
SACK from the earlier protocol. Although Receive Status serves no
function and is MUST be coded as 0 currently, it is left to further
study whether it can be utilized in link reconnection after failure.
This option MUST NOT be transmitted on non-SYN segments in the
current protocol, it is left to future study as to its use for
transmitting long sequences of acknowledgements in one frame.
TCP Sack-Permitted Option:
Kind: 4
+--------+--------+
| Kind=4 |Length=6|
+--------+--------+--------+--------+
| Send State | Receive State |
+--------+--------+--------+--------+
Sabatini, Anthony Expires September 1, 2012 [Page 4]
Internet Draft High Efficiency SACK for TCP February 28, 2012
4. Sack Option Format
The SACK option is to be used to convey extended acknowledgment
information from the receiver to the sender over an established TCP
connection.
TCP SACK Option:
Kind: 5
Length: Variable
+--------+--------+
| Kind=5 | Length |
+--------+--------+--------+--------+
| Send State | Receive State |
+--------+--------+--------+--------+
| Left Edge of 1st Block |
+--------+--------+--------+--------+
| Right Edge of 1st Block |
+--------+--------+--------+--------+
| |
/ . . . /
| |
+--------+--------+--------+--------+
| Left Edge of nth Block |
+--------+--------+--------+--------+
| Right Edge of nth Block |
+--------+--------+--------+--------+
The SACK option is to be sent by a data receiver to inform the data
sender of non-contiguous blocks of data that have been received and
queued. The data receiver awaits the receipt of data (perhaps by
means of retransmissions) to fill the gaps in sequence space between
received blocks. When missing segments are received, the data
receiver acknowledges the data normally by advancing the left window
edge in the Acknowledgement Number Field of the TCP header. The SACK
option does not change the meaning of the Acknowledgement Number
field.
This option contains a list of some of the blocks of contiguous
sequence space occupied by data that has been received and queued
within the window.
Each contiguous block of data queued at the data receiver is defined
in the SACK option by two 32-bit unsigned integers in network byte
order:
Sabatini, Anthony Expires September 1, 2012 [Page 5]
Internet Draft High Efficiency SACK for TCP February 28, 2012
* Left Edge of Block
This is the first sequence number of this block.
* Right Edge of Block
This is the sequence number immediately following the last
sequence number of this block.
Each block represents received bytes of data that are contiguous and
isolated; that is, the bytes just below the block, (Left Edge of
Block - 1), and just above the block, (Right Edge of Block), have not
been received.
A SACK option that specifies n blocks will have a length of 8*n+6
bytes, so the 40 bytes available for TCP options can specify a
maximum of 4 blocks. It is suggested that the Improved SACK will
provide the timestamp information used for RTTM [Jacobson92].
5. Generating Sack Options: Data Receiver Behavior
If the data receiver has received a SACK-Permitted option on the SYN
for this connection, the data receiver MAY elect to generate SACK
options as described below. If the data receiver generates SACK
options under any circumstance, it MUST generate them under all
permitted circumstances. If the data receiver has not received a
SACK-Permitted option for a given connection, it MUST NOT send SACK
options on that connection.
If sent at all, SACK options MUST be included in all ACKs which do
not ACK the highest sequence number in the data receiver's queue. In
this situation the network has lost or mis-ordered data, such that
the receiver holds non-contiguous data in its queue. RFC 1122,
Section 4.2.2.21, discusses the reasons for the receiver to send ACKs
in response to additional segments received in this state. The
receiver MUST send an ACK for every valid segment that arrives
containing new data, and each of these "duplicate" ACKs SHOULD bear a
SACK option.
The purpose of the SACK blocks is to recreate the status of the
receiver at the transmitter. To that end the most important
information is (1) new or changed blocks, (2) the second transmission
of new or changed blocks, (3) a complete enumeration of all received
blocks starting from the oldest first. Since the SACK option field
may not have enough space for all blocks outstanding the receiver
will continue to issue acknowledgements until all blocks are
transmitted. In order to implement the SACK option a flag must be
kept with each block indicating whether it has been sent a second
Sabatini, Anthony Expires September 1, 2012 [Page 6]
Internet Draft High Efficiency SACK for TCP February 28, 2012
time.
If the data receiver chooses to send a SACK option, the following
rules apply:
* The data receiver first fills in "Send State" in the option from
the current value of its "Send State". The data receiver then
fills in "Receive State" from the "Send State" of the SACK option
of the last TCP packet received that has cleared the delay queue.
* The first SACK block (i.e., the one immediately following the
kind and length fields in the option) MUST specify the contiguous
block of data containing the segment which triggered this ACK,
unless that segment advanced the Acknowledgment Number field in
the header. This assures that the ACK with the SACK option
reflects the most recent change in the data receiver's buffer
queue.
* The data receiver SHOULD include as many distinct SACK blocks as
possible in the SACK option. Note that the maximum available
option space may not be sufficient to report all blocks present in
the receiver's queue.
* The second SACK block SHOULD be filled out by repeating the most
recently reported SACK block (based on first SACK blocks in
previous SACK options) that are not subsets of a SACK block
already included in the SACK option being constructed and if it
has not previously been retransmitted. This assures that in
normal operation, any segment remaining part of a non-contiguous
block of data held by the data receiver is reported in at least
two successive SACK options, even for large-window TCP
implementations [RFC1323]).
* Subsequent SACK blocks SHOULD be filled with other outstanding
SACK blocks on the list, cycling from the earliest to the latest
and then starting again with the earliest. Whenever the the list
changes sufficient acknowledgements must be sent to insure that
all SACK blocks are transmitted.
* Upon any change to the recieved state value, if the reciever is
not currently transmiting data or ACK packets, the reciever will
initiate sending sufficient data or ACK packets to completely
transmit its complete SACK block list based on the rules above.
* A timer is maintained that is one quarter of the expected round
trip delay (typically 250 mS). This timer is set when the last
acknowledgement is transmitted by the receiver. At the expiration
Sabatini, Anthony Expires September 1, 2012 [Page 7]
Internet Draft High Efficiency SACK for TCP February 28, 2012
of this timer if there are still segments that have not been
retransmitted the receiver again sends sufficient acknowledgements
to completely transmit all current SACK blocks.
6. Interpreting the Sack Option and Retransmission Strategy: Data
Sender Behavior
When receiving an ACK containing a SACK option, the data sender MUST
record the selective acknowledgment for future reference. The data
sender is assumed to have a retransmission queue that contains the
segments that have been transmitted but not yet acknowledged, in
sequence-number order. If the data sender performs re-packetization
before retransmission, the block boundaries in a SACK option that it
receives may not fall on boundaries of segments in the retransmission
queue; however, this does not pose a serious difficulty for the
sender.
One possible implementation of the sender's behavior is as follows.
Upon receiving an acknowledgement the sender first eliminates all
saved SACK blocks from the list which have now been acknowledged by
the TCP header. The sender then adds the SACK blocks from the
current acknowledgement into SACK block list, eliminating any that
have been combined. The sender then constructs a list of
unacknowledged blocks by creating a block for each gap in sequence.
The sender then takes the received state from the message and uses
the list of blocks that have been transmitted since that state was
generated to delete members on the unacknowledged list. The sender
finally sets the updated unacknowledged list as the list of blocks to
be sent, oldest first.
After a retransmit timeout the data sender SHOULD delete all saved
SACK blocks, since under normal circumstances the acknowledgements
from the other end should have prevented the timeout. The data
sender MUST start the retransmit with the segment at the left edge of
the window after a retransmit timeout. A segment will not be
dequeued and its buffer freed until the left window edge is advanced
over it.
6.1 Congestion Control Issues
This document does not attempt to specify in detail the congestion
control algorithms for implementations of TCP with SACK. However,
the congestion control algorithms present in the de facto standard
TCP implementations MUST be preserved [Stevens94]. This algorithim
eliminates much unnecessary retransmission so is likely to lessen
overall congestion.
Sabatini, Anthony Expires September 1, 2012 [Page 8]
Internet Draft High Efficiency SACK for TCP February 28, 2012
The use of time-outs as a fall-back mechanism for detecting dropped
packets is unchanged by the SACK option. Because in normal operation
acknowledgements will prevent retransmit timeout, when a retransmit
timeout occurs the data sender MUST ignore prior SACK information in
determining which data to retransmit.
Future research into congestion control algorithms may take advantage
of the additional information provided by SACK. One such area for
future research concerns modifications to TCP for a wireless or
satellite environment where packet loss is not necessarily an
indication of congestion.
7. Efficiency and Worst Case Behavior
Although this high efficiency improved SACK option sends more and
larger SACK blocks and more acknowledgements than the previous
version, with an active bi-directional link additional
acknowledgements are often associated with data transmission and thus
not a penalty. If the SACK option needs to be used due to segment
loss then the improved efficiency afforded with this protocol more
than justifies the additional SACK blocks.
The deployment of other TCP options may reduce the number of
available SACK blocks to 2 or even to 1. This will reduce the
redundancy of SACK delivery in the presence of lost ACKs. Even so,
the exposure of TCP SACK in regard to the unnecessary retransmission
of packets is strictly less than the exposure of current
implementations of TCP. The worst-case conditions necessary for the
sender to needlessly retransmit data is discussed in more detail in a
separate document [Floyd96].
Older TCP implementations which do not have the SACK option will not
be unfairly disadvantaged when competing against SACK-capable TCPs.
This issue is discussed in more detail in [Floyd96].
8. Timestamping
One pleasant benefit of having a token which is returned by the far
end on a determineistic basis is the easy calculation of round trip
delay. We can save a time stamp along with the segment information
in our transmission order array. This allows us to calculate round
trip delay when we receive our "Receive State" value and use it to
access the timestamp. Since more than one received message might
have the same "Receive State" value we zero the timestamp after use
to indicate that the value should not be used again. Note that if an
acknowledgement is lost we will calculate a longer delay than is
accurate therefore we must smooth the returned values, typically
returning the smallest out of the last N where N is typically four.
Sabatini, Anthony Expires September 1, 2012 [Page 9]
Internet Draft High Efficiency SACK for TCP February 28, 2012
9. Data Receiver Reneging
Since the Sender is recreating the state of the Receiver, the data
Receiver MUST NOT discard data in its queue once that data has been
reported in a SACK option. The Receiver is responsible for
allocating enough buffers so that the missing segments within the
window may be properly received and processed.
10. Security Considerations
This document neither strengthens nor weakens TCP's current security
properties.
11. References
[Jacobson88}, Jacobson, V. and R. Braden, "TCP Extensions for Long-
Delay Paths", RFC 1072, October 1988.
[Jacobson92] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
for High Performance", RFC 1323, May 1992.
[Postel81] Postel, J., "Transmission Control Protocol - DARPA
Internet Program Protocol Specification", RFC 793, DARPA, September
1981.
Author's Address
Anthony Sabatini
Broker Communications Inc.
200 West 20th Street
Suite 1216
New York, NY 10011
Email: draft-sack@tsabatini.com
The author is currently a master's degree candidate at -
Hofstra University
Hempstead, N.Y.
His adviser is Dr. Xiang Fu
Sabatini, Anthony Expires September 1, 2012 [Page 10]
| PAFTECH AB 2003-2026 | 2026-04-23 11:28:28 |