Internet Engineering Task Force A. Zimmermann
Internet-Draft A. Hannemann
Intended status: Experimental RWTH Aachen University
Expires: August 1, 2009 January 28, 2009
Make TCP more Robust to Long Connectivity Disruptions
draft-zimmermann-tcp-lcd-00
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 1, 2009.
Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Abstract
TCP was designed with fixed, wired networks in mind. As a result, TCP
performs suboptimally in networks where connectivity disruptions are
Zimmermann & Hannemann Expires August 1, 2009 [Page 1]
Internet-Draft Make TCP more Robust to LCDs January 2009
frequent, e.g., in wireless (multi-hop) networks. One reason for the
performance degradation is TCP's over-conservative behavior in the face
of long connectivity disruptions.
This document describes how connectivity disruption indications
provided by standard ICMP messages may be exploited to improve TCP's
performance. An RTO revert strategy is proposed that enables earlier
detection of whether connectivity to a previously disconnected peer
node has been restored or not. The scheme is a sender-only
modification that fully respects the TCP congestion control
principles.
1. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
The term "acceptable acknowledgment (ACK)" in this document refers to
a TCP segment that acknowledges previously unacknowledged data (as
defined in [RFC0793]). The Transmission Control Protocol (TCP)
sender state variable "SND.UNA" and the current segment variable
"SEG.SEQ" are used as defined in [RFC0793]. SND.UNA holds the
segment sequence number of the oldest outstanding segment. SEG.SEQ
is the segment sequence number of a given segment.
2. Introduction
Connectivity disruptions can occur in many different situations. The
frequency of connectivity disruptions depends on the properties of
the end-to-end path between the communicating hosts. While
connectivity disruptions can occur in traditional wired networks too,
e.g., simply due to an unplugged network cable, they are
significantly more likely in wireless (multi-hop) networks, where
end-host mobility and wireless interference are crucial factors. If
the hosts use the Transmission Control Protocol (TCP) [RFC0793] for
their communication, the performance of the connection can be
significantly reduced compared to a permanently connected path
[SESB05].
According to Schuetz et al. [I-D.schuetz-tcpm-tcp-rlci],
connectivity disruptions can be classified into two groups: "short"
and "long" connectivity disruptions. A connectivity disruption is
short if connectivity returns before the retransmission timeout (RTO)
fires for the first time. In this case, TCP recovers lost data
segments through Fast Retransmit and lost ACKs through successfully
delivered later ACKs. Connectivity disruptions are declared as long
for a given TCP connection, if the RTO fires at least once before
connectivity returns. Whether or not path characteristics have
changed when connectivity returns after a disruption is a second
important aspect for TCP's retransmission scheme
[I-D.schuetz-tcpm-tcp-rlci].
This memo focuses on TCP's behavior in the face of long connectivity
disruptions during the time "before" connectivity is restored.
Moreover, this document does not describe any additional optimization
to detect whether the path characteristics remain unchanged.
Therefore, TCP's RTO-based loss recovery, and in particular slow start
[RFC2581], remains unchanged.
When a long connectivity disruption occurs on the path between two
communicating hosts, the TCP sender stops receiving ACKs. After
expiration of the RTO, the TCP sender repeatedly retransmits the
first unacknowledged segment (SND.UNA) until it is successfully
acknowledged. TCP implementations that follow the RTO management
recommended in RFC 2988 [RFC2988] double the RTO value after each
retransmission attempt. However, the RTO growth may be bounded by a
maximum RTO, which is at least 60 s but may be longer: Linux, for
example, uses 120 s. If connectivity is restored between two
retransmission attempts, TCP still has to wait until the RTO expires
before resuming transmission, since it has no means of knowing that
connectivity has been re-established. Therefore, depending on when
connectivity becomes available again, up to one maximum RTO of
possible transmission time can be wasted.
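The following Python sketch illustrates the back-off behavior
described above. It assumes an initial RTO of 1 s, the doubling rule
of RFC 2988, and a configurable maximum RTO (60 s, or 120 s as in the
Linux example); the function names are hypothetical. It lists the
retransmission times during a disruption and the longest gap between
two retransmissions, i.e., the worst-case idle time if connectivity
returns just after a retransmission.

```python
def retransmission_times(initial_rto, max_rto, horizon):
    """Times (s) at which SND.UNA is retransmitted during a disruption
    lasting `horizon` seconds, with the timer first started at t = 0."""
    times = []
    t, rto = 0.0, initial_rto
    while t + rto <= horizon:
        t += rto                      # RTO expires: retransmit SND.UNA
        times.append(t)
        rto = min(2 * rto, max_rto)   # RFC 2988 doubling, bounded by max RTO
    return times


def worst_case_idle(times):
    """Largest gap between consecutive (re)transmissions: if connectivity
    returns right after one of them, TCP stays idle for up to this long."""
    points = [0.0] + times
    return max(b - a for a, b in zip(points, points[1:]))


sched = retransmission_times(1.0, 60.0, 300.0)
print(sched)                   # 1, 3, 7, 15, 31, 63, 123, 183, 243
print(worst_case_idle(sched))  # 60.0
```

With a 60 s cap, a path that recovers just after the retransmission at
t = 63 s stays unused for almost a full minute; with a 120 s cap, the
worst case doubles.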
This retransmission behavior is not efficient, especially in
scenarios or networks, such as wireless (multi-hop) networks, where
connectivity disruptions are frequent. Ideally, TCP would attempt a
retransmission as soon as connectivity to its peer was
re-established. This document describes how the standard Internet
Control Message Protocol (ICMP) can be exploited to improve TCP's
performance. The presented scheme is a sender-only modification,
i.e., neither intermediate routers nor the TCP receiver have to be
modified. Furthermore, the proposed modification approaches the ideal
behavior if the network allows for it (i.e., if no congestion is
present). With an RTO revert strategy, higher-frequency
retransmissions can be realized, enabling earlier detection of
whether connectivity to a previously disconnected peer node has been
restored.
3. Connectivity Disruption Indication
As long as the queue of a router experiencing a link outage is deep
enough, i.e., it can buffer all incoming packets, a connectivity
disruption only causes delay variation, which a contemporary TCP
handles well with the help of Eifel [RFC3522] or Forward RTO-Recovery
(F-RTO) [RFC4138]. However, if the link outage lasts too long, the
router experiencing the link outage is forced to drop packets and
finally to discard the corresponding route. Means to detect such link
outages include reacting to failed Address Resolution Protocol (ARP)
queries, unsuccessful link sensing, and the like. However, detecting
the outage is solely the responsibility of the respective router.
Note: The focus of this memo is on introducing a method by which ICMP
messages may be exploited to improve TCP's performance; how
different physical- and link-layer mechanisms underneath the
network layer may trigger ICMP destination unreachable messages
is out of scope of this memo.
The removal of the route is usually accompanied by a notification to
the corresponding TCP source about the dropped packets via ICMP
destination unreachable messages of code 0 (net unreachable) or code
1 (host unreachable) [RFC1812]. Since ICMP destination unreachable
messages of these codes are evidence that packets were dropped due to
a link outage, they can be interpreted as a connectivity disruption
indication.
Note that there are also other ICMP destination unreachable messages
with different codes. Some of them are candidates for connectivity
disruption indications too, but need further investigation, for
example, ICMP destination unreachable messages with code 5 (source
route failed), code 11 (net unreachable for TOS), or code 12 (host
unreachable for TOS). On the other hand, codes that flag hard errors
[RFC1122] are of no use for the proposed scheme. In the following,
the term "ICMP unreachable message" is used as a synonym for ICMP
destination unreachable messages of code 0 or code 1.
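The classification above can be sketched in Python as follows. The
enum and function names are hypothetical. The sketch assumes ICMPv4
(type 3 = destination unreachable) and, following RFC 1122, treats
codes 2 to 4 (protocol unreachable, port unreachable, fragmentation
needed and DF set) as the hard errors; all other codes are simply
reported as not considered.

```python
from enum import Enum

ICMP_DEST_UNREACH = 3  # ICMPv4 type "destination unreachable"


class UnreachClass(Enum):
    DISRUPTION = "connectivity disruption indication"
    CANDIDATE = "candidate, needs further investigation"
    HARD_ERROR = "hard error, of no use for the scheme"
    OTHER = "not considered"


def classify(icmp_type, icmp_code):
    """Map an ICMP type/code pair to its role in the scheme (sketch)."""
    if icmp_type != ICMP_DEST_UNREACH:
        return UnreachClass.OTHER
    if icmp_code in (0, 1):       # net unreachable, host unreachable
        return UnreachClass.DISRUPTION
    if icmp_code in (5, 11, 12):  # source route failed, net/host unreachable for TOS
        return UnreachClass.CANDIDATE
    if icmp_code in (2, 3, 4):    # hard errors per RFC 1122 (assumption)
        return UnreachClass.HARD_ERROR
    return UnreachClass.OTHER
```

Only messages classified as DISRUPTION are "ICMP unreachable messages"
in the sense of this memo.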
A router experiencing a link outage is an obvious candidate for being
heavily congested: it is not just unable to forward packets fast
enough, it is unable to forward packets at all. Therefore, TCP's
exponential back-off may seem appropriate. However, taking into
account the congestion control principles [RFC2914], i.e., that
congestion is indicated by packet loss, receiving an ICMP unreachable
message might be an indication that there is no congestion. For
instance, when a (re-)transmission is answered with an ICMP
unreachable message, this is a strong indication that there is no
congestion in the network, at least on the part of the path that was
traversed by both the TCP segment eliciting the ICMP unreachable
message and the ICMP unreachable message itself. Therefore, it seems
overly harsh for TCP to back off as if there were true congestion.
The accurate interpretation of ICMP unreachable messages as a
connectivity disruption indication is complicated by the following
two peculiarities of ICMP messages. First, they do not necessarily
operate on the same timescale as the packets (in the given case, TCP
segments) that elicited them. When a router drops a packet due to a
missing route, it will not necessarily send an ICMP unreachable
message immediately, but may instead queue it for later delivery.
Second, ICMP messages are subject to rate limiting: when a router
drops a whole window of data due to a link outage, it will hardly
send as many ICMP unreachable messages as it dropped TCP segments.
Depending on the load of the router, it may even send no ICMP
unreachable messages at all. Both peculiarities originate from
RFC 1812 [RFC1812].
Fortunately, according to RFC 792 [RFC0792], ICMP unreachable
messages must contain in their body the Internet Protocol (IP) header
of the datagram that elicited them plus the first 64 bits of that
datagram's payload, i.e., in the case of a TCP segment, both port
numbers and the sequence number. This allows the originating TCP to
identify the connection that an ICMP unreachable message is
reporting an error about. Moreover, it allows the originating TCP to
identify which segment of the respective connection triggered the
ICMP unreachable message, provided that there are not several
segments with the same sequence number in flight. The latter may
very well be the case when TCP is recovering lost segments.
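The following Python sketch shows how the port numbers and SEG.SEQ
can be recovered from the quoted datagram. It assumes ICMPv4 and that
the input starts right after the 8-byte ICMP header, i.e., with the
quoted IPv4 header; the names extract_quoted_tcp and QuotedTcp are
hypothetical. The first 64 bits of a TCP header are exactly the
source port (16 bits), the destination port (16 bits), and the
sequence number (32 bits).

```python
import struct
from typing import NamedTuple


class QuotedTcp(NamedTuple):
    src_port: int
    dst_port: int
    seq: int  # SEG.SEQ of the segment that elicited the ICMP message


def extract_quoted_tcp(quoted: bytes) -> QuotedTcp:
    """Parse the IPv4 header plus 64 bits of TCP quoted in an ICMP error."""
    if len(quoted) < 20:
        raise ValueError("truncated IP header")
    ihl = (quoted[0] & 0x0F) * 4      # IP header length in bytes
    if ihl < 20:
        raise ValueError("invalid IP header length")
    if quoted[9] != 6:                # IP protocol field: 6 = TCP
        raise ValueError("quoted datagram is not TCP")
    tcp = quoted[ihl:ihl + 8]         # the first 64 bits of the payload
    if len(tcp) < 8:
        raise ValueError("fewer than 64 bits of TCP header quoted")
    src_port, dst_port, seq = struct.unpack("!HHI", tcp)
    return QuotedTcp(src_port, dst_port, seq)
```

A TCP would then look up the connection using the port pair together
with the addresses from the quoted IP header, and compare `seq` with
SND.UNA.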
4. Connectivity Disruption Reaction
The complete algorithm is specified in Section 4.1. In Section 4.2,
the different steps of the algorithm are discussed in detail.
4.1. The Algorithm
The following scheme MAY be used by a TCP sender to avoid over-
conservative back-offs of the retransmission timer in the case of
long connectivity disruptions:
(1) Set the variable "UndoBackOff" to UNPROVED (equal to 0)
UndoBackOff := UNPROVED.
(2) Wait for the expiration of the retransmission timer; then proceed
to step (RTO).
(3) Wait either
(a) for the arrival of an acceptable ACK. When an acceptable ACK
has arrived, proceed to step (ACK),
(b) or for the arrival of an ICMP destination unreachable
message. When an ICMP destination unreachable message has
arrived, proceed to step (4),
(c) or for the expiration of the retransmission timer. Then
proceed to step (RTO).
(4) Extract the TCP segment header included in the ICMP destination
unreachable message
SEG := Extract(ICMP_MSG).
(5) If "SEG.SEQ == SND.UNA", i.e., the ICMP unreachable message
reports on a retransmission, then
(a) if "UndoBackOff == UNPROVED", set the "UndoBackOff"
variable to PROVED (equal to 1)
UndoBackOff := PROVED,
(b) else revert one RTO back-off
RTO := max(MINIMUM_RTO, RTO / 2).
(6) Proceed to step (3).
(RTO) This is a placeholder for the standard TCP behavior that must
be executed at this point in case the retransmission timer
has expired. Proceed to step (3).
(ACK) This is a placeholder for the standard TCP behavior that must
be executed at this point in case an acceptable ACK has
arrived. Proceed to step (1).
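The steps above can be sketched as a small Python state machine. The
class and method names are hypothetical, and only the RTO bookkeeping
is modeled: the actual retransmission, the congestion window, and the
timer itself are assumed to be handled by the surrounding TCP
implementation. MINIMUM_RTO defaults to the 1 s of RFC 2988; the
60 s maximum RTO is an assumption.

```python
UNPROVED, PROVED = 0, 1  # values of the "UndoBackOff" variable


class LcdRtoReverter:
    """Hypothetical sketch of steps (1)-(6), (RTO) and (ACK)."""

    def __init__(self, rto, snd_una, minimum_rto=1.0, maximum_rto=60.0):
        self.rto = rto
        self.snd_una = snd_una
        self.minimum_rto = minimum_rto
        self.maximum_rto = maximum_rto
        self.undo_back_off = UNPROVED  # step (1)
        self.in_step_3 = False         # False while waiting in step (2)

    def on_rto_expired(self):
        # Step (RTO): standard behavior; SND.UNA is retransmitted (not
        # modeled here) and the RTO is backed off, bounded by the maximum.
        self.rto = min(2 * self.rto, self.maximum_rto)
        self.in_step_3 = True

    def on_acceptable_ack(self, new_snd_una, new_rto):
        # Step (ACK): standard processing; new_rto is whatever the regular
        # RFC 2988 computation yields. Afterwards, restart at step (1).
        self.snd_una = new_snd_una
        self.rto = new_rto
        self.undo_back_off = UNPROVED
        self.in_step_3 = False

    def on_icmp_unreachable(self, seg_seq):
        """Steps (4)-(6); returns True if one back-off was reverted."""
        if not self.in_step_3:
            return False                # step (2) waits only for the RTO
        if seg_seq != self.snd_una:
            return False                # does not report on a retransmission
        if self.undo_back_off == UNPROVED:
            self.undo_back_off = PROVED  # step (5a): may be the original
            return False
        self.rto = max(self.minimum_rto, self.rto / 2)  # step (5b)
        return True
```

For example, after four RTO expiries starting from 1 s, the RTO is
16 s; the first ICMP unreachable message for SND.UNA only proves the
retransmission, and each further one halves the RTO until the 1 s
floor is reached.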
4.2. The Algorithm in Detail
When an RTO expires, a TCP marks all outstanding segments as lost,
sets the congestion window (CWND) to one segment, backs off the RTO,
and retransmits the first unacknowledged segment SND.UNA (step 2).
If the RTO expires again, a TCP repeats the retransmission of the
first unacknowledged segment and backs off again (step 3c). This
pattern is repeated as long as no packet arrives or until the
maximum RTO has expired.
If the first packet received after the retransmission(s) is an
acceptable ACK (step 3a), a TCP proceeds as normal, i.e., it
slow-starts the connection. It ignores late ICMP unreachable
messages from the window of data that experienced the RTO. Such late
ICMP unreachable messages are of no use, as the ACK clock is already
restarting due to the successful retransmission.
On the other hand, if the first packet received after the
retransmission(s) is an ICMP unreachable message, a TCP SHOULD revert
one back-off for each ICMP unreachable message reporting an error on
a retransmission. To decide whether an ICMP unreachable message
reports on a retransmission, the sequence number it contains is
exploited (steps 4 and 5).
However, an ICMP unreachable message reporting on the first
unacknowledged sequence number is ambiguous: it may refer to the
original transmission or to any of the retransmissions. To be
conservative, the first such message should be considered to belong
to the original transmission (step 5a). For each subsequent ICMP
unreachable message reporting on the retransmission, however, a TCP
SHOULD revert one back-off (step 5b).
Upon receipt of an ICMP unreachable message that legitimately
reverts one back-off, the new, shorter RTO may already have expired.
In that case, a TCP SHOULD retransmit immediately, i.e., perform an
ICMP-message-clocked retransmission. If the new RTO has not expired
yet, TCP MUST wait accordingly.
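The decision after a revert can be sketched as follows. The helper
name is hypothetical, and the sketch assumes that the retransmission
timer is measured from the most recent (re)transmission of SND.UNA.

```python
def after_revert(elapsed, new_rto):
    """Decide what to do once one back-off has been reverted (sketch).

    elapsed: seconds since SND.UNA was last (re)transmitted, i.e., since
             the retransmission timer was last started (assumption)
    new_rto: the RTO after reverting one back-off
    """
    if elapsed >= new_rto:
        # The shorter RTO has already expired: ICMP-clocked retransmission.
        return ("retransmit now", 0.0)
    # Otherwise re-arm the timer for the time that is still left.
    return ("wait", new_rto - elapsed)


print(after_revert(10.0, 8.0))  # ('retransmit now', 0.0)
print(after_revert(3.0, 8.0))   # ('wait', 5.0)
```

For instance, if an ICMP unreachable message arrives 10 s after a
retransmission scheduled with a 16 s RTO, reverting to 8 s calls for an
immediate retransmission; if it arrives after 3 s, TCP still waits 5 s.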
5. Discussion
Apart from ICMP unreachable messages reporting on the sequence number
of the retransmission, ICMP unreachable messages reporting on the
original window of data might also arrive while a TCP is in
RTO-induced recovery. As TCP cannot decide from a single or a few
ICMP unreachable messages whether the whole window of data was
dropped because of a link outage, it is possible that at least one of
the segments was dropped due to true congestion in the network,
calling for back-off. Therefore, to be conservative, a TCP MUST NOT
revert the back-off in such a case (step 5a). There is still the
unlikely possibility that the intermediate router indeed sends an
ICMP unreachable message for each dropped segment; then, TCP should
be allowed to revert even the first back-off. However, as this case
is very unlikely and detecting it requires one more state variable,
this is not recommended in this document.
Besides the ambiguity whether the first unacknowledged sequence
number refers to the original transmission or to any of the
retransmissions, there is another source of ambiguity about the
sequence numbers contained in ICMP unreachable messages. On
high-bandwidth paths, such as modern gigabit links, the sequence
space may wrap rather quickly, making it possible that a late ICMP
unreachable message reporting on an old error coincidentally fits as
input to the scheme explained above. As a result, the scheme would
wrongly revert one back-off. However, the chances of this happening
are minuscule. Moreover, as the scheme is designed to be as
conservative as possible, no threat to the network can arise from
this issue.
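How quickly the 32-bit sequence space wraps can be estimated with a
short Python calculation. The sketch assumes that the connection sends
continuously at the full line rate; the function name is hypothetical.

```python
SEQ_SPACE_BYTES = 2 ** 32  # TCP sequence numbers are 32 bits wide


def wrap_time(bits_per_second):
    """Seconds needed to send 2**32 bytes, i.e., to wrap the sequence space."""
    return SEQ_SPACE_BYTES * 8 / bits_per_second


for rate in (100e6, 1e9, 10e9):
    print(f"{rate / 1e9:g} Gbit/s: {wrap_time(rate):.1f} s")
```

At 1 Gbit/s the sequence space wraps in about 34 s, which is shorter
than the 60 s lower bound for the maximum RTO, so a late ICMP
unreachable message may indeed quote a sequence number that has
already been reused.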
Finally, the scheme explicitly does not distinguish between ICMP
unreachable messages originating from different routers, as the
evidence of no congestion still holds even if the reporting router
changed.
Another use of ICMP unreachable messages in the context of TCP
congestion control might seem appropriate when the ICMP unreachable
message is received while TCP is in steady state and the message
refers to a segment from within the current window of data. As the
round-trip time (RTT) to the router that generates the ICMP
unreachable message is likely to be substantially shorter than the
overall RTT to the destination, the ICMP unreachable message may very
well reach the originating TCP while it is still transmitting the
current window of data. If the remaining window is large, it might
seem appropriate to refrain from transmitting it, as there is timely
evidence that it will only trigger further ICMP unreachable messages
at the same router. Although this might seem appropriate from a
wastage perspective, it may be counterproductive from a security
perspective, since ICMP messages are easy to spoof, allowing an easy
attack on the TCP by simply forging such ICMP messages.
An additional consideration is the following: in the presence of
multi-path routing, even the receipt of a legitimate ICMP unreachable
message cannot be exploited accurately, because possibly only one of
the multiple paths to the destination is suffering from the
connectivity disruption that causes ICMP unreachable messages to be
sent. That path may have contributed considerably to the overall
bandwidth, in which case a congestion response is very reasonable.
However, this is not necessarily the case, and a TCP has no means
other than its inherent congestion control to decide on this matter.
All in all, for a connection in steady state, i.e., not in
RTO-induced recovery, reacting to ICMP unreachable messages with
regard to congestion control does not seem appropriate. For
RTO-based retransmissions, however, there is a reasonable congestion
response, which is to skip further back-off of the RTO because there
is no congestion indication, as described above.
6. Related Work
The literature describes several methods that address TCP's
problems in the presence of connectivity disruptions. Some of them
try to improve TCP's performance by modifying the lower layers. For
example, [SM03] introduces a "smart link layer" that buffers one
segment for each ongoing connection and replays these segments when
connectivity is reestablished. This approach has a serious drawback:
previously stateless intermediate routers have to be modified in
order to inspect TCP headers, track end-to-end connections, and
provide additional buffer space, which altogether increases their
memory and processing requirements.
On the other hand, stateless link-layer schemes, such as those
proposed in RFC 3819 [RFC3819], which unconditionally buffer a small
number of packets, may have another problem: if a packet is buffered
longer than the maximum segment lifetime (MSL) of 2 minutes
[RFC0793], i.e., the disconnection lasts longer than the MSL, TCP's
assumption that such segments will never be received no longer holds,
violating TCP's semantics [I-D.eggert-tcpm-tcp-retransmit-now].
Other approaches like TCP-F [CRVP01] or the Explicit Link Failure
Notification (ELFN) [HV02] inform the TCP senders about disrupted
paths by special messages generated by intermediate routers. In the
case of a link failure, they stop sending data segments and freeze
TCP's retransmission timers. TCP-F stays in this state and remains
silent until either a "route establishment notification" is received
or an internal timer expires. In contrast, ELFN periodically probes
the network to detect connectivity reestablishment. Both proposals
rely on changes to intermediate routers, whereas the scheme proposed
in this memo is a sender only modification. Moreover, ELFN also does
not consider congestion in the network and may impose serious
additional load on the network, depending on the probe interval.
The authors of ATCP [LS01] propose enhancements to identify different
types of packet loss by introducing a layer between TCP and IP. They
use ICMP destination unreachable messages to set TCP's receiver
advertised window to zero, thus forcing the TCP sender to perform zero
window probing with exponential back-off. ICMP destination
unreachable messages that arrive during this probing period are
ignored. This approach is nearly orthogonal to this memo, which
exploits ICMP messages to revert an RTO back-off when TCP is already
probing. In principle, both mechanisms could be combined; however,
due to security considerations, it does not seem appropriate to adopt
ATCP's reaction, as discussed in Section 5.
Schuetz et al. describe in [I-D.schuetz-tcpm-tcp-rlci] a set of TCP
extensions that improve behavior when transmitting over paths whose
characteristics can change on short time-scales. Their proposed TCP
extensions modify the local behavior of TCP and introduce a new TCP
option to signal locally received connectivity-change indications
(CCIs) to remote peers. Upon reception of a CCI, they re-probe the
path characteristics either by performing a speculative
retransmission or by sending a single segment of new data, depending
on whether the connection is currently in the loss state or
transmitting in steady state, respectively. The authors focus on
specifying TCP response mechanisms; nevertheless, underlying layers
would have to be modified to explicitly send CCIs to make these
immediate responses possible.
7. IANA Considerations
This memo includes no request to IANA.
8. Security Considerations
The proposed mechanism is considered to be secure. For example, an
attacker cannot make a TCP modified with the proposed scheme flood the
network just by sending forged ICMP unreachable messages that revert
RTO back-offs. Even if the attacker correctly guesses the sequence
number of the currently retransmitted segment, the retransmission
frequency is limited by the minimum RTO of 1 s specified by RFC 2988
[RFC2988].
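The bound on an attacker can be illustrated with a short Python
sketch. The helper name is hypothetical; it assumes that every forged
ICMP unreachable message carries the correct SND.UNA while the sender
is in RTO recovery, and applies steps (5a) and (5b) of Section 4.1.

```python
def rto_after_forged_icmps(rto, n_forged, minimum_rto=1.0):
    """RTO after n_forged spoofed ICMP unreachable messages (sketch).

    The first message only sets UndoBackOff to PROVED (step 5a); every
    further one halves the RTO (step 5b). Because the floor is applied
    at each step, the result equals one max() over all halvings.
    """
    halvings = max(0, n_forged - 1)
    return max(minimum_rto, rto * 0.5 ** halvings)


print(rto_after_forged_icmps(64.0, 7))     # 1.0
print(rto_after_forged_icmps(64.0, 1000))  # still 1.0
```

However many messages are forged, the RTO never drops below the
1 s minimum, so the victim retransmits SND.UNA at most about once per
second.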
9. References
9.1. Normative References
[RFC0792] Postel, J., "Internet Control Message Protocol", STD 5,
RFC 792, September 1981.
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981.
[RFC1812] Baker, F., "Requirements for IP Version 4 Routers",
RFC 1812, June 1995.
[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999.
[RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
Timer", RFC 2988, November 2000.
9.2. Informative References
[CRVP01] Chandran, K., Raghunathan, S., Venkatesan, S., and R.
Prakash, "A feedback-based scheme for improving TCP
performance in ad hoc wireless networks", IEEE Personal
Communications vol. 8, no. 1, pp. 34-39, February 2001.
[HV02] Holland, G. and N. Vaidya, "Analysis of TCP performance
over mobile ad hoc networks", Wireless Networks vol. 8,
no. 2-3, pp. 275-288, March 2002.
[I-D.eggert-tcpm-tcp-retransmit-now]
Eggert, L., "TCP Extensions for Immediate
Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
(work in progress), June 2005.
[I-D.schuetz-tcpm-tcp-rlci]
Schuetz, S., Koutsianas, N., Eggert, L., Eddy, W., Swami,
Y., and K. Le, "TCP Response to Lower-Layer Connectivity-
Change Indications", draft-schuetz-tcpm-tcp-rlci-03 (work
in progress), February 2008.
[LS01] Liu, J. and S. Singh, "ATCP: TCP for mobile ad hoc
networks", IEEE Journal on Selected Areas in
Communications vol. 19, no. 7, pp. 1300-1315, July 2001.
[RFC1122] Braden, R., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, October 1989.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2914] Floyd, S., "Congestion Control Principles", BCP 41,
RFC 2914, September 2000.
[RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
for TCP", RFC 3522, April 2003.
[RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
Wood, "Advice for Internet Subnetwork Designers", BCP 89,
RFC 3819, July 2004.
[RFC4138] Sarolahti, P. and M. Kojo, "Forward RTO-Recovery (F-RTO):
An Algorithm for Detecting Spurious Retransmission
Timeouts with TCP and the Stream Control Transmission
Protocol (SCTP)", RFC 4138, August 2005.
[SESB05] Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
"Protocol enhancements for intermittently connected
hosts", SIGCOMM Computer Communication Review vol. 35, no.
3, pp. 5-18, December 2005.
[SM03] Scott, J. and G. Mapp, "Link layer-based TCP optimisation
for disconnecting networks", SIGCOMM Computer
Communication Review vol. 33, no. 5, pp. 31-42,
October 2003.
Authors' Addresses
Alexander Zimmermann
RWTH Aachen University
Ahornstrasse 55
Aachen, 52074
Germany
Phone: +49 241 80 21422
Email: zimmermann@cs.rwth-aachen.de
Arnd Hannemann
RWTH Aachen University
Ahornstrasse 55
Aachen, 52074
Germany
Phone: +49 241 80 21423
Email: hannemann@nets.rwth-aachen.de