Network Working Group F. Templin, Ed.
Internet-Draft Boeing Phantom Works
Expires: September 4, 2006 March 3, 2006
Link Adaptation for IPv6-in-IPv4 Tunnels
draft-templin-linkadapt-02.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 4, 2006.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
IPv6-in-IPv4 tunnel endpoints support an MTU of 1280 bytes or larger
via static prearrangements and/or dynamic MTU determination based on
ICMPv4 messages, but these methods have known operational
limitations. This document proposes a new MTU determination
mechanism for IPv6-in-IPv4 tunnels that supports larger MTUs using a
link adaptation scheme with tunnel endpoint-based segmentation/
reassembly and dynamic segment size probing.
Templin Expires September 4, 2006 [Page 1]
Internet-Draft Link Adaptation for Tunnels March 2006
1. Introduction
IPv6-in-IPv4 tunnels span multiple IPv4 network hops yet are seen by
IPv6 as ordinary links that must support the minimum IPv6 link MTU of
1280 bytes ([RFC2460], section 5). Common tunneling mechanisms
(e.g., [RFC2529][RFC3056][RFC4213][RFC4214][RFC4380]) meet this
requirement through conservative static prearrangements at the
expense of degraded performance over some paths due to excessive IPv4
network-based fragmentation and/or missed opportunities to discover
larger MTUs. Optional dynamic MTU determination methods based on
ICMPv4 "fragmentation needed" messages are also available, but can
result in communication failures due to the unreliable and
untrustworthy nature of ICMPv4 messages generated by network
middleboxes.
This document proposes a link adaptation method for IPv6-in-IPv4
tunnels that presents an assured MTU to the IPv6 layer. It uses
tunnel endpoint-based segmentation/reassembly and dynamic segment
size probing with authenticated probe feedback. Thus, it provides
greater robustness and efficiency than existing schemes by avoiding
IPv4 network-based fragmentation and dependence on unreliable/
untrustworthy ICMPv4 feedback from IPv4 network middleboxes.
2. Requirements
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119].
3. Link Adaptation for IPv6-in-IPv4 Tunnels
The following subsections specify a link adaptation scheme for IPv6-
in-IPv4 tunnels with properties similar to those defined for AAL5
[RFC2684] and IEEE 802.11 [WLAN]:
3.1. Layering
IPv6-in-IPv4 tunnel endpoints that implement the link adaptation
specified in this document (hereafter referred to as
"implementations") operate at a logical midpoint between the IPv6 and
IPv4 protocol modules. From the viewpoint of IPv6, the
implementation appears as a network driver that delivers whole Upper
Layer Payloads (ULPs) to an underlying transmission medium. From the
viewpoint of IPv4, the implementation appears as a packetization
layer protocol that segments ULPs to be encapsulated in IPv4 packets.
IPv6-in-IPv4 tunnel endpoints therefore operate at a logical "layer
2.5" between IPv6 as layer 3 and IPv4 as layer 2.
3.2. Tunnel MTU
Implementations MUST configure a minimum per-tunnel LinkMTU of 1280
bytes and SHOULD provide a configuration knob to set larger values.
A maximum per-tunnel LinkMTU of 9180 bytes (i.e., the same as defined
in [RFC1626]) is RECOMMENDED for normal use cases, since it is large
enough to accommodate Gigabit Ethernet Jumbo Frames yet not so large
as to diminish the effectiveness of 32-bit link layer CRCs [GIGE].
Implementations MAY set even larger LinkMTU values, but are advised
that this may lead to unacceptable levels of undetected errors unless
all physical segments in the path can provide assured error-free
delivery for large packets.
3.3. Encapsulation/Segmentation
Encapsulating tunnel endpoints cache per-flow segment sizes
("SEGSIZE") for the purpose of segmenting ULPs that are too large to
traverse the tunnel into chains of SEGSIZE-byte (or smaller)
segments. Conservative implementations can configure an initial
SEGSIZE of 68 bytes minus the length of the IPv4 header plus any
additional layer 2.5 encapsulation headers, since the minimum IPv4
LinkMTU is 68 bytes [RFC0791]. Under normal conditions, however,
implementations can configure initial SEGSIZE values up to 576 bytes
minus the IPv4 and layer 2.5 encapsulation header lengths since all
IPv4 nodes are required to configure a Maximum Receive Unit (MRU) of
at least 576 bytes [RFC0791][RFC1122][RFC1812]. (Also, most links in
the Internet configure still larger IPv4 LinkMTUs [RFC3150][RFC3819]
such that larger initial SEGSIZE values are often possible.)
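As a worked example of the arithmetic above, the following non-normative sketch computes an initial SEGSIZE. It assumes a 20-byte IPv4 header without options and takes the layer 2.5 encapsulation header length as a parameter; the function and constant names are illustrative and are not defined by this document.

```python
IPV4_MIN_LINK_MTU = 68   # minimum IPv4 LinkMTU [RFC0791]
IPV4_MIN_MRU = 576       # minimum IPv4 MRU [RFC0791][RFC1122][RFC1812]


def initial_segsize(ipv4_hdr_len=20, l25_hdr_len=0, conservative=False):
    """Return an initial SEGSIZE in bytes.

    ipv4_hdr_len: IPv4 header length (assumed 20, i.e. no options).
    l25_hdr_len:  length of any additional layer 2.5 encapsulation headers.
    conservative: start from the 68-byte minimum LinkMTU instead of 576.
    """
    base = IPV4_MIN_LINK_MTU if conservative else IPV4_MIN_MRU
    segsize = base - ipv4_hdr_len - l25_hdr_len
    if segsize <= 0:
        raise ValueError("encapsulation headers leave no room for payload")
    return segsize
```

With these assumptions, a tunnel with no layer 2.5 headers starts at 556 bytes under normal conditions and at 48 bytes in the conservative case.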
Encapsulating tunnel endpoints split each ULP they send into a tunnel
into chains of segments for presentation to the IPv4 layer. The
segments MUST be contiguous and non-overlapping, i.e., the final byte
of the (i)th segment MUST be the byte that immediately precedes the
first byte of the (i+1)th segment. Non-final segments in the chain
MUST be equal in length; the final segment MAY be of different
length. For ULPs that span multiple segments, encapsulators use 2's
complement Fletcher-32 [STONE][RFC3385] to calculate a checksum
across all payload bytes and encode the A and B results in a trailing
32-bit field as the final 4 bytes of the final packet(s) in the
chain. For ULPs that fit within a single segment, the trailing 32-
bit checksum is omitted.
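The segmentation rules above can be sketched as follows. This is a non-normative illustration under several assumptions that this document does not fix: the checksum is computed over 16-bit big-endian words with each running sum kept modulo 2^16, odd-length ULPs are zero-padded for checksum purposes only, and the trailer carries A followed by B in network byte order. The function names are hypothetical.

```python
def fletcher32_twos(data: bytes) -> tuple[int, int]:
    """Return the (A, B) running sums of a two's complement Fletcher-32.

    Assumptions: 16-bit big-endian words, sums kept modulo 2**16,
    odd-length input zero-padded for the calculation only.
    """
    if len(data) % 2:
        data += b"\x00"
    a = b = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]
        a = (a + word) & 0xFFFF
        b = (b + a) & 0xFFFF
    return a, b


def segment_ulp(ulp: bytes, segsize: int) -> list[bytes]:
    """Split a ULP into contiguous, non-overlapping segments.

    Non-final segments are exactly segsize bytes; the final one may be
    shorter. A 4-byte trailing checksum is appended only when the ULP
    spans more than one segment.
    """
    if segsize <= 0:
        raise ValueError("segsize must be positive")
    if len(ulp) <= segsize:
        return [ulp]                      # single segment: checksum omitted
    a, b = fletcher32_twos(ulp)
    payload = ulp + a.to_bytes(2, "big") + b.to_bytes(2, "big")
    segments = [payload[i:i + segsize] for i in range(0, len(payload), segsize)]
    if len(segments) > 255:
        raise ValueError("chain would need more than 255 segments (SEGID 0 - 254)")
    return segments
```

For example, a 1280-byte ULP with a SEGSIZE of 556 yields segments of 556, 556, and 172 bytes, the last of which ends with the 4-byte checksum.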
Each segment in the chain is encapsulated in an IPv4 header plus any
additional layer 2.5 encapsulating headers, with the reserved bit in
the IPv4 "Flags" field set to '1' to inform the decapsulating tunnel
endpoint that the segmentation/reassembly scheme specified by this
document is used. In addition, each segment encodes the following
information in the 16-bit IPv4 "Identification" field:
0 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ULPID | SEGID |P|A|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
IPv4 Identification Field
ULPID: 6 bits
An identifying value assigned by the sender to aid in reassembling
the segments of a ULP.
SEGID: 8 bits
A value that identifies a specific segment within a ULP.
Acceptable values are in the range 0 - 254.
P: 1 bit
Probe flag. 0 = Ordinary Segment, 1 = Probe Segment.
A: 1 bit
Additional Segments flag; 0 = Last Segment, 1 = Additional
Segments.
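A minimal, non-normative sketch of encoding and decoding this layout is shown below. It assumes the usual convention in which bit 0 of the diagram is the most significant bit of the 16-bit field; the function names are illustrative only.

```python
def pack_ident(ulpid: int, segid: int, probe: bool, more: bool) -> int:
    """Build the 16-bit IPv4 Identification value.

    Bits 0-5 ULPID, bits 6-13 SEGID, bit 14 P, bit 15 A
    (bit 0 is taken to be the most significant bit).
    """
    if not 0 <= ulpid <= 0x3F:
        raise ValueError("ULPID must fit in 6 bits")
    if not 0 <= segid <= 0xFF:
        raise ValueError("SEGID must fit in 8 bits")
    return (ulpid << 10) | (segid << 2) | (int(bool(probe)) << 1) | int(bool(more))


def unpack_ident(ident: int) -> tuple[int, int, int, int]:
    """Return (ULPID, SEGID, P, A) from a 16-bit Identification value."""
    return (ident >> 10) & 0x3F, (ident >> 2) & 0xFF, (ident >> 1) & 1, ident & 1
```

Under this bit ordering, ULPID 1, SEGID 0 with the A flag set encodes as 0x0401.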
Each IPv4 packet in a chain encodes an identical value in the "ULPID"
field (bits 0 thru 5 of the IPv4 Identification field) to identify
the segments of a specific ULP; IPv4 packets that encapsulate
segments of different ULPs encode different ULPID values.
Consecutive IPv4 packets in a chain encode an increasing Segment ID
value between 0 - 254 in the "SEGID" field (bits 6 thru 13 of the
IPv4 Identification field), i.e., the first packet encodes the value
'0', the second packet encodes the value '1', etc. Each packet in
the chain except the final one sets the "Additional Segments - A" bit
(bit 15 of the IPv4 Identification field) to indicate that additional
segments follow. Finally, each packet in the chain is delivered to
the link layer (i.e., the IPv4 stack) in increasing SEGID order,
i.e., SEGID 0 first, followed by SEGID 1, etc., up to the final
packet. The link layer SHOULD NOT reorder the packets.
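The per-packet field assignment described above can be illustrated with the following non-normative sketch, which lists the header fields for an ordinary (non-probe) chain in transmission order. The SegmentHeader type and chain_headers function are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    ulpid: int   # same value for every packet in the chain
    segid: int   # 0, 1, 2, ... in transmission order
    p: int       # probe flag (0 for ordinary segments)
    a: int       # 1 = additional segments follow, 0 = last segment


def chain_headers(ulpid: int, num_segments: int) -> list[SegmentHeader]:
    """Return header fields for each packet of a chain, in sending order."""
    if not 0 <= ulpid <= 63:
        raise ValueError("ULPID must fit in 6 bits")
    if not 1 <= num_segments <= 255:
        raise ValueError("a chain carries 1 to 255 segments (SEGID 0 - 254)")
    last = num_segments - 1
    return [SegmentHeader(ulpid, segid, 0, 0 if segid == last else 1)
            for segid in range(num_segments)]
```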
To increase efficiency and avoid excessive packet chain lengths,
implementations SHOULD seek to increase a flow's SEGSIZE to larger
values through path probing to avoid black holes [RFC2923].
Implementations probe a candidate SEGSIZE value 'N' by setting the
"Probe Segment - P" bit (bit 14 of the IPv4 Identification field) in
a probe segment of size N within a packet chain. After sending the
probe segment, if the encapsulator receives a unicast IPv6 Router
Advertisement message [RFC2461] from the decapsulator at the far end
of the tunnel (see: section 3.4) with an MTU option that encodes the
value N within a maximum probe delay ("MaxProbeDelay") timeout period,
it deems the probe successful.
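One way an encapsulator might track outstanding probes is sketched below. This is a non-normative illustration: the ProbeTracker class is hypothetical, this document does not specify a value for MaxProbeDelay, and authentication of the Router Advertisement (see section 5) is assumed to have been performed before ra_received() is called.

```python
import time


class ProbeTracker:
    """Match Router Advertisement MTU options against outstanding probes."""

    def __init__(self, max_probe_delay: float, clock=time.monotonic):
        self.max_probe_delay = max_probe_delay   # seconds; value is an assumption
        self._clock = clock
        self._pending = {}                       # candidate size N -> send time

    def probe_sent(self, n: int) -> None:
        """Record that a probe segment of size n was just sent."""
        self._pending[n] = self._clock()

    def ra_received(self, mtu_option: int) -> bool:
        """Return True if the RA confirms a probe within MaxProbeDelay."""
        sent = self._pending.pop(mtu_option, None)
        if sent is None:
            return False                         # no matching probe outstanding
        return (self._clock() - sent) <= self.max_probe_delay
```

The clock is injectable so that the timeout logic can be exercised without waiting in real time.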
For probe segments that contain valid data for reassembly as part of
a packet chain, the encapsulator sets the appropriate SEGID value in
the IPv4 packet header as for ordinary segmentation. For probe
segments that are to be discarded by the decapsulator, the
encapsulator sets the value 255 in the SEGID field.
Following a successful probe, but before advancing SEGSIZE to N,
implementations SHOULD enter a brief verification phase during which
additional probe segments are sent to detect asymmetric multipath MTU
restrictions. Thereafter, implementations SHOULD re-probe
periodically to confirm that packets with up to SEGSIZE byte segments
are still reaching the decapsulator at the far end of the tunnel.
Additional strategies for SEGSIZE management and black hole detection
are found in [PMTUD][ICMPATK].
3.4. Decapsulation/Reassembly
For tunneled packets with the reserved bit in the IPv4 "Flags" field
set to '1' (see section 3.3), the IPv4 length, ULPID, SEGID and A
fields along with flow identification information in layer 2.5
encapsulation headers provide sufficient information for the
decapsulating tunnel endpoint to reassemble an original ULP with
protection against packet reordering in the IPv4 network.
Implementations of this scheme configure per-flow reassembly buffers
of at least 1280 bytes and SHOULD configure larger reassembly buffers
up to 9180 bytes or larger (see: section 3.2). Note that these
reassembly buffers occur at the logical layer 2.5 midpoint between
the IPv4 and IPv6 stacks and are thus distinct from the IPv4 and IPv6
reassembly caches.
Decapsulating tunnel endpoints use per-flow reassembly buffers to
concatenate the segments received in packet chains for a particular
ULPID in increasing SEGID order (i.e., SEGID 0, followed by SEGID 1,
etc.) even if the packets were re-ordered by the network. When all
segments for a particular ULPID have been concatenated into the
reassembly buffer, the implementation uses 2's complement Fletcher-32
to detect errors if a trailing checksum was included (see: section
3.3).
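The reassembly procedure above can be sketched as follows. This non-normative example makes the same assumptions as a matching encapsulator would need to make, none of which are fixed by this document: the checksum covers 16-bit big-endian words with sums kept modulo 2^16, odd-length ULPs are zero-padded for the calculation, and the trailer holds A followed by B in network byte order. The function names and the dictionary-based buffer are illustrative.

```python
def fletcher32_twos(data: bytes) -> tuple[int, int]:
    """Return (A, B) sums over 16-bit big-endian words, modulo 2**16."""
    if len(data) % 2:
        data += b"\x00"                  # assumption: zero-pad odd lengths
    a = b = 0
    for i in range(0, len(data), 2):
        a = (a + ((data[i] << 8) | data[i + 1])) & 0xFFFF
        b = (b + a) & 0xFFFF
    return a, b


def reassemble(received: dict):
    """Reassemble one ULP from segments that may have arrived out of order.

    received maps SEGID -> (segment bytes, A flag) for a single ULPID.
    Returns the ULP, or None while segments are still missing.
    Raises ValueError if the trailing checksum does not verify.
    """
    finals = [segid for segid, (_, a) in received.items() if a == 0]
    if len(finals) != 1:
        return None                      # final segment not seen (or ambiguous)
    last = finals[0]
    if any(segid not in received for segid in range(last + 1)):
        return None                      # at least one segment is missing
    data = b"".join(received[segid][0] for segid in range(last + 1))
    if last == 0:
        return data                      # single segment: no trailing checksum
    ulp, trailer = data[:-4], data[-4:]
    a, b = fletcher32_twos(ulp)
    if trailer != a.to_bytes(2, "big") + b.to_bytes(2, "big"):
        raise ValueError("reassembly/checksum error")
    return ulp
```

Concatenating in SEGID order before splitting off the last 4 bytes also covers the case where the checksum straddles the final two packets of the chain.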
If the decapsulating tunnel endpoint receives a packet chain that
would overflow the reassembly buffer, it discards the chain and sends
an [ICMPv6] "packet too big" message back to the source. The message
body includes upper layer packet headers (IPv6 and above) and
contents of the reassembly buffer up to a total of 1280 bytes, and
the MTU value encodes the reassembly buffer size.
If the decapsulating tunnel endpoint receives at least one segment,
but one or more segments are lost and/or checksum verification fails,
it SHOULD send an [ICMPv6] "parameter problem" message with code
"reassembly/checksum error" back to the encapsulating tunnel
endpoint. The message body includes upper layer packet headers (IPv6
and above) and contents of the reassembly buffer up to a total of
1280 bytes, and the pointer identifies either the beginning of the
first missing segment or the beginning of the 4 byte checksum field
(if no segments were missing). Upon receipt of such [ICMPv6] errors,
the encapsulator SHOULD take appropriate corrective actions such as
reducing the tunnel's current SEGSIZE, imposing an artificial inter-ULP
queuing delay for the tunnel, or relaying the [ICMPv6] messages back to
the original source as a congestion indication.
If the decapsulating tunnel endpoint receives a segment used for
probing (i.e., an IPv4 packet in the chain with the 'P' flag set), it
sends a unicast IPv6 Router Advertisement message back to the
encapsulator at the originating end of the tunnel with an MTU option
that encodes the probe segment length (subject to rate-limiting as
for [ICMPv6] error messages). If the IPv4 packet containing the
probe segment encodes the value 255 in the SEGID field, the segment
is discarded; otherwise, the segment is included as part of the
normal reassembly procedure described above.
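The decapsulator's handling of probe segments can be summarized with the following non-normative sketch. The handle_segment function is hypothetical, rate limiting of the Router Advertisement is left to the caller, and the treatment of a non-probe segment carrying SEGID 255 is an assumption of this example since the text defines that value only for probes.

```python
PROBE_DISCARD_SEGID = 255   # probe segments with this SEGID carry no ULP data


def handle_segment(segid: int, p_flag: int, seg_len: int):
    """Decide how a decapsulator treats one received segment.

    Returns (ra_mtu, keep):
      ra_mtu - value for the MTU option of a unicast Router Advertisement
               to send back, or None if no advertisement is needed.
      keep   - True if the segment takes part in normal reassembly.
    """
    ra_mtu = seg_len if p_flag else None
    # Assumption: SEGID 255 is never reassembled, even without the P flag,
    # because ordinary SEGID values are limited to 0 - 254.
    keep = segid != PROBE_DISCARD_SEGID
    return ra_mtu, keep
```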
Following successful reassembly, the decapsulating tunnel endpoint
discards the trailing checksum (if present) and delivers the ULP to
upper layers.
3.5. Setting the DF Bit
When encapsulating tunnel endpoints segment ULPs (see: section 3.3),
they can choose whether or not to set the "Don't Fragment - DF" bit in
the IPv4 headers of packets in a chain. If the DF bit is not set,
network-based IPv4 fragmentation may occur for packets in a chain
resulting in well-known performance issues [FRAG]. Additionally,
some middleboxes (such as IPv4 NATs and firewalls) are only capable
of passing the first fragment of a multi-fragment IPv4 datagram,
which could result in silent communication failures at decapsulating
tunnel endpoints. Finally, sending large IPv4 packets with the DF
bit not set could result in IPv4 reassembly buffer overruns at some
decapsulating tunnel endpoints and thereby also result in silent
communication failures.
While not setting the DF bit can lead to communication failures
observed as path MTU-related black holes, in some instances it might
result in successful communications when setting the DF bit would
otherwise have resulted in packet loss due to link MTU restrictions.
In view of these considerations, encapsulating tunnel endpoints are
advised to adopt a consistent strategy regarding setting of the DF
bit.
In any case, encapsulating tunnel endpoints SHOULD set the DF bit in
the IPv4 headers of packets used for probing.
3.6. ICMPv4 Error Handling
Encapsulators may receive ICMPv4 "fragmentation needed" error
messages from inside a tunnel due to probe failures and/or route
changes across previously-probed paths. These messages may come from
either legitimate IPv4 network middleboxes or adversarial/
mis-configured middleboxes that return wrong information.
Implementers are advised to consult [PMTUD][ICMPATK] for operational
recommendations on processing ICMPv4 "fragmentation needed" messages.
4. IANA Considerations
The IANA is instructed to assign a code value for "reassembly/checksum
error" under the [ICMPv6] Parameter Problem message type in the
"ICMPv6 Type Numbers" registry.
5. Security Considerations
The securing mechanisms for IPv6 neighbor discovery [RFC3971] and
Cryptographically-Generated Addresses [RFC3972] are used to
authenticate Router Advertisement probe responses.
6. Acknowledgments
This document represents the mindshare of many contributors.
7. Appendix A: Additional Considerations
Encapsulators can use the probing mechanism described in section 3 as
a general-purpose method for eliciting acknowledgements from the
reassembler if improved reliability at the expense of additional
overhead is desired.
The equal size restriction for non-final segments and non-overlapping
restriction for all segments in packet chains provide a significant
simplification for reassembly algorithms [RFC0815].
Use of the link adaptation scheme described in this document may lead
to an overall increase in short chains of small packets in the
Internet. Network administrators are advised to follow the
recommendations in [RFC3150] to minimize packet loss and packet
reordering. Also, overly-long packet chains should be avoided if
possible due to interactions with Active Queue Management (AQM) in
the network.
Since link-layer CRC-32 checks normally occur on each segment in the
path, most errors detected during ULP reassembly will be due to
packet splices and/or errors in the data path between the NIC
hardware and the reassembly buffer. The Fletcher-32 checksum
algorithm has been shown to provide an effective edge-to-edge error
detection capability for such errors [STONE]. The Fletcher-32
checksum is also dissimilar from both CRC-32 and the Internet
checksum used by many upper layer protocols, thereby decreasing the
likelihood of undetected errors.
Some upper layer packetization protocols (e.g., NFS) generate fixed
payload sizes and rely on the network layer to deliver the payloads
either as whole IP packets or as chains of IP fragments. Since NFS
performance (and the performance of other upper layer packetization
protocols) is highly sensitive to packet handling overhead,
implementations should periodically attempt to increase the SEGSIZE
through probing even if initial probe attempts fail.
8. Appendix B: Changes
Changes since -01:
o Updated references
Changes since -00:
o Defined new coding of segmentation/reassembly info in the IPv4
Identification field
o Changed "tunneling mechanism" to "tunnel endpoint"
o Clarified text on trailing checksums
o General document cleanup; removed "additional considerations" that
no longer apply
9. References
9.1. Normative References
[ICMPv6] Conta, A., Deering, S., and M. Gupta, ed., "Internet
Control Message Protocol (ICMPv6) for the Internet
Protocol Version 6 (IPv6) Specification",
draft-ietf-ipngwg-icmp-v3 (work in progress), July 2005.
[RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791,
September 1981.
[RFC1122] Braden, R., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, October 1989.
[RFC1812] Baker, F., "Requirements for IP Version 4 Routers",
RFC 1812, June 1995.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", RFC 2460, December 1998.
[RFC2461] Narten, T., Nordmark, E., and W. Simpson, "Neighbor
Discovery for IP Version 6 (IPv6)", RFC 2461,
December 1998.
[RFC3971] Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure
Neighbor Discovery (SEND)", RFC 3971, March 2005.
[RFC3972] Aura, T., "Cryptographically Generated Addresses (CGA)",
RFC 3972, March 2005.
9.2. Informative References
[FRAG] Mogul, J. and C. Kent, "Fragmentation Considered Harmful",
In Proc. SIGCOMM '87 Workshop on Frontiers in Computer
Communications Technology, August 1987.
[GIGE] Dykstra, P., "Gigabit Ethernet Jumboframes (And Why You
Should Care)", http://sd.wareonearth.com/~phil/jumbo.html,
December 1999.
[ICMPATK] Gont, F., "ICMP Attacks Against TCP",
draft-gont-tcpm-icmp-attacks (work in progress),
October 2005.
[PMTUD] Mathis, M. and J. Heffner, "Path MTU Discovery",
draft-ietf-pmtud-method (work in progress), October 2005.
[RFC0815] Clark, D., "IP datagram reassembly algorithms", RFC 815,
July 1982.
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
November 1990.
[RFC1626] Atkinson, R., "Default IP MTU for use over ATM AAL5",
RFC 1626, May 1994.
[RFC2529] Carpenter, B. and C. Jung, "Transmission of IPv6 over IPv4
Domains without Explicit Tunnels", RFC 2529, March 1999.
[RFC2684] Grossman, D. and J. Heinanen, "Multiprotocol Encapsulation
over ATM Adaptation Layer 5", RFC 2684, September 1999.
[RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery",
RFC 2923, September 2000.
[RFC3056] Carpenter, B. and K. Moore, "Connection of IPv6 Domains
via IPv4 Clouds", RFC 3056, February 2001.
[RFC3150] Dawkins, S., Montenegro, G., Kojo, M., and V. Magret,
"End-to-end Performance Implications of Slow Links",
BCP 48, RFC 3150, July 2001.
[RFC3385] Sheinwald, D., Satran, J., Thaler, P., and V. Cavanna,
"Internet Protocol Small Computer System Interface (iSCSI)
Cyclic Redundancy Check (CRC)/Checksum Considerations",
RFC 3385, September 2002.
[RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
Wood, "Advice for Internet Subnetwork Designers", BCP 89,
RFC 3819, July 2004.
[RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
for IPv6 Hosts and Routers", RFC 4213, October 2005.
[RFC4214] Templin, F., Gleeson, T., Talwar, M., and D. Thaler,
"Intra-Site Automatic Tunnel Addressing Protocol
(ISATAP)", RFC 4214, October 2005.
[RFC4380] Huitema, C., "Teredo: Tunneling IPv6 over UDP through
Network Address Translations (NATs)", RFC 4380,
February 2006.
[STONE] Stone, J., "Checksums in the Internet (Stanford Doctoral
Dissertation)", August 2001.
[WLAN] IEEE Computer Society, "Part 11: Wireless LAN Medium Access
Control (MAC) and Physical Layer (PHY) Specifications",
ANSI/IEEE 802.11, 1999 Edition.
Author's Address
Fred L. Templin (editor)
Boeing Phantom Works
P.O. Box 3707
Seattle, WA 98124
USA
Email: fred.l.templin@boeing.com
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.