Network Working Group F. Templin, Ed.
Internet-Draft Boeing Phantom Works
Expires: December 19, 2005 June 17, 2005
Link Adaptation for IPv6-in-IPv4 Tunnels
draft-templin-linkadapt-00.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 19, 2005.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
IPv6-in-IPv4 tunneling mechanisms support the minimum IPv6 MTU of
1280 bytes via static prearrangements at the tunnel encapsulator
and/or dynamic MTU determination based on ICMPv4 messages, but these
methods have known operational limitations. This document proposes a
new MTU determination mechanism for IPv6-in-IPv4 tunnels that uses a
link adaptation scheme with simplified IPv4 segmentation/reassembly
and dynamic segment size probing.
Templin Expires December 19, 2005 [Page 1]
Internet-Draft Link Adaptation for Tunnels June 2005
1. Introduction
IPv6-in-IPv4 tunnels span multiple IPv4 network hops yet are seen by
IPv6 as ordinary links that must support the minimum IPv6 link MTU of
1280 bytes ([RFC2460], section 5). Common tunneling mechanisms
(e.g., [RFC2529][RFC3056][ISATAP][MECH][TEREDO]) meet this
requirement through conservative static prearrangements at the
encapsulator at the expense of sub-optimal performance over some
paths due to excessive IPv4 network-based fragmentation and/or missed
opportunities to discover larger MTUs. Optional dynamic MTU
determination methods based on ICMPv4 "fragmentation needed" messages
are also available, but can result in MTU-related communication
failures due to the unreliable and untrustworthy nature of ICMPv4
messages generated by network middleboxes.
This document proposes a link adaptation method for IPv6-in-IPv4
tunnels that presents an assured MTU to the IPv6 layer. It uses
simplified segmentation/reassembly and dynamic segment size probing
with authenticated probe feedback. Thus, it provides greater
robustness and efficiency than existing schemes by avoiding IPv4
network-based fragmentation and reducing dependence on unreliable/
untrustworthy ICMPv4 feedback from IPv4 network middleboxes.
2. Requirements
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119].
3. Link Adaptation for IPv6-in-IPv4 Tunnels
The following subsections specify a link adaptation scheme for IPv6-
in-IPv4 tunnels with properties similar to those defined for AAL5
[RFC2684] and IEEE 802.11 [WLAN]:
3.1. Layering
IPv6-in-IPv4 tunneling mechanisms that implement the link adaptation
specified in this document (hereafter referred to as
"implementations") operate at a logical midpoint between the IPv6 and
IPv4 protocol modules. From the viewpoint of IPv6, the
implementation appears as a network driver that delivers whole Upper
Layer Payloads (ULPs) to an underlying transmission medium. From the
viewpoint of IPv4, the implementation appears as a packetization
layer protocol (similar to TCP, for example) that segments user data
to be encapsulated in IPv4 packets.
3.2. Tunnel Interface MTU
Implementations MUST configure a minimum per-tunnel interface LinkMTU
of 1280 bytes and SHOULD provide a configuration knob to set larger
values. A maximum LinkMTU of 9180 bytes (i.e., the same as defined
in [RFC1626]) is RECOMMENDED for normal use cases, since it is large
enough to encode 8KB network filesystem blocks and take advantage of
Gigabit Ethernet Jumbo Frames, yet not so large as to diminish the
effectiveness of 32-bit link layer CRCs [GIGE]. Implementations MAY
set even larger LinkMTU values, but are advised that this may lead to
unacceptable levels of undetected errors unless all physical segments
in the path can provide assured error-free delivery for large
packets.
Since LinkMTU values larger than 1280 bytes may result in [ICMPv6]
"packet too big" messages due to temporary segmentation restrictions
(see: section 3.3), ULPs SHOULD employ a probing strategy that begins
with a smaller payload size (on the order of 1KB) and probes upward
[PMTUD]. (Note that this may not be possible for some ULPs.)
3.3. Encapsulation/Segmentation
Encapsulators cache per-flow segment sizes ("SEGSIZE") for the
purpose of segmenting ULPs into chains of IPv4 datagrams.
Conservative implementations can configure an initial SEGSIZE of 68
bytes minus the length of the IPv4 header and any additional
encapsulation headers, since the minimum IPv4 LinkMTU is 68 bytes
[RFC0791]. In practice, however, most Internet links configure much
larger IPv4 LinkMTUs [RFC3150][RFC3819] such that larger initial
SEGSIZE values are often possible.
The encapsulator splits each ULP into a chain of at most 32 segments
for presentation to the IPv4 layer. The segments MUST be contiguous
and non-overlapping, i.e., the final byte of the (i)th segment MUST
be the byte that immediately precedes the first byte of the (i+1)th
segment. Non-final segments in the chain MUST be equal in size; the
final segment MAY be of different size. For ULPs that span multiple
segments, encapsulators use 2's complement Fletcher-32
[STONE][RFC3385] to calculate a checksum across all ULP payload bytes
and record the A and B results in a trailing 32-bit checksum. For
ULPs that fit within a single segment, the trailing 32-bit checksum
is omitted.
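The segmentation rules above can be sketched as follows. This is a
minimal illustration, not a normative encoding. The function names,
the zero padding of odd-length payloads, the big-endian 16-bit word
order, and the placement of the A result before the B result in the
trailer are all assumptions, since this document does not specify
them. The sketch also assumes that "2's complement Fletcher-32" means
two 16-bit running sums reduced modulo 65536, and that the trailing
checksum is appended to the ULP before it is split into segments.

```python
from typing import List, Tuple

MAX_SEGMENTS = 32  # a chain carries at most 32 segments (SEGID 0 to 31)


def fletcher32_twos(data: bytes) -> Tuple[int, int]:
    """Return the (A, B) sums of a Fletcher-32 checksum taken modulo 2**16."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad odd-length input with a zero byte
    a = b = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]  # assumption: big-endian words
        a = (a + word) & 0xFFFF
        b = (b + a) & 0xFFFF
    return a, b


def segment_ulp(ulp: bytes, segsize: int) -> List[bytes]:
    """Split a ULP into contiguous, non-overlapping segments.

    Non-final segments are all exactly segsize bytes long; the final
    segment may be shorter. A 4-byte trailer is added only when the ULP
    spans more than one segment.
    """
    if segsize <= 0:
        raise ValueError("SEGSIZE must be positive")
    if len(ulp) <= segsize:
        return [ulp]  # single segment: the trailing checksum is omitted
    a, b = fletcher32_twos(ulp)
    # Assumption: trailer layout is A followed by B, each big-endian.
    payload = ulp + a.to_bytes(2, "big") + b.to_bytes(2, "big")
    segments = [payload[i:i + segsize] for i in range(0, len(payload), segsize)]
    if len(segments) > MAX_SEGMENTS:
        raise ValueError("ULP does not fit in 32 segments at this SEGSIZE")
    return segments
```

For example, a 100-byte ULP with a SEGSIZE of 48 bytes yields segments
of 48, 48 and 8 bytes, where the last 4 bytes of the final segment hold
the checksum.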
Segments are encapsulated in-order in consecutive IPv4 packets with
bit 1 of the "Flags" field (i.e., "Don't Fragment - DF") set to '1'
and an increasing Segment ID ("SEGID") value between 0 and 31 encoded
in the five low-order bits of the "Fragment Offset" field, i.e.,
the first packet encodes '0', the second packet encodes '1', etc.
Each packet in the chain except the final one sets the "More
Fragments - MF" bit, i.e., the MF bit is set as for ordinary IPv4
fragmentation. Each packet in the chain is delivered to the link
layer (i.e., the IPv4 stack) in increasing SEGID order, i.e., SEGID 0
first, followed by SEGID 1, etc., up to the final packet; the link
layer SHOULD NOT reorder the packets or introduce artificial delays
between packets.
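As an illustration of the header encoding described above, the sketch
below builds the 16-bit IPv4 header word that holds the 3-bit Flags
field and the 13-bit Fragment Offset field [RFC0791]. The function
names are hypothetical, and the remaining offset bits are assumed to
be zero, which this document does not state explicitly.

```python
from typing import List, Tuple

DF = 0x4000          # bit 1 of the Flags field: Don't Fragment
MF = 0x2000          # bit 2 of the Flags field: More Fragments
SEGID_MASK = 0x001F  # five low-order bits of the Fragment Offset field


def encode_flags_offset(segid: int, is_final: bool) -> int:
    """Return the 16-bit Flags/Fragment Offset word for one segment."""
    if not 0 <= segid <= 31:
        raise ValueError("SEGID must be in the range 0 to 31")
    field = DF | segid   # DF is always set; SEGID sits in the low 5 bits
    if not is_final:
        field |= MF      # every packet except the final one sets MF
    return field


def decode_flags_offset(field: int) -> Tuple[int, bool]:
    """Return (SEGID, is_final) from a 16-bit Flags/Fragment Offset word."""
    return field & SEGID_MASK, not (field & MF)


def chain_fields(count: int) -> List[int]:
    """Header words for a chain of count packets, in transmission order."""
    return [encode_flags_offset(i, i == count - 1) for i in range(count)]
```

A three-packet chain therefore carries the words 0x6000, 0x6001 and
0x4002 in that order.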
Implementations MAY increase a flow's SEGSIZE to larger values
through path probing to avoid black holes [RFC2923]. Implementations
probe a candidate SEGSIZE value 'N' by segmenting a ULP into a chain
of two or more packets such that the final packet encapsulates a
segment of size N, where N is larger than the size of the segments
encapsulated in non-final packets. The chain SHOULD also include
Forward Error Correction (FEC) information (format and encoding TBD)
that covers the probe segment in case of loss. If, within a maximum
probe delay ("MaxProbeDelay") timeout period, the encapsulator
receives a unicast IPv6 Router Advertisement message [RFC2461] from
the decapsulator at the far end of the tunnel (see: section 3.4) with
an MTU option that encodes the value N, it deems the probe successful.
Following a successful probe, but before advancing SEGSIZE to N,
implementations SHOULD enter a brief verification phase during which
additional probes are sent to detect asymmetric multipath MTU
restrictions. Thereafter, implementations SHOULD re-probe
periodically to confirm that packets with up to SEGSIZE byte segments
are still reaching the decapsulator at the far end of the tunnel.
Additional strategies for SEGSIZE management and black hole detection
are found in [PMTUD].
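One possible way to lay out a probe chain is sketched below. This
document only requires that the final segment have the candidate size
N and that N exceed the size of the (equal-sized) non-final segments;
the search strategy, the function name, and the choice to return None
when no layout exists are assumptions made for illustration. The
total length is assumed to include the 4-byte trailing checksum, and
FEC information is not modeled because its format is TBD.

```python
from typing import List, Optional


def probe_layout(total_len: int, segsize: int, n: int,
                 max_segments: int = 32) -> Optional[List[int]]:
    """Return segment sizes for a probe chain, or None if none fits.

    The chain consists of k equal non-final segments, each no larger
    than the current SEGSIZE and smaller than n, followed by one final
    segment of exactly n bytes.
    """
    if n <= 0 or total_len <= n:
        return None  # a probe chain needs at least one non-final segment
    head = total_len - n  # bytes carried by the non-final segments
    for k in range(1, max_segments):  # leave one slot for the final segment
        if head % k == 0:
            size = head // k
            if size <= segsize and size < n:
                return [size] * k + [n]
    return None
```

With a 1500-byte payload, a current SEGSIZE of 500 and a candidate N of
600, this sketch selects two 450-byte segments followed by the 600-byte
probe segment.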
3.4. Decapsulation/Reassembly
The Length, SEGID, MF and flow identification information in the
encapsulation headers of packets in a chain provide sufficient
information for the tunnel decapsulator to reassemble the original
ULP with protection for packet reordering in the IPv4 network.
Decapsulators MUST configure per-flow reassembly buffers of at least
1280 bytes and SHOULD configure larger per-flow reassembly buffers up
to 9180 bytes or larger (see: section 3.2).
Decapsulators use per-flow reassembly buffers to concatenate the ULP
segments received in packet chains in increasing SEGID order (i.e.,
SEGID 0, followed by SEGID 1, etc.) even if the packets were re-
ordered by the network. When all ULP segments have been concatenated
into the reassembly buffer, the decapsulator uses 2's complement
Fletcher-32 to detect errors if a trailing checksum was included
(see: section 3.3).
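The reassembly and verification steps above can be sketched as
follows. The data structure (a mapping from SEGID to segment bytes
and MF flag), the function names, and the checksum byte layout (zero
padding, big-endian words, A before B, sums taken modulo 65536) are
assumptions made for illustration only.

```python
from typing import Dict, Optional, Tuple


def fletcher32_twos(data: bytes) -> Tuple[int, int]:
    """Return the (A, B) sums of a Fletcher-32 checksum taken modulo 2**16."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad odd-length input with a zero byte
    a = b = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]  # assumption: big-endian words
        a = (a + word) & 0xFFFF
        b = (b + a) & 0xFFFF
    return a, b


def reassemble(packets: Dict[int, Tuple[bytes, bool]]) -> Optional[bytes]:
    """Rebuild a ULP from received segments, regardless of arrival order.

    packets maps SEGID -> (segment bytes, MF flag). Returns the ULP, or
    None if a segment is missing or the trailing checksum does not match.
    """
    finals = [segid for segid, (_, mf) in packets.items() if not mf]
    if len(finals) != 1:
        return None  # final segment absent or ambiguous
    last = finals[0]
    if any(i not in packets for i in range(last + 1)):
        return None  # at least one segment was lost
    # Concatenate in increasing SEGID order.
    data = b"".join(packets[i][0] for i in range(last + 1))
    if last == 0:
        return data  # single segment: no trailing checksum to verify
    ulp, trailer = data[:-4], data[-4:]
    a, b = fletcher32_twos(ulp)
    if trailer != a.to_bytes(2, "big") + b.to_bytes(2, "big"):
        return None  # checksum verification failed
    return ulp  # trailer is discarded before delivery to upper layers
```

Because segments are keyed by SEGID, packets that arrive out of order
are placed correctly without any sorting of arrival times.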
If the decapsulator receives a packet chain that would overflow the
reassembly buffer, it discards the chain and sends an [ICMPv6]
"packet too big" message back to the source. The message body
includes upper layer packet headers (IPv6 and above) and contents of
the reassembly buffer up to a total of 1280 bytes, while the MTU
value encodes the reassembly buffer size.
If at least one segment was received, but one or more segments were
lost and/or checksum verification failed, the decapsulator SHOULD
send an [ICMPv6] "parameter problem" message with code "reassembly/
checksum error" back to the encapsulator at the originating end of
the tunnel. The message body includes upper layer packet headers
(IPv6 and above) and contents of the reassembly buffer up to a total
of 1280 bytes, and the pointer identifies either the beginning of the
first missing segment or the beginning of the 4 byte checksum field
(if no segments were missing). Upon receipt of such [ICMPv6] errors,
the encapsulator SHOULD take appropriate corrective actions, such as
reducing the tunnel's current SEGSIZE, imposing an artificial
inter-ULP queuing delay for the tunnel, or relaying the [ICMPv6]
messages back to the original source as a congestion indication.
When a decapsulator receives a packet chain used for probing (see:
section 3.3), it reassembles the ULP as above and sends a unicast
IPv6 Router Advertisement message back to the encapsulator at the
originating end of the tunnel with an MTU option that encodes the
size of the segment encapsulated in the final packet in the chain.
The encapsulator will receive the Router Advertisement and deem the
probe successful.
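The encapsulator's side of this exchange can be sketched as a small
state object. The class and method names are hypothetical, this
document does not give a value for MaxProbeDelay, and authentication
of the Router Advertisement (see Section 5) is assumed to have been
performed before the method is called.

```python
class PendingProbe:
    """Tracks one outstanding probe for a candidate segment size N."""

    def __init__(self, n: int, sent_at: float, max_probe_delay: float) -> None:
        self.n = n
        # Responses arriving after this time are ignored.
        self.deadline = sent_at + max_probe_delay

    def on_router_advertisement(self, mtu_option: int, now: float) -> bool:
        """Return True if this Router Advertisement confirms the probe.

        The probe succeeds only if the MTU option encodes N and the
        message arrives within the MaxProbeDelay timeout period.
        """
        return now <= self.deadline and mtu_option == self.n
```

Passing the clock value in as an argument, rather than reading it
inside the method, keeps the sketch deterministic and easy to test.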
Following successful reassembly, the trailing checksum is discarded
(if present) and the ULP payload is delivered to upper layers.
3.5. ICMPv4 Error Handling
Encapsulators may receive ICMPv4 "fragmentation needed" error
messages from inside a tunnel due to probe failures and/or route
changes across previously-probed paths. These messages may come from
either legitimate IPv4 network middleboxes or adversarial/
mis-configured middleboxes that return wrong information.
Implementers are advised to consult [PMTUD] for operational
recommendations on processing ICMPv4 "fragmentation needed" messages.
4. IANA Considerations
The IANA is instructed to assign a code value for "reassembly/checksum
error" under the [ICMPv6] Parameter Problem message type in the
"ICMPv6 Type Numbers" registry.
5. Security Considerations
The securing mechanisms for IPv6 neighbor discovery [RFC3971] and
Cryptographically-Generated Addresses [RFC3972] are used to
authenticate Router Advertisement probe responses.
6. Acknowledgments
This document represents the mindshare of many contributors.
7. Appendix A: Additional Considerations
Encapsulators can segment chains of two or more packets in which the
final packet is longer than the non-final packets as a general-
purpose mechanism for eliciting acknowledgements from the reassembler
if improved reliability at the expense of additional overhead is
desired. The equal-size restriction for non-final segments and the
non-overlapping restriction for all segments in packet chains provide
a significant simplification for reassembly algorithms [RFC0815].
Use of the link adaptation scheme described in this document may lead
to an overall increase in short chains of small packets in the
Internet. Network administrators are advised to follow the
recommendations in [RFC3150] to minimize packet loss and packet
reordering.
Network middleboxes that do not honor the IPv4 DF bit will cause
irreparable damage to the information encoded in the IPv4 headers of
encapsulated packets if fragmentation is incurred.
Network conditions such as load balancing, multi-path routing,
spanning tree reconfigurations, etc. can cause a certain degree of
reordering of the packets in a flow. For instance, Segment 5 of a
segmented PDU could arrive before Segment 1. The 5-bit segment ID in
each packet provides protection for reordering among the packets of
the same PDU, but provides no protection for reordering of packets
belonging to *different* PDUs. A small ID field is therefore needed
in each packet to distinguish the packets of one PDU from those of
another. The question arises as to whether a very small (2-4 bit) ID
field is enough to eliminate potential ambiguity due to packet
reordering in the network. Several studies conducted by CAIDA
(www.caida.org) may provide insights.
Since link-layer CRC-32 checks normally occur on each segment in the
path, most errors detected during PDU reassembly will be due to
packet splices and/or errors in the data path between the NIC
hardware and the reassembly buffer. The Fletcher-32 checksum
algorithm has been shown to provide an effective edge-to-edge error
detection capability for such errors [STONE]. The Fletcher-32
checksum is also dissimilar from both CRC-32 and the Internet
checksum used by many upper layer protocols, thereby decreasing the
likelihood of undetected errors.
Prior to any path MTU probing for a flow, link adaptation should
begin with a conservative initial SEGSIZE to yield an IPv4 packet
size of 68 bytes (the maximum IPv4 packet size guaranteed to fit over
any link in the IPv4 Internet without incurring fragmentation) so
that an un-probed ULP payload of at least 1280 bytes will be assured
for ultra-conservative implementations. However, [RFC3150] suggests a
minimum MTU of 296 bytes over the slowest serial links, so a slightly
more optimistic implementation could send ULP payloads as large as
((296 - encapsulation_header_length) * 32) ~= 9000 bytes (and perhaps
a bit larger due to VJ header compression) as long as it arranges
for the first few such payloads to generate probe responses from the
far end. For those optimistic implementations, if probe responses
consistently arrive after an initial probe and subsequent
verification phase, the flow's SEGSIZE can be advanced to the size
used for probing. Otherwise, the interface can generate IPv6 "packet
too big" messages to inform upper packetization layers that smaller
IPv6 packets should be sent over this flow for the time being. An
optimistic implementation could therefore set the maximum interface
LinkMTU of 9180 bytes and perform the optimistic initial probing
described above.
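The arithmetic behind these two starting points can be checked with a
short sketch. It assumes a 20-byte IPv4 header with no options and no
additional encapsulation headers; the function name is hypothetical.

```python
MAX_SEGMENTS = 32  # a chain carries at most 32 segments


def max_unprobed_payload(ipv4_packet_size: int,
                         encaps_header_len: int = 20) -> int:
    """Largest chain payload that fits in 32 packets of the given size."""
    return (ipv4_packet_size - encaps_header_len) * MAX_SEGMENTS


conservative = max_unprobed_payload(68)   # 48 * 32 = 1536 bytes
optimistic = max_unprobed_payload(296)    # 276 * 32 = 8832 bytes
```

Under these assumptions the conservative case yields 1536 bytes, which
covers the 1280-byte minimum IPv6 MTU, and the optimistic case yields
8832 bytes, in line with the "~= 9000 bytes" estimate. For ULPs that
span multiple segments, 4 bytes of each figure are consumed by the
trailing checksum.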
Some upper layer packetization protocols (e.g., NFS) generate fixed
payload sizes and rely on the network layer to deliver the payloads
either as whole IP packets or as chains of IP fragments. Those
protocols should consider "packet too big" messages coming from the
interface as an indication to retransmit, since the IP fragmentation
layer will have been informed of the smaller MTU for the flow.
Subsequent payloads sent over the flow will therefore undergo IP
fragmentation and each fragment will be presented to the interface
for transmission. Since NFS performance (and the performance of
other upper layer packetization protocols) is highly sensitive to
packet handling overhead, implementations should periodically attempt
to increase the SEGSIZE through probing even if initial probe
attempts fail.
Since the RTTs along various paths may vary from the sub-microsecond
level up to hundreds of milliseconds or more, Forward
Error Correction (FEC) will clearly be required in some cases (i.e.,
instead of Automatic Repeat Request (ARQ)) even though efficiency may
suffer [RFC3819]. Provisions for enabling adaptive and efficient FEC
in the segmentation/reassembly procedures are for further study.
8. References
8.1. Normative References
[ICMPv6] Conta, A., Deering, S., and M. Gupta, ed., "Internet
Control Message Protocol (ICMPv6) for the Internet
Protocol Version 6 (IPv6) Specification",
draft-ietf-ipngwg-icmp-v3 (work in progress),
November 2004.
[RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791,
September 1981.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", RFC 2460, December 1998.
[RFC2461] Narten, T., Nordmark, E., and W. Simpson, "Neighbor
Discovery for IP Version 6 (IPv6)", RFC 2461,
December 1998.
[RFC3971] Arkko, J., Kempf, J., Zill, B., and P. Nikander, "SEcure
Neighbor Discovery (SEND)", RFC 3971, March 2005.
[RFC3972] Aura, T., "Cryptographically Generated Addresses (CGA)",
RFC 3972, March 2005.
8.2. Informative References
[FRAG] Mogul, J. and C. Kent, "Fragmentation Considered Harmful",
In Proc. SIGCOMM '87 Workshop on Frontiers in Computer
Communications Technology, August 1987.
[GIGE] Dykstra, P., "Gigabit Ethernet Jumboframes (And Why You
Should Care)", http://sd.wareonearth.com/~phil/jumbo.html,
December 1999.
[ISATAP] Templin, F., Gleeson, T., Talwar, M., and D. Thaler,
"Intra-Site Automatic Tunnel Addressing Protocol
(ISATAP)", draft-ietf-ngtrans-isatap (work in progress),
January 2005.
[MECH] Nordmark, E. and R. Gilligan, "Transition Mechanisms for
IPv6 Hosts and Routers", draft-ietf-v6ops-mech-v2 (work in
progress), March 2005.
[PMTUD] Mathis, M., Heffner, J., and K. Lahey, "Path MTU
Discovery", draft-ietf-pmtud-method (work in progress),
February 2005.
[RFC0815] Clark, D., "IP datagram reassembly algorithms", RFC 815,
July 1982.
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
November 1990.
[RFC1626] Atkinson, R., "Default IP MTU for use over ATM AAL5",
RFC 1626, May 1994.
[RFC2529] Carpenter, B. and C. Jung, "Transmission of IPv6 over IPv4
Domains without Explicit Tunnels", RFC 2529, March 1999.
[RFC2684] Grossman, D. and J. Heinanen, "Multiprotocol Encapsulation
over ATM Adaptation Layer 5", RFC 2684, September 1999.
[RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery",
RFC 2923, September 2000.
[RFC3056] Carpenter, B. and K. Moore, "Connection of IPv6 Domains
via IPv4 Clouds", RFC 3056, February 2001.
[RFC3150] Dawkins, S., Montenegro, G., Kojo, M., and V. Magret,
"End-to-end Performance Implications of Slow Links",
BCP 48, RFC 3150, July 2001.
[RFC3385] Sheinwald, D., Satran, J., Thaler, P., and V. Cavanna,
"Internet Protocol Small Computer System Interface (iSCSI)
Cyclic Redundancy Check (CRC)/Checksum Considerations",
RFC 3385, September 2002.
[RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
Wood, "Advice for Internet Subnetwork Designers", BCP 89,
RFC 3819, July 2004.
[STONE] Stone, J., "Checksums in the Internet (Stanford Doctoral
Dissertation)", August 2001.
[TEREDO] Huitema, C., "Teredo: Tunneling IPv6 over UDP through
NATs", draft-huitema-v6ops-teredo (work in progress),
April 2005.
[WLAN] Society, I., "Part 11: Wireless LAN Medium Access Control
(MAC) and Physical Layer (PHY) Specifications, IEEE
Computer Society, ANSI/IEEE 802.11, 1999 Edition.".
Author's Address
Fred Lambert Templin (editor)
Boeing Phantom Works
P.O. Box 3707
Seattle, WA 98124
USA
Email: fred.l.templin@boeing.com
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.