<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>
<?rfc rfcedstyle="yes"?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<?rfc comments="yes"?>
<?rfc inline="yes" ?>
<rfc category="std" ipr="trust200902" updates="6437" docName="draft-massar-v6man-mtu-label-02">
<front>
<title>The IPv6 MTU Label</title>
<author initials="J.R." surname="Massar" fullname="Jeroen Massar">
<organization>Massar Networking</organization>
<address>
<postal>
<street>Swiss Post Box 101811</street>
<street>Zuercherstrasse 161</street>
<city>Zürich</city>
<code>CH-8010</code>
<country>CH</country>
</postal>
<email>jeroen@massar.ch</email>
<uri>http://jeroen.massar.ch</uri>
</address>
</author>
<date month="November" year="2014"/>
<area>Internet</area>
<workgroup>IPv6 Maintenance</workgroup>
<keyword>IPv6</keyword>
<keyword>MTU</keyword>
<keyword>Header</keyword>
<keyword>PMTUD</keyword>
<keyword>ICMPv6</keyword>
<keyword>blackhole</keyword>
<keyword>latency</keyword>
<abstract>
<t>
This document redefines the use of the IPv6 Flow Label field
to allow specification of the lowest MTU on the path that a
packet travels. In most cases this avoids the need for
Path MTU Discovery and the round-trip penalty
incurred by processing the ICMPv6 PTB and
retransmitting the packets involved.
</t>
<t>
This specification allows graceful decrease of MTU so that
large and non-standard MTU sizes can safely be used on the Internet.
</t>
<t>
<cref anchor="A1" source="JM">
Obsoleting the IPv6 Flow Label and replacing
it completely with this field might be a better option
than keeping it defined as an IPv6 Flow Label field
and allowing existing flows to exist.
</cref>
<cref anchor="A2" source="JM">
A better name than "IPv6 MTU Label" is requested
as it is not really a "Label".
</cref>
</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>
When a packet being forwarded does not fit the MTU of the next link, the IPv6 protocol
specifies that an ICMPv6 Packet Too Big (PTB) <xref target="RFC4443" /> error must be
sent back to the originator of the packet.
The original sending host receives this error and, based on the MTU it
reports, resends the data in packets that fit the MTU indicated in the ICMPv6 PTB.
This process is called Path MTU Discovery <xref target="RFC1191" />.
</t>
<t>
Unfortunately there are broken networks that filter ICMPv6 altogether, even though
this is against the IPv6 specification. This especially breaks TCP <xref target="RFC0793" />:
the oversized packets are silently dropped, and the sender retransmits them only after a timer fires.
As the retransmitted packet is also too large to fit the link, it does not arrive
at the destination either, and the connection becomes stuck and eventually times out.
</t>
<t>
In addition, for various load-balancing implementations it is apparently a heavy task
to correlate incoming ICMPv6 PTBs with the original packets (which are for the most part
included in the ICMPv6 PTB) and then deliver each PTB to the backend node that
originally transmitted the packet, so that this node can retransmit the data in
smaller chunks.
</t>
<t>
The extra round-trip of the ICMPv6 PTB, and the need for the load-balancer
to figure out where to forward that PTB, is a problem for large hosting
providers who want to minimize latency and maximize throughput <xref target="I-D.v6ops-pmtud-ecmp-problem" />.
</t>
<t>
This specification mitigates these problems, in part, by allowing routers to
include the lowest MTU on the path in the IPv6 packet's Flow Label field.
</t>
<t>
An alternative technique to solve this problem is described in
Packetization Layer Path MTU Discovery <xref target="RFC4821" />.
That method requires extra packets, so-called "probes", to be sent, or
requires space to be left in the packet for the extra information
that transfers the MTU details. By using the former IPv6 Flow Label,
we avoid these requirements.
</t>
</section> <!-- Introduction -->
<section title="Terminology">
<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" in this document are to be interpreted as
described in <xref target="RFC2119" />.
</t>
<t>
Fields and numbers specified in this document are in network byte order (Big Endian).
</t>
<t>
This section lists a few terms explicitly, as they might easily be confused with each other.
<cref anchor="T1" source="JM">Clean these up and add more terms that might be confusing</cref>
</t>
<t>
<list style="hanging">
<t hangText="IPv6 Payload Length">
The length of the payload (data) included in an IPv6 packet (excluding IPv6 header size).
</t>
<t hangText="Maximum Transmission Unit (MTU)">
The maximum length of a full packet (including IPv6 header size).
</t>
<t hangText="Ingress Interface">
Network Interface where a packet is received.
</t>
<t hangText="Egress Interface">
Network Interface where a packet is sent out.
</t>
<t hangText="Ingress MTU">
The MTU of the ingress interface.
</t>
<t hangText="Egress MTU">
The MTU of the egress interface.
</t>
<t hangText="Node">
A device that implements IP.
</t>
<t hangText="Path">
The set of links traversed by a packet between a source node and a destination node.
</t>
<t hangText="Path MTU, or PMTU">
The minimum link MTU of all the links in a path between a source node and a destination node.
</t>
<t hangText="PTB (Packet Too Big) message">
An ICMP message reporting that an IP packet is too large to forward.
</t>
<t hangText="MSS">
The TCP Maximum Segment Size <xref target="RFC6691" />, the maximum payload
size available to the TCP layer. This is typically the Path MTU
minus the size of the IP and TCP headers.
</t>
</list>
</t>
</section> <!-- Terminology -->
<section title="IPv6 MTU Label Format">
<t>
The IPv6 Flow Label field consists of 20 bits <xref target="RFC6437" />.
</t>
<figure>
<preamble>
The IPv6 Flow Label Field in the IPv6 Packet Header.
</preamble>
<artwork><![CDATA[
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
<t>
Hosts not supporting this specification will treat the IPv6 Flow Label as an opaque
number as defined in <xref target="RFC6437" />.
</t>
<t>
When the first four bits of the IPv6 Flow Label field are all set to 1, the Flow Label field
is considered to be an "IPv6 MTU Label", or "MTU Label" for short.
</t>
<figure>
<preamble>
Format of the Flow Label field in "IPv6 MTU Label" mode.
</preamble>
<artwork><![CDATA[
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 1 1 1|   Maximum Transmission Unit   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
<postamble>
The Flow Label ID is 0xf0000.
</postamble>
</figure>
<t>
This specification allows an MTU in the range of 1280 - 65534 in this header field.
</t>
<t>
An MTU of less than 1280 is considered invalid due to the IPv6 minimum MTU requirement.
</t>
<t>
An MTU of 65535 indicates that IPv6 Jumbograms are in use.
Automatic MTU discovery does not work for these.
The MTU has to be configured properly on all nodes by the operator.
A future document might specify an Extension Header Option that
contains the Jumbogram MTU size when the MTU is set to 65535.
</t>
<t>
Following are a few examples of common MTU Labels.
</t>
<figure>
<preamble>
MTU Label with an MTU of 1280 (0x0500)
</preamble>
<artwork><![CDATA[
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 1 1 1|0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
<postamble>
The Flow Label ID is 0xf0500.
</postamble>
</figure>
<figure>
<preamble>
MTU Label with an MTU of 1500 (0x05dc)
</preamble>
<artwork><![CDATA[
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 1 1 1|0 0 0 0 0 1 0 1 1 1 0 1 1 1 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
<postamble>
The Flow Label ID is 0xf05dc.
</postamble>
</figure>
<figure>
<preamble>
MTU Label with an MTU of 9000 (0x2328)
</preamble>
<artwork><![CDATA[
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 1 1 1|0 0 1 0 0 0 1 1 0 0 1 0 1 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
<postamble>
The Flow Label ID is 0xf2328.
</postamble>
</figure>
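<t>
As a non-normative illustration of the format above, the following Python sketch packs an MTU into the 20-bit field and unpacks it again. The function and constant names are hypothetical and are not defined by this specification. Rejecting values outside 1280 - 65535 in the encoder is an assumption of the sketch.
</t>
<figure>
<artwork><![CDATA[
```python
MTU_LABEL_PREFIX = 0xF0000  # first four bits of the 20-bit field set to 1
MTU_MASK = 0x0FFFF          # remaining 16 bits carry the MTU
IPV6_MIN_MTU = 1280         # IPv6 minimum MTU
JUMBOGRAM_MTU = 65535       # signals that IPv6 Jumbograms are in use


def encode_mtu_label(mtu):
    """Return the 20-bit Flow Label value carrying `mtu` as an MTU Label."""
    if not IPV6_MIN_MTU <= mtu <= JUMBOGRAM_MTU:
        raise ValueError("MTU must be in the range 1280 - 65535")
    return MTU_LABEL_PREFIX | mtu


def is_mtu_label(flow_label):
    """True when the first four bits of the 20-bit field are all 1."""
    return (flow_label & 0xFFFFF) >> 16 == 0xF


def decode_mtu_label(flow_label):
    """Return the MTU, or None for a plain Flow Label or an invalid MTU."""
    if not is_mtu_label(flow_label):
        return None
    mtu = flow_label & MTU_MASK
    return mtu if mtu >= IPV6_MIN_MTU else None
```
]]></artwork>
</figure>
<t>
With these helpers, encode_mtu_label(1500) yields 0xf05dc and decode_mtu_label(0xf2328) yields 9000, matching the example labels shown above.
</t>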
</section> <!-- IPv6 MTU Label Format -->
<section title="Node Requirements">
<t>
Every node on the path, including the source, intermediate routers and the destination, performs the following.
</t>
<t>
If no MTU Label is present (the first four bits are not all set to 1) and the Flow Label is 0, the node MUST
replace it with an MTU Label whose MTU value is the lower of the ingress and egress MTU
of the interfaces involved in forwarding the packet.
Non-zero Flow Labels are left undisturbed, which keeps backwards compatibility with nodes that do fill in the Flow Label field.
<cref anchor="N1" source="JM">See A1 - when obsoleting the IPv6 Flow Label we force replacing the field always</cref>
</t>
<t>
Each node MUST verify that the given MTU Label is valid (>= 1280).
When an MTU Label in the range of 0 - 1279 is encountered, it MUST be considered invalid and
MUST be overwritten by the router with the proper MTU as far as the router knows it.
<cref anchor="N2" source="JM">Or we could send an ICMPv6 Parameter problem, but this allows a form of backward compatibility when the first four bits are all set</cref>
</t>
<t>
Each hop, including the source, that is transmitting or forwarding the packet MUST update
the MTU Label to the lowest MTU for that path that it has knowledge of.
This includes the MTU of the ingress and egress interfaces, as well as an MTU learned
from a different path between the same two points or provided by another protocol
(e.g. TCP).
MSS clamping <xref target="RFC6691" /> can thus affect the MTU Label if an implementation has this knowledge.
</t>
<t>
The first packet being sent in each direction of a path MUST have a maximum size of 1280,
while setting the MTU Label to the MTU of the egress interface.
</t>
<t>
A destination node MUST use the MTU found in the MTU Label for packets sent subsequently
to the source, and include that MTU in the MTU Label of those packets. This allows the source host to learn
the MTU of the full path.
</t>
<t>
Network equipment SHOULD have a configuration option to force overwriting
of non-MTU Label Flow Labels, so that the equipment handles the MTU Label for every packet.
As this might cause issues when Flow Labels are actually used, the option is not enabled by default.
</t>
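<t>
The per-node rules above can be sketched as a single function. This is a non-normative illustration under stated assumptions: the name process_label and its parameters are hypothetical, a source passes None as ingress MTU and a destination passes None as egress MTU, link MTUs are assumed to lie in the range 1280 - 65534, and the only knowledge the node has is the MTU of its own interfaces.
</t>
<figure>
<artwork><![CDATA[
```python
IPV6_MIN_MTU = 1280
MTU_LABEL_PREFIX = 0xF0000  # first four bits of the 20-bit field set to 1


def process_label(flow_label, ingress_mtu, egress_mtu,
                  force_overwrite=False):
    """Return the Flow Label value a node writes into the packet.

    ingress_mtu or egress_mtu may be None (source or destination).
    force_overwrite models the optional, non-default configuration
    knob that replaces ordinary Flow Labels as well.
    """
    link_mtu = min(m for m in (ingress_mtu, egress_mtu) if m is not None)

    if (flow_label >> 16) != 0xF:
        # Not an MTU Label: only an all-zero label is replaced by default.
        if flow_label == 0 or force_overwrite:
            return MTU_LABEL_PREFIX | link_mtu
        return flow_label  # leave an in-use Flow Label undisturbed

    mtu = flow_label & 0xFFFF
    if mtu < IPV6_MIN_MTU:
        # Invalid MTU Label: overwrite with what this node knows.
        return MTU_LABEL_PREFIX | link_mtu

    # Valid MTU Label: lower it if this node knows of a smaller MTU.
    return MTU_LABEL_PREFIX | min(mtu, link_mtu)
```
]]></artwork>
</figure>
<t>
For example, a router with an ingress MTU of 1500 and an egress MTU of 1480 turns an all-zero Flow Label into 0xf05c8, while a label of 0xf05c8 arriving at a router whose links are both larger is passed on unchanged.
</t>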
</section>
<section title="Updating the MTU Label">
<t>
Given below is an example network along with a table of actions per node that illustrates
how a label is updated and how the knowledge learnt is used.
</t>
<figure>
<preamble>
A typical asymmetric network as found on the Internet
</preamble>
<artwork><![CDATA[
            +------------+
            |   Host A   |
            +------------+
                  |
                  | MTU=1500
                  |
           +-------------+
           |  Router 1   |
           +-------------+
             |          ^
    MTU=1500 |          | MTU=1500
             v          |
+--------------+      +--------------+
|   Router 2   |      |   Router 7   |
+--------------+      +--------------+
             |          ^
             |          |
             | MTU=1480 | MTU=1280
             |          |
             |        +--------------+
             |        |   Router 6   |
             |        +--------------+
             |          ^
             |          | MTU=1500
             v          |
+--------------+      +--------------+
|   Router 3   |      |   Router 5   |
+--------------+      +--------------+
             |          ^
    MTU=1500 |          | MTU=1500
             v          |
           +-------------+
           |  Router 4   |
           +-------------+
                  |
                  | MTU=9000
                  |
           +-------------+
           |   Host B    |
           +-------------+
]]></artwork>
</figure>
<t>
In this example network the routing protocols involved cause an asymmetric
routing of packets.
</t>
<t>
When Host A sends a packet to Host B, the path is: HA, R1, R2, R3, R4, HB.
The return path for a packet from Host B to Host A is: HB, R4, R5, R6, R7, R1, HA.
</t>
<t>
Given that network the following decisions are made.
</t>
<texttable anchor="table_what" title="How MTU gets updated">
<ttcol>Node</ttcol>
<ttcol>Decision</ttcol>
<ttcol>Incoming Link MTU</ttcol>
<ttcol>Outgoing Link MTU</ttcol>
<ttcol>MTU Label Change</ttcol>
<c>HA</c>
<c>No knowledge, thus use outgoing link-MTU</c>
<c>-</c>
<c>1500</c>
<c>1500</c>
<c>R1</c>
<c>Outbound not lower, don't update</c>
<c>1500</c>
<c>1500</c>
<c>"</c>
<c>R2</c>
<c>Outbound is lower, update (possible PTB)</c>
<c>1500</c>
<c>1480</c>
<c>1480</c>
<c>R3</c>
<c>Outbound is higher, don't update</c>
<c>1480</c>
<c>1500</c>
<c>"</c>
<c>R4</c>
<c>Outbound is higher, don't update</c>
<c>1500</c>
<c>9000</c>
<c>"</c>
<c>HB</c>
<c>Remember A to B = 1480</c>
<c>9000</c>
<c>-</c>
<c>"</c>
<!-- outbound/reply packet from B to A -->
<c></c>
<c>Reply packet:</c>
<c></c>
<c></c>
<c></c>
<c>HB</c>
<c>Learnt A-B = 1480, lower than link-MTU, use it</c>
<c>-</c>
<c>9000</c>
<c>1480</c>
<c>R4</c>
<c>Outbound is higher, don't update</c>
<c>9000</c>
<c>1500</c>
<c>"</c>
<c>R5</c>
<c>Outbound is higher, don't update</c>
<c>1500</c>
<c>1500</c>
<c>"</c>
<c>R6</c>
<c>Outbound MTU is lower, update it (possible PTB)</c>
<c>1500</c>
<c>1280</c>
<c>1280</c>
<c>R7</c>
<c>Outbound is higher, don't update</c>
<c>1280</c>
<c>1500</c>
<c>"</c>
<c>R1</c>
<c>Outbound is higher, don't update</c>
<c>1500</c>
<c>1500</c>
<c>"</c>
<c>HA</c>
<c>Learnt B-A is 1280</c>
<c>1500</c>
<c>-</c>
<c>"</c>
</texttable>
<t>
In this situation, there are two locations where a PTB could have been sent (R2->R3 and R6->R7).
But as the first packet sent in a direction MUST be sized at a maximum of 1280, no PTB is possible.
After this first packet has passed, the real MTU is known and this possibly higher MTU can
be used.
</t>
<t>
This does demonstrate that even with this extra information, PMTU blackholes
cannot always be avoided. Nor does the MTU Label remove the need to handle PTBs or to retransmit data in smaller packets.
Hence, sending, receiving, forwarding and handling ICMPv6 remains important, and ICMPv6 MUST NOT be filtered.
</t>
<t>
Note that the above table does not mention any of the standard functions and checks, like updating
the Hop Limit, that a router is supposed to perform as per the IPv6 protocol.
</t>
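<t>
The outcome of the table above can be reproduced with a short, non-normative Python sketch. The link MTUs are taken from the example network. The function name final_label and the list names are hypothetical, and each node is assumed to know only the MTU of its own links.
</t>
<figure>
<artwork><![CDATA[
```python
def final_label(initial_mtu, link_mtus):
    """Walk a path and return the MTU Label value seen at the far end.

    Every hop lowers the label to the MTU of its outgoing link when
    that link is smaller, and never raises it.
    """
    label = initial_mtu
    for mtu in link_mtus:
        label = min(label, mtu)
    return label


# Link MTUs in travel order, from the example network.
A_TO_B = [1500, 1500, 1480, 1500, 9000]        # HA R1 R2 R3 R4 HB
B_TO_A = [9000, 1500, 1500, 1280, 1500, 1500]  # HB R4 R5 R6 R7 R1 HA

# Host A has no knowledge yet and starts with its egress link MTU.
learned_by_b = final_label(A_TO_B[0], A_TO_B)

# Host B starts its reply with the lower of what it learned and its
# own egress link MTU.
learned_by_a = final_label(min(learned_by_b, B_TO_A[0]), B_TO_A)

print(learned_by_b, learned_by_a)
```
]]></artwork>
</figure>
<t>
Under these assumptions Host B ends up with 1480 for the A-B direction and Host A with 1280 after the reply, as in the table.
</t>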
</section>
<section title="Maximum size of packets">
<t>
To learn the MTU of the complete path, at minimum 3 packets need
to be sent between the same source and destination host. The 3rd and
subsequent packets will carry the correct information.
</t>
<texttable anchor="table_packets" title="Maximum Packet Size">
<ttcol>Step</ttcol>
<ttcol>Maximum Packet Size</ttcol>
<ttcol>Description</ttcol>
<ttcol>Destination Learns</ttcol>
<c>1</c>
<c>1280</c>
<c>Packet sent from A to B</c>
<c>B learns MTU on path A-B</c>
<c>2</c>
<c>1280</c>
<c>Packet sent from B to A</c>
<c>A learns MTU on path A-B-A</c>
<c>3+</c>
<c>MTU max</c>
<c>Packet sent from A to B</c>
<c>B learns MTU for full round-trip path A-B-A-B</c>
</texttable>
<t>
The 3rd and further packets have knowledge of the MTU of the full round-trip path
and thus can use this information to send larger packets.
</t>
<t>
This example assumes that single packets are sent alternately in each direction.
When a host sends multiple packets in a row, it should use the same
step as for the previous packet until it receives a return packet.
</t>
<t>
Note that a TCP handshake (3 packets) is thus enough to learn the correct MTU based
on the MTU Label.
In that case a host might decide to also use the TCP MSS as additional information.
</t>
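<t>
One possible way for a host to track these steps per peer is sketched below. This is a non-normative illustration: the class PeerState and its methods are hypothetical, and the rule that a host may exceed 1280 only after it has both sent a packet to the peer and received one back is this sketch's reading of the table above.
</t>
<figure>
<artwork><![CDATA[
```python
IPV6_MIN_MTU = 1280


class PeerState:
    """Per-peer bookkeeping for the packet size steps (hypothetical)."""

    def __init__(self, egress_mtu):
        self.egress_mtu = egress_mtu
        self.learned = None     # MTU last seen in the peer's MTU Label
        self.sent = False       # have we sent anything to the peer yet?
        self.confirmed = False  # got a packet back after sending one?

    def next_packet(self):
        """Return (maximum packet size, MTU to put in the MTU Label)."""
        if self.learned is None:
            label = self.egress_mtu
        else:
            label = min(self.learned, self.egress_mtu)
        size = label if self.confirmed else IPV6_MIN_MTU
        self.sent = True
        return size, label

    def on_receive(self, label_mtu):
        """Record the MTU carried in a packet received from the peer."""
        if label_mtu < IPV6_MIN_MTU:
            return  # invalid MTU Label, ignore it
        self.learned = label_mtu
        if self.sent:
            self.confirmed = True
```
]]></artwork>
</figure>
<t>
In the example network, Host A (egress MTU 1500) first gets a maximum size of 1280 with a label of 1500 from next_packet(). After on_receive(1280) for the reply from Host B, the next call returns 1280 for both values, because 1280 is the lowest MTU on the round-trip path. On a path where the reply carried 1500 instead, the third packet could be sized up to 1500.
</t>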
</section>
<section title="Handling network changes">
<t>
As the process of MTU Labeling happens per packet, new information will become available to
the host continuously. When a sending host learns that the MTU is lower than the size of a
packet it recently sent, it could decide to resend that packet directly, fitted to the lower MTU, to avoid it
being blackholed upstream.
</t>
</section>
<section title="MTU Rediscovery">
<t>
A host can decide that, for a long-standing connection, a re-probe of the MTU is needed.
It can do so by ignoring the cached information and sending a packet with a maximum
size of 1280 while setting the MTU Label to the largest MTU its egress link supports.
This is similar to the situation of a first packet and thus restarts the
process at the highest possible MTU.
</t>
</section>
<section title="Security Considerations">
<section title="Spoofing the MTU Label">
<t>
An adversary might spoof IP packets from a source to a destination with a deliberately incorrect MTU Label.
An adversary might also act as a man-in-the-middle and modify the MTU Label.
</t>
<t>
The effect of doing so will be minimal, though, as any intermediate router will correct the MTU to the
value it knows is correct based on the interfaces the packet flows over.
</t>
<t>
The only negative outcome is that the packet size is reduced to the minimum of 1280.
The result is a minor performance impact on that path until MTU Rediscovery happens and the MTU
is scaled upward again.
</t>
<t>
Of course networks should employ Network Best Practices and employ Anti-spoofing techniques to make
this kind of attack impossible.
</t>
</section> <!-- Spoofing the MTU Label-->
<section title="Firewall treatment of the MTU Label">
<t>
A firewall might consider the MTU Label untrusted.
</t>
<t>
As the firewall knows that the correct value for the MTU Label
lies between 1280, the IPv6 minimum MTU, and its own link MTU, it can act like
every node that supports the MTU Label and limit the MTU to that range.
See Node Requirements for further details.
</t>
<t>
A strict firewall whose operator does not want to take any risk
could even force an MTU of 1280, at the cost of performance.
</t>
</section> <!-- Firewalls forcing the MTU Label-->
</section> <!-- Security Considerations -->
<section title="Acknowledgements">
<t>
Thanks must go to Lorenzo Colitti for bringing <xref target="I-D.v6ops-pmtud-ecmp-problem" /> to
the attention of the author.
</t>
<t>
Many thanks to Brian Carpenter for many insightful comments that clarified this specification a lot.
</t>
<t>
Matyas Koszik mentioned that the first packet on each link should be 1280 for the MTU discovery
to work in both directions which resulted in the "Maximum size of packets" section.
</t>
</section>
</middle>
<back>
<references>
<?rfc include="reference.RFC.0793.xml"?>
<?rfc include="reference.RFC.1191.xml"?>
<?rfc include="reference.RFC.2119.xml"?>
<?rfc include="reference.RFC.4443.xml"?>
<?rfc include="reference.RFC.4821.xml"?>
<?rfc include="reference.RFC.6437.xml"?>
<?rfc include="reference.RFC.6691.xml"?>
<?rfc include="reference.I-D.v6ops-pmtud-ecmp-problem.xml"?>
</references>
</back>
</rfc>