<?xml version="1.0" encoding="US-ASCII"?>
<!-- This is built from a template for a generic Internet Draft. Suggestions for
improvement welcome - write to Brian Carpenter, brian.e.carpenter @ gmail.com -->
<!-- This can be converted using the Web service at http://xml.resource.org/experimental.html
(which supports the latest, sometimes undocumented and under-tested, features.) -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- You need one entry like the following for each RFC referenced --><!ENTITY RFC2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
<!ENTITY RFC2460 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2460.xml">
<!ENTITY RFC5533 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5533.xml">
<!ENTITY RFC5534 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5534.xml">
<!ENTITY RFC7045 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7045.xml">
<!ENTITY RFC6555 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6555.xml">
<!ENTITY RFC4960 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4960.xml">
<!ENTITY RFC6824 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6824.xml">
<!ENTITY RFC2827 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2827.xml">
<!ENTITY RFC3704 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3704.xml">
<!ENTITY RFC4340 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4340.xml">
<!-- You need one entry like the following for each I-D referenced --><!ENTITY DRAFT-homenet SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-homenet-arch.xml">
]>
<?rfc toc="yes"?>
<!-- You want a table of contents -->
<?rfc symrefs="yes"?>
<!-- Use symbolic labels for references -->
<?rfc sortrefs="yes"?>
<!-- This sorts the references -->
<?rfc iprnotified="no" ?>
<!-- Change to "yes" if someone has disclosed IPR for the draft -->
<?rfc compact="yes"?>
<!-- This defines the specific filename and version number of your draft (and inserts the appropriate IETF boilerplate) -->
<rfc category="info" docName="draft-naderi-ipv6-probing-01" ipr="trust200902">
<front>
<title abbrev="IPv6 Probing">Experience with IPv6 path probing</title>
<author fullname="Habib Naderi" initials="H." surname="Naderi">
<organization abbrev="Univ. of Auckland"/>
<address>
<postal>
<street>Department of Computer Science</street>
<street>University of Auckland</street>
<street>PB 92019</street>
<city>Auckland</city>
<region/>
<code>1142</code>
<country>New Zealand</country>
</postal>
<email>habib@cs.auckland.ac.nz</email>
</address>
</author>
<author fullname="Brian Carpenter" initials="B. E." role="editor" surname="Carpenter">
<organization abbrev="Univ. of Auckland"/>
<address>
<postal>
<street>Department of Computer Science</street>
<street>University of Auckland</street>
<street>PB 92019</street>
<city>Auckland</city>
<region/>
<code>1142</code>
<country>New Zealand</country>
</postal>
<email>brian.e.carpenter@gmail.com</email>
</address>
</author>
<date day="22" month="April" year="2015"/>
<area> </area>
<workgroup> </workgroup>
<abstract>
<t>This document reports on experience and simulations of dynamic probing
of alternate paths between two IPv6 hosts when network failures occur.
Two models for such probing were investigated: the SHIM6 REAchability
Protocol (REAP) and the Multipath Transmission Control Protocol (MPTCP).
The motivation for this document is to identify some aspects of path
probing at large or very large scale that may be broadly relevant to
future protocol design.
</t>
</abstract>
</front>
<middle>
<section anchor="intro" title="Introduction">
<t>A common situation in the Internet today is that a host trying to contact another
host has a choice of IP addresses for one or both ends of the communication. Multiple
addresses are expected to be quite common for IPv6 hosts <xref target="RFC2460"/>. Some
approaches to this situation envisage either switching paths during the course of the
communication or using multiple paths in parallel. Examples include "Happy Eyeballs"
<xref target="RFC6555"/> which tries alternative paths at the start,
SHIM6 <xref target="RFC5533"/> and Stream Control Transmission Protocol (SCTP)
<xref target="RFC4960"/> which change paths when there is a failure,
and Multipath TCP (MPTCP) <xref target="RFC6824"/>
which shares the paths dynamically. </t>
<t>Some of these methods involve active path probing to choose the best path. SHIM6
probes all available paths using the REAchability Protocol
(REAP) <xref target="RFC5534"/> when the current path
fails, and MPTCP effectively probes all paths continuously, and shifts load according
to the results. In this document we summarise results and observations from
SHIM6 and MPTCP operated or simulated at large scale. These observations may be
of help in designing future path probing mechanisms. In particular, we are
interested in minimising both the time taken to recover to the maximum possible
throughput after a path failure, and the amount of overhead traffic caused by
the probing process. </t>
<t>In summary, we ran a series of SHIM6 experiments, each including 250 path failures,
between Auckland and Dublin, measuring the time and overhead traffic for each instance of
path probing and recovery. Then we repeated essentially the same experiment in
the laboratory in Auckland (i.e., with negligible RTT instead of round-the-world
RTT). Then we built a Stochastic Activity Network (SAN) simulation model of the
same scenarios, and validated it by comparison with the experimental results.
Finally we used this model to simulate path failure and recovery using REAP at very
large scale (10,000 simultaneous sessions on a single site experiencing path failure).
Both TCP and DCCP <xref target="RFC4340"/> were used for the transport layer,
with a simple application sending meaningless data in one direction only. </t>
<t>This was followed by roughly equivalent simulations of recovery from path
failure for MPTCP sessions. In this case we validated the SAN model by
comparison with a completely different MPTCP simulator developed elsewhere
<xref target="Wischik10"/>. </t>
<t>One advantage of the SAN model is that SAN analysis software
tools exist which allow very large scale simulations.
Another is that the model makes it relatively easy to experiment with variations
of the protocol itself, so we also tested the impact of certain protocol changes.
However, unlike conventional network simulation tools, the user has to
program a complete protocol behaviour model. We used the Moebius tool
<xref target="Moebius"/>. </t>
<t>Details of the experiments and results have been described in two papers
<xref target="Naderi10"/> <xref target="Naderi14b"/> and in H. Naderi's thesis
<xref target="Naderi14a"/>. This document limits itself to outlining the
results and their implications for the design of path probing mechanisms in the
Internet. </t>
</section> <!-- intro -->
<section anchor="shim6res" title="Results for SHIM6 and REAP">
<section anchor="intexpt" title="Experiments over the Internet">
<t>We set up a test environment which enabled us to run a set of experiments
over the Internet with the LinShim6 implementation of SHIM6
<xref target="Barre08"/>. We used two SHIM6-enabled multi-addressed
hosts, located at the University of Auckland (New Zealand) and
Waterford Institute of Technology (Dublin, Ireland). Each host was equipped
with two network interface cards and configured with two prefixes from two
different providers. The SHIM6 host in Auckland was connected to a Linux
machine configured as an IPv6 router. This router
simulated link failures for the experiments. </t>
<t>Source Address Dependent Routing (SADR) is necessary for effective use of
SHIM6. With host-centric solutions like SHIM6, the hosts decide which source and
destination addresses to use. Without SADR, or a similar routing mechanism,
packets might be forwarded to the wrong provider and dropped because of
ingress filtering according to BCP 38 <xref target="RFC2827"/> <xref target="RFC3704"/>.
Unfortunately, we could not convince the
university network administrators to enable SADR on the Auckland University edge router.
To run the experiments, they agreed to add static routes to the edge router's
routing table, to forward packets destined to the host in Dublin through different
providers according to their destination addresses. Therefore, only two of the
four possible address pairs could work. To obtain a variety of recovery cases despite
this restriction, we changed
LinShim6 to shuffle the list of address pairs before starting the exploration process,
so that the working address pair could appear at any position in the list. </t>
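<t>The shuffling step can be sketched as follows. This is a minimal
illustration in Python, not the LinShim6 code: the function name and the
documentation-prefix addresses are assumptions made for the example. </t>
<figure><artwork><![CDATA[
```python
import itertools
import random


def shuffled_address_pairs(local_addrs, remote_addrs, rng=random):
    """Return every (local, remote) address pair in random order."""
    pairs = list(itertools.product(local_addrs, remote_addrs))
    rng.shuffle(pairs)  # the working pair may land at any position
    return pairs


# Hypothetical addresses: two prefixes per host, as in the experiments.
auckland = ["2001:db8:a::1", "2001:db8:b::1"]
dublin = ["2001:db8:c::1", "2001:db8:d::1"]
for local, remote in shuffled_address_pairs(auckland, dublin):
    print(local, "->", remote)
```
]]></artwork></figure>
<t>Because the order differs from run to run, repeated failures exercise
recovery with the working pair found early, late, or anywhere in between. </t>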
<t>This configuration enabled us to run experiments with four address pairs over the
Internet. For each experiment, we artificially created 250 failures and for each failure
measured the REAP exploration time (EP), the number of sent probes (SP), the number of
received probes (RP) and the application recovery time (ART).</t>
<t>Comparing results from experiments with TCP and DCCP shows that when DCCP is employed,
EP, SP and RP are larger than when TCP is used. The main reason for this is that DCCP
employs delayed acknowledgement. It sends ACKs once per RTT (300 ms), while in the case of TCP,
they are sent more frequently (less than 100 ms apart). Since the RTT is long, the communications look
different from REAP's viewpoint although the behaviour of the application is the same in
both experiments. Since TCP sends ACKs faster, REAP treats it more like a bi-directional
communication while DCCP communication is treated more like uni-directional. As a result,
in the DCCP experiment, the sender always detects the failure first and then reports it
to the receiver, while in the TCP experiment both sides detect failure and start
exploration almost at the same time. In other words, in case of TCP, exploration
is performed in parallel on both sides and takes less time and generates less traffic.
This result also shows that the efficiency of solutions like SHIM6, which are
implemented inside the protocol stack, may be affected by the behaviour of other
layers of the stack. </t>
<t>We also observed some signs of probe loss in the results. Probe losses can affect
EP, SP, RP and ART. When a probe is lost, it might cause the exploration process to go
to a second round, and then an exponential backoff algorithm causes the exploration
process to take longer and generate more traffic. </t>
</section>
<section anchor="labexpt" title="Lab Experiments">
<t>We repeated similar experiments in the lab. The main difference was the RTT, which was much
smaller (0.3 ms) than in the Internet experiments. We set up two SHIM6 hosts in the lab, each
equipped with four network interfaces. Thus, in addition to experiments with four
address pairs (similar to the Internet experiments), we could run experiments with 9 and
16 address pairs as well.</t>
<t>In the lab, we got similar results from the TCP and DCCP experiments. Since RTT is small,
DCCP sends ACKs faster, and therefore there is no difference from REAP's viewpoint. </t>
<t>Probe losses are observable in the lab experiments too. Probe loss causes REAP to go to the
second round for scanning the list of address pairs, which leads to sending more probes and
also longer exploration time. </t>
<t>Experiments with 16 address pairs fail when the working address pair is located at or close
to the end of the list of address pairs. REAP employs exponential backoff after sending its initial
probes, to avoid generating large bursts of traffic during exploration. For 16 address pairs,
this delay sometimes causes the connection to time out and stop the experiment. In some cases, SHIM6
removes the context without finding the new address pair. In such cases it seems that packet losses cause
the exploration process to go to the second round of exploration and the resulting longer
delays cause SHIM6 to actually stop exploration and remove the context. </t>
</section>
<section anchor="simul" title="Large scale simulation">
<t>To study the behaviour of REAP in a very large scale network (e.g., an enterprise network),
we built a simulation model of REAP and conducted some experiments which simulated a link
failure event in a network with 10,000 simultaneously active SHIM6-monitored communications.
The aim of the experiments was to see how
REAP reacts to path failures in a large SHIM6-enabled multihomed network. In our practical
tests, nine address pairs seemed to be the limit, but we have included larger numbers
in our simulations to obtain a clearer view of REAP's behaviour. </t>
<t>We focused on REAP recovery time and probe traffic as two important performance parameters. REAP
recovery time is the time that REAP takes to detect the failure and find a new working address
pair. REAP traffic is the traffic which is generated by REAP itself during its exploration process. </t>
<t>We measured average and total REAP recovery time for different numbers of address pairs for
10,000 instances of REAP. We define total REAP recovery time as the recovery time for the
whole site, i.e., the time between failure occurrence and recovering the last context. In
other words, it shows the recovery time for the last context that is recovered. The average
recovery time is calculated by dividing the sum of recovery times for REAP instances by the
number of REAP instances. It should be noted that recovery time includes failure detection and
address exploration times. </t>
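<t>The two metrics defined above can be written down directly. The
following Python sketch is illustrative only: the function name is an
assumption, and the sample values are made up, not measured results. </t>
<figure><artwork><![CDATA[
```python
def site_recovery_stats(recovery_times):
    """Return (average, total) recovery time for one site.

    recovery_times holds one value per REAP instance, in seconds,
    measured from the failure until that context is recovered.
    """
    average = sum(recovery_times) / len(recovery_times)
    total = max(recovery_times)  # when the last context recovers
    return average, total


# Made-up sample values, not measured results.
average, total = site_recovery_stats([10.0, 12.0, 14.0])
print(average, total)
```
]]></artwork></figure>
<t>Taking the maximum works because every context is timed from the same
failure event, so the slowest context determines when the whole site has
recovered. </t>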
<t>A typical average recovery time for 4 address pairs is 10 to 12 seconds.
The results show that the average and maximum recovery times increase when the number of
address pairs is increased. The relationship is not linear because REAP uses an exponential
backoff algorithm for increasing the time interval between probes. As a result, REAP performs
poorly when the number of address pairs exceeds 9, for example taking more than 100 seconds
to recover with 16 address pairs. </t>
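<t>The effect of the backoff on the probing schedule can be illustrated
with a deliberately simplified model. The Python sketch below assumes four
initial probes at 0.5 second intervals (four probes in two seconds), after
which the interval doubles for each further probe. The 60 second cap on the
interval is an assumption of this sketch, and failure detection time, RTT
and probe loss are all ignored. It is not the simulation model used for the
results reported here. </t>
<figure><artwork><![CDATA[
```python
def probe_send_times(num_probes, initial_probes=4,
                     initial_interval=0.5, max_interval=60.0):
    """Return send times in seconds since exploration started."""
    times = []
    now = 0.0
    interval = initial_interval
    for index in range(num_probes):
        times.append(now)
        # Once the initial probes are out, double the interval
        # before each further probe (assumed cap: max_interval).
        if index + 1 >= initial_probes:
            interval = min(interval * 2, max_interval)
        now += interval
    return times


for pairs in (4, 9, 16):
    last = probe_send_times(pairs)[-1]
    print(pairs, "pairs: last pair first probed at", last, "s")
```
]]></artwork></figure>
<t>Under these assumptions the seventh probe is sent at 8.5 seconds but
the sixteenth not until 424.5 seconds, which illustrates why a working
address pair near the end of a long list is found so late. </t>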
<t>We also measured the average and total number of probes sent during the address exploration
process in the experiments. The results show that the number of sent probes grows linearly
with the number of address pairs. They also show that a large quantity of probes is
sent at the start of exploration. For example, in the case of four address pairs, 93% of the probes,
and in the case of 25 address pairs 34% of probes, are sent during the first 10 seconds. The reason
is that all contexts detect failure within 10 seconds and start exploration by sending initial probes
(the first four probes, which are sent in two seconds). After that, there are some intervals when very
few probes are sent. This can be seen more clearly in the experiments with more address pairs, e.g.
16 or 25 address pairs. This means that for some SHIM6 contexts the time interval between probes is
large, because of the exponential backoff, so REAP instances have to wait for a long time before
probing the next address pair. Some connections might be dropped by the transport or application
layer before REAP can recover them. For example, in case of 25 address pairs, 50% of contexts
need more than five minutes to recover. </t>
<t>Although the peak of the REAP traffic is generated in the first 10 seconds (before employing the exponential
backoff algorithm), our results show that this traffic is small compared to normal traffic for
a large network, and is unlikely to cause a major problem. For example, in the case of 25 address pairs, about
4800 probes per second are sent during the first 10 seconds of the exploration process, which is the
peak of the traffic. Every probe in the first 10 seconds carries at most seven address pairs: four
initial address pairs and three more after employing exponential backoff. Each probe needs 72 bytes
for the fixed part and 40 bytes for each address pair, and the average probe
size in the first 10 seconds is 232 bytes. As a result, a load of 4800 probes per second occupies only
about 1.1 MB/s (roughly 9 Mbit/s)
of the site's available link capacity. Large sites usually have high bandwidth links to the Internet
and this amount of traffic does not cause a significant problem for them. In any case this traffic
will occur at a time when normal traffic from the same sessions has been interrupted. </t>
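<t>The arithmetic behind this estimate can be checked with a few lines of
Python. The constants are the per-probe sizes quoted above; the function
names are assumptions made for the example. </t>
<figure><artwork><![CDATA[
```python
FIXED_BYTES = 72      # fixed part of a probe, from the text
BYTES_PER_PAIR = 40   # per address pair carried, from the text


def probe_size(address_pairs):
    """Size in bytes of a probe carrying that many address pairs."""
    return FIXED_BYTES + BYTES_PER_PAIR * address_pairs


def load_bytes_per_second(probes_per_second, probe_bytes):
    """Aggregate probe traffic in bytes per second."""
    return probes_per_second * probe_bytes


print(probe_size(4))                               # 232
print(load_bytes_per_second(4800, 232))            # 1113600
print(load_bytes_per_second(4800, probe_size(7)))  # 1689600
```
]]></artwork></figure>
<t>The 232 byte average corresponds to a probe carrying four address
pairs. At 4800 probes per second this gives roughly 1.1 MB/s, and even if
every probe carried the maximum of seven pairs the load would stay below
2 MB/s. </t>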
<t>We also tried two changes to REAP to improve recovery time: increasing the number of initial probes,
and sending initial probes in parallel. In both cases, we also measured the probe traffic.
The results showed that those
modifications improved recovery time while their effect on the traffic was small. For example, in the
case of nine address pairs, increasing the number of initial probes from four to five caused about a 6.5%
increase in traffic in the first 10 seconds of the recovery process, a 22% decrease in average recovery
time and a 34% decrease in maximum recovery time. Sending initial probes in parallel, in the case of nine
address pairs, caused an 11% decrease in average recovery time, a 4.5% decrease in maximum recovery time,
and an 8.2% increase in traffic. In both cases, these modifications increased traffic, but not to a
level that a large network could not handle. </t>
</section>
</section> <!-- shim6res -->
<section anchor="mptcp" title="Results for MPTCP">
<t>MPTCP does not use any specific mechanism for probing paths. In fact, every
subflow runs as a TCP flow and it is the TCP congestion control
mechanism which monitors the used path. When congestion is detected, the
load from the congested path is transferred to other available
paths, if they present less congestion. The MPTCP congestion control
algorithm, known as SEMICOUPLED, reacts to congestion reports
from subflows and adjusts the load on the used paths to achieve performance
and fairness. TCP never sets the congestion window
for a subflow to less than 1. Therefore, even on a highly congested path or
a broken path, it performs the equivalent of probing by setting the congestion
window size to 1, so that any improvements in the path can be detected.
Expiration of the TCP retransmission timer for the subflow on a broken path
triggers sending a segment once in a while, acting as a probe, to ensure a recovery
in the path can be detected. How fast this mechanism can detect an
improvement in a broken path depends on the value of the time-out for this
timer (RTO). The minimum value is usually set to 1 second, and each
subsequent expiration, as occurs on a broken path, backs off the timer
by doubling the RTO. The traffic generated by this mechanism
in this case is low and may be handled easily, even in a large network. </t>
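<t>The resulting probe schedule on a broken path can be sketched as
follows. This Python fragment is a simplified illustration, not an MPTCP
implementation: the function name is hypothetical, and the 60 second upper
bound on the RTO is an assumption of the sketch rather than a value taken
from the simulations. </t>
<figure><artwork><![CDATA[
```python
def rto_probe_times(duration, min_rto=1.0, max_rto=60.0):
    """Times (s) at which a broken subflow retransmits a segment.

    Each timer expiry sends one segment, which acts as a probe,
    and doubles the RTO up to the assumed bound max_rto.
    """
    times = []
    now = 0.0
    rto = min_rto
    while now + rto <= duration:
        now += rto
        times.append(now)
        rto = min(rto * 2, max_rto)
    return times


print(rto_probe_times(60))  # [1.0, 3.0, 7.0, 15.0, 31.0]
```
]]></artwork></figure>
<t>Only five segments are sent in the first minute after the failure,
which is why this traffic is negligible even for a large number of
subflows. It also shows that the longer a path stays broken, the longer
its recovery may go unnoticed. </t>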
<t>We simulated MPTCP with up to 8 paths and with RTTs between 80 and 150 ms,
observing the expected behaviour, with the load in the steady state spread
across the paths. When the loss rate of a path is higher, the throughput of
that path is lower. For a given loss rate, a smaller RTT increases throughput
on that path. However, total throughput increases sublinearly with more paths,
due to the way SEMICOUPLED links the congestion windows of the various subflows.
For example, we simulated a scenario in which the steady state throughput
for 8 paths was only about 25% greater than for a single path (Figure 5.10
in <xref target="Naderi14a"/>). This suggests that a scenario with as many
as 8 paths is of limited value in a reasonably reliable network. </t>
<t>We simulated a permanent failure of a single path in a scenario with four paths
in operation. As may be deduced from the previous point, the throughput
recovered in the steady state to within a small percentage of its previous value.
This recovery took about 6 seconds (Figure 5.15 in <xref target="Naderi14a"/>),
which is significantly faster than observed with SHIM6 due to MPTCP's effectively
continuous probing. Simulations of temporary path failures showed that returning
to the original steady state using all paths took a similar time. </t>
<t>Finally we simulated the effect of variable loss rates on MPTCP performance
with two paths operating. We observed that for loss rates varying randomly in the range
up to 1%, MPTCP effectively maintains its steady state throughput. </t>
</section> <!-- mptcp -->
<section anchor="ops" title="Operational issues">
<t>Many if not most site border firewalls today drop packets containing the SHIM6
extension header. In our Internet experiments we had to bypass the site firewall
at both ends. This issue is discussed in <xref target="RFC7045"/>. </t>
<t>Source Address Dependent Routing (SADR) is necessary for effective use of multiple paths.
Without it, packets may be sent to the wrong exit router, or to an ISP that will
immediately discard them due to ingress filtering. With ingress filtering
in place, packets with a given source address may only be sent via an ISP that
accepts packets from that source address. If this is not taken correctly into account
by the source host and by the local routing configuration, the host will waste resources
trying to explore paths that are certain to fail. </t>
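<t>The constraint imposed by ingress filtering can be sketched as a
simple lookup from source address to permitted exit. The Python fragment
below is illustrative only: the provider names and prefixes are
hypothetical, and real SADR is a routing function, not a host-side table. </t>
<figure><artwork><![CDATA[
```python
import ipaddress


def usable_exits(source, provider_prefixes):
    """Return the providers whose ingress filter accepts source."""
    addr = ipaddress.ip_address(source)
    return [name for name, prefix in provider_prefixes.items()
            if addr in ipaddress.ip_network(prefix)]


# Hypothetical site with one prefix from each of two providers.
providers = {"ISP-A": "2001:db8:a::/48",
             "ISP-B": "2001:db8:b::/48"}
print(usable_exits("2001:db8:a::10", providers))
```
]]></artwork></figure>
<t>A packet whose source address maps to one provider but which is routed
to the other exit will be discarded, so any address pair that forces that
combination is not worth probing. </t>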
</section> <!-- ops -->
<section anchor="future" title="Implications for future designs">
<t>We suggest several conclusions from the above results that should be relevant
to the design of any probing mechanism for exploiting alternative paths between two
hosts: </t>
<t><list style="symbols">
<t>The interaction between round-trip time, the transport layer acknowledgement
mechanism, and the failure detection mechanism is quite subtle and significantly affects the
time taken to start recovery after a failure. </t>
<t>When probing is linked to congestion control, packet loss rates may also
affect recovery times. </t>
<t>Probe traffic is unlikely to cause overload, especially since normal traffic
stops during recovery from failure. </t>
<t>Exponential backoff leads to significantly slower recovery and (due
to the previous point) is probably unnecessary. </t>
<t>Probing all alternative paths in parallel leads to significantly faster recovery times
with only a minor increase in the intensity of probe traffic, although this
does occur on the paths that are still carrying normal traffic. However, full sized probe
packets (as used by MPTCP, because they are normal data packets) have more impact
than short probe packets (as used by SHIM6). </t>
<t>The probe packets should resemble normal data packets as much as possible, in order
to avoid being treated specially or dropped by middleboxes such as firewalls or load
balancers.</t>
<t>If Source Address Dependent Routing (SADR) is unavailable, it is better to avoid
probing address pairs that will fail as a result. (Probing all paths in parallel
would in fact mask this problem.) </t>
<t>There is little to be gained by having more than two or three alternative paths. </t>
</list></t>
</section> <!-- future -->
<section anchor="security" title="Security Considerations">
<t>Apart from the need for SHIM6 to bypass firewalls, no security issues were identified
during this work. </t>
</section> <!-- security -->
<section anchor="iana" title="IANA Considerations">
<t>This document requests no action by IANA.</t>
</section> <!-- iana -->
<section anchor="ack" title="Acknowledgements">
<t>This document was produced using the xml2rfc tool <xref target="RFC2629"/>.</t>
<t>Some text was adapted from <xref target="Naderi14a"/>. </t>
<t>John Ronan from the Telecommunications Software and Systems
Group, Waterford Institute of Technology, and the University of Auckland Information
Technology Services (ITS) helped to run the SHIM6 experiments over the Internet
between Auckland and Dublin.</t>
</section> <!-- ack -->
<section anchor="changes" title="Change log [RFC Editor: Please remove]">
<t>draft-naderi-ipv6-probing-01: editorial improvements, 2015-04-22.</t>
<t>draft-naderi-ipv6-probing-00: original version, 2014-10-21.</t>
</section> <!-- changes -->
</middle>
<back>
<!-- <references title="Normative References">
</references> -->
<references title="Informative References">
&RFC2460;
&RFC5533;
&RFC5534;
&RFC6555;
&RFC7045;
&RFC4960;
&RFC2629;
&RFC6824;
&RFC2827;
&RFC3704;
&RFC4340;
<reference anchor="Barre08">
<front>
<title>LinShim6 - implementation of the Shim6 protocol</title>
<author initials="S." surname="Barre" fullname="S Barre"/>
<date year="2008" month="February"/>
</front>
<seriesInfo name="Technical Report, Universite catholique de Louvain" value=" "/>
</reference>
<reference anchor="Moebius">
<front>
<title>The Moebius framework and its implementation</title>
<author initials="D.D." surname="Deavours" fullname="D. D. Deavours"/>
<author initials="G." surname="Clark" fullname="G. Clark"/>
<author initials="T." surname="Courtney" fullname="T. Courtney"/>
<author initials="D." surname="Daly" fullname="D. Daly"/>
<author initials="S." surname="Derisavi" fullname="S. Derisavi"/>
<author initials="J. M." surname="Doyle" fullname="J. M. Doyle"/>
<author initials="W. H." surname="Sanders" fullname="W. H. Sanders"/>
<author initials="P. G." surname="Webster" fullname="P. G. Webster"/>
<date year="2002" month="October"/>
</front>
<seriesInfo name="IEEE Transactions on Software Engineering" value="28(10):956-969"/>
</reference>
<reference anchor="Naderi10">
<front>
<title>A Performance Study on REAchability Protocol in Large Scale IPv6 Networks</title>
<author initials="H." surname="Naderi" fullname="Habib Naderi"/>
<author initials="B. E." surname="Carpenter" fullname="Brian E. Carpenter"/>
<date year="2010" month="April"/>
</front>
<seriesInfo name="Second International Conference on Computer and Network Technology (ICCNT 2010), Bangkok" value="28-32"/>
</reference>
<reference anchor="Naderi14a">
<front>
<title>Evaluating and Improving SHIM6 and MPTCP: Two Solutions for IPv6 Multihoming</title>
<author initials="H." surname="Naderi" fullname="Habib Naderi"/>
<date year="2014" month="July"/>
</front>
<seriesInfo name="Ph.D. Thesis, The University of Auckland" value=""/>
</reference>
<reference anchor="Naderi14b">
<front>
<title>Putting SHIM6 into Practice</title>
<author initials="H." surname="Naderi" fullname="Habib Naderi"/>
<author initials="B. E." surname="Carpenter" fullname="Brian E. Carpenter"/>
<date year="2014" month="November"/>
</front>
<seriesInfo name="Australasian Telecommunication Networks and Applications Conference (ATNAC 2014), Melbourne" value=""/>
</reference>
<reference anchor="Wischik10">
<front>
<title>Balancing resource pooling and equipoise in multipath transport</title>
<author initials="D." surname="Wischik" fullname="D. Wischik"/>
<author initials="C." surname="Raiciu" fullname="C. Raiciu"/>
<author initials="M." surname="Handley" fullname="Mark Handley"/>
<date year="2010" month="April"/>
</front>
<seriesInfo name="8th USENIX Symposium on Networked Systems Design and Implementation, San Jose" value=""/>
</reference>
</references>
</back>
</rfc>