<?xml version="1.0" encoding="US-ASCII"?>
<!-- This is built from a template for a generic Internet Draft. Suggestions for
     improvement welcome - write to Brian Carpenter, brian.e.carpenter @ gmail.com -->
<!-- This can be converted using the Web service at http://xml.resource.org/experimental.html
     (which supports the latest, sometimes undocumented and under-tested, features.) -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- You need one entry like the following for each RFC referenced -->
<!ENTITY RFC2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
<!ENTITY RFC2460 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2460.xml">
<!ENTITY RFC5533 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5533.xml">
<!ENTITY RFC5534 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5534.xml">
<!ENTITY RFC7045 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7045.xml">
<!ENTITY RFC6555 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6555.xml">
<!ENTITY RFC4960 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4960.xml">
<!ENTITY RFC6824 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6824.xml">
<!ENTITY RFC2827 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2827.xml">
<!ENTITY RFC3704 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3704.xml">

<!-- You need one entry like the following for each I-D referenced -->

<!ENTITY DRAFT-homenet SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-homenet-arch.xml">


]>
<?rfc toc="yes"?>

<!-- You want a table of contents -->
<?rfc symrefs="yes"?>
<!-- Use symbolic labels for references -->
<?rfc sortrefs="yes"?>
<!-- This sorts the references -->
<?rfc iprnotified="no" ?>
<!-- Change to "yes" if someone has disclosed IPR for the draft -->
<?rfc compact="yes"?>
<!-- This defines the specific filename and version number of your draft (and inserts the appropriate IETF boilerplate) -->
<rfc category="info" docName="draft-naderi-ipv6-probing-00" ipr="trust200902">
  <front>
    <title abbrev="IPv6 Probing">Experience with IPv6 path probing</title>

    <author fullname="Habib Naderi" initials="H." 
            surname="Naderi">
      <organization abbrev="Univ. of Auckland"></organization>
      <address>
        <postal>
          <street>Department of Computer Science</street>
          <street>University of Auckland</street>
          <street>PB 92019</street>
          <city>Auckland</city>
          <region></region>
          <code>1142</code>
          <country>New Zealand</country>
        </postal>
        <email>habib@cs.auckland.ac.nz</email>
      </address>
    </author>


    <author fullname="Brian Carpenter" initials="B. E." role="editor"
            surname="Carpenter">
      <organization abbrev="Univ. of Auckland"></organization>
      <address>
        <postal>
          <street>Department of Computer Science</street>
          <street>University of Auckland</street>
          <street>PB 92019</street>
          <city>Auckland</city>
          <region></region>
          <code>1142</code>
          <country>New Zealand</country>
        </postal>
        <email>brian.e.carpenter@gmail.com</email>
      </address>
    </author>



    

    <date day="21" month="October" year="2014" />

    <area> </area>

    <workgroup> </workgroup>

    <abstract>
      <t>This document reports on experience and simulations of dynamic probing
      of alternate paths between two IPv6 hosts when network failures occur.
      Two models for such probing were investigated: the Shim6 REAchability
      Protocol (REAP) and the Multipath Transmission Control Protocol (MPTCP).
      The motivation for this document is to identify some aspects of path
      probing at large or very large scale that may be broadly relevant to
      future protocol design.  
      </t>
    </abstract>
  </front>

  <middle>
    <section anchor="intro" title="Introduction">
      <t>A common situation in the Internet today is that a host trying to contact another
         host has a choice of IP addresses for one or both ends of the communication. Multiple 
         addresses are expected to be quite common for IPv6 hosts <xref target="RFC2460"/>. Some
         approaches to this situation envisage either switching paths during the course of the
         communication or using multiple paths in parallel. Examples include "Happy Eyeballs"
         <xref target="RFC6555"/> which tries alternative paths at the start,
         Shim6 <xref target="RFC5533"/> and Stream Control Transmission Protocol (SCTP)
         <xref target="RFC4960"/> which change paths when there is a failure,
         and Multipath TCP (MPTCP) <xref target="RFC6824"/>
         which shares the paths dynamically. </t>

      <t>Some of these methods involve active path probing to choose the best path. Shim6
         probes all available paths using the REAchability Protocol
         (REAP) <xref target="RFC5534"/> when the current path
         fails, and MPTCP effectively probes all paths continuously, and shifts load according
         to the results. In this document we summarise results and observations from
         Shim6 and MPTCP operated or simulated at large scale. These observations may be
         of help in designing future path probing mechanisms. In particular, we are
         interested in minimising both the time taken to recover to the maximum possible
         throughput after a path failure, and  the amount of overhead traffic caused by
         the probing process. </t>

      <t>In summary, we ran a series of Shim6 experiments, each including 250 path failures,
         between Auckland and Dublin, measuring the time and overhead traffic for each instance of
         path probing and recovery. Then we repeated essentially the same experiment in
         the laboratory in Auckland (i.e. with negligible RTT instead of round-the-world
         RTT). Then we built a Stochastic Activity Network (SAN) simulation model of the
         same scenarios, and validated it by comparison with the experimental results.
         Then we used this model to simulate path failure and recovery using REAP at very
         large scale (10,000 simultaneous sessions on a single site experiencing path failure).
         Both TCP and DCCP were used for the transport layer, with a simple application
         sending meaningless data in one direction only. </t>
      
      <t>This was followed by roughly equivalent simulations of recovery from path
         failure for MPTCP sessions. In this case we validated the SAN model by
         comparison with a completely different MPTCP simulator developed elsewhere. </t>

      <t>One advantage of the SAN model is that there are SAN analysis software
         tools which allow very large scale simulations.
         Another is that it makes it relatively easy to experiment with variations
         of the protocol itself, so we did test the impact of certain protocol changes.
         However, unlike conventional network simulation tools, the user has to
         program a complete protocol behaviour model. We used the Moebius tool
         <xref target="Moebius"/>. </t>

      <t>Details of the experiments and results have been described in two papers 
         <xref target="Naderi10"/> <xref target="Naderi14b"/> and in H. Naderi's thesis
         <xref target="Naderi14a"/>. This document limits itself to describing the
         results and their implications for the design of path probing mechanisms in the
         Internet. </t>

    </section> <!-- intro -->

    <section anchor="shim6res" title="Results for Shim6 and REAP">
      

      <section anchor="intexpt" title="Experiments over the Internet">
   <t>We set up a test environment which enabled us to run a set of experiments 
   with LinShim6 <xref target="Barre08"/> over the Internet. We used two SHIM6-enabled
   multi-addressed hosts, located at the University of Auckland (New Zealand) and 
   Waterford Institute of Technology (Dublin, Ireland). Each host was equipped 
   with two network interface cards and configured with two prefixes from two 
   different providers. The SHIM6 host in Auckland was connected to a Linux machine
   configured as an IPv6 router. This router 
   simulated link failures for the experiments. </t>
   
   <t>Source Address Dependent Routing (SADR) is necessary for effective use of 
   SHIM6. With host-centric solutions like SHIM6, the hosts decide which source and
   destination addresses to use. Without SADR, or a similar routing mechanism, 
   packets might be forwarded to the wrong provider and dropped because of 
   ingress filtering according to BCP 38 <xref target="RFC2827"/> <xref target="RFC3704"/>.
   Unfortunately, we could not convince the 
   university network administrators to enable SADR on the Auckland University edge router. 
   To run the experiments, they agreed to add static routes to the edge router's 
   routing table, to forward packets destined to the host in Dublin through different 
   providers according to their destination addresses. Therefore, only two of the
   four possible address pairs could work. To obtain a variety of recovery cases
   nevertheless, we changed LinShim6 to shuffle the list of address pairs before
   starting the exploration process. As a result, a working address pair could
   appear at any position in the list, creating different 
   recovery cases. </t>
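   <t>The effect of shuffling can be pictured with a minimal sketch. It is not
   the LinShim6 code: the addresses come from the documentation prefix, and the
   set of working pairs and the function names are assumptions chosen only
   for illustration. </t>
   <figure><artwork><![CDATA[
```python
import itertools
import random

# Hypothetical addresses from the IPv6 documentation prefix.
LOCAL = ["2001:db8:a::1", "2001:db8:b::1"]
REMOTE = ["2001:db8:c::1", "2001:db8:d::1"]

# Assumed: static routes make only these two pairs usable.
WORKING = {(LOCAL[0], REMOTE[0]), (LOCAL[1], REMOTE[1])}


def shuffled_pairs(local, remote, rng):
    """Return every (local, remote) pair in random order."""
    pairs = list(itertools.product(local, remote))
    rng.shuffle(pairs)
    return pairs


def probes_until_success(pairs, working):
    """Count pairs tried, in list order, until one works."""
    for count, pair in enumerate(pairs, start=1):
        if pair in working:
            return count
    return None


pairs = shuffled_pairs(LOCAL, REMOTE, random.Random(1))
print(probes_until_success(pairs, WORKING))
```
]]></artwork></figure>
   <t>With two usable pairs out of four, the first working pair sits at
   position one, two or three of the shuffled list, which is what produces the
   different recovery cases. </t>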
   
   <t>This configuration enabled us to run experiments with four address pairs over the 
   Internet. For each experiment, we artificially created 250 failures and for each case
   measured the REAP exploration time (EP), number of sent (SP) and received probes (RP)
   and application recovery time (ART).</t>
   
   <t>Comparing results from experiments with TCP and DCCP shows that when DCCP is employed, 
   EP, SP and RP are larger than when TCP is used. The main reason for this is that DCCP
   employs delayed acknowledgement. It sends ACKs once per RTT (300 ms), while in the case of TCP 
   they are sent faster (less than 100 ms). Since the RTT is long, the communications look
   different from REAP's viewpoint, although the behaviour of the application is the same in
   both experiments. Since TCP sends ACKs faster, REAP treats it more like a bi-directional 
   communication, while DCCP communication is treated more like a uni-directional one. As a result, 
   in the DCCP experiment the sender always detects the failure first and then reports it 
   to the receiver, while in the TCP experiment both sides detect the failure and start 
   exploration with a small time difference. In other words, in the case of TCP, exploration 
   is performed in parallel on both sides, takes less time and generates less traffic. 
   This result also shows that the efficiency of solutions like SHIM6, which are 
   implemented inside the protocol stack, may be affected by the behaviour of the other 
   layers of the protocol stack. </t>
   
   <t>We also observed some signs of probe loss in the results. Probe losses can affect
   EP, SP, RP and ART. A lost probe might force the exploration process into a 
   second run through the address pairs; because of the exponential backoff algorithm, 
   the process then takes longer and generates more traffic. </t>
      </section> 
      <section anchor="labexpt" title="Lab Experiments">
   
   <t>We repeated similar experiments in the lab. The main difference was the RTT, which was much
   smaller (0.3 ms) than in the Internet experiments. We set up two SHIM6 hosts in the lab, each 
   equipped with four network interfaces. Thus, in addition to experiments with four 
   address pairs (similar to the Internet experiments), we could run experiments with 9 and 
   16 address pairs as well.</t>
   
   <t>In the lab, we got similar results from the TCP and DCCP experiments. Since the RTT is small, 
   DCCP sends ACKs faster, so there is no difference from REAP's viewpoint. </t>
   
   <t>Probe losses are observable in the lab experiments too. Probe loss causes REAP to go to the 
   second round for scanning the list of address pairs, which leads to sending more probes and 
   also longer exploration time. </t>
   
   <t>Experiments with 16 address pairs fail when the working address pair is located at or close 
   to the end of the list of address pairs. REAP employs exponential backoff after sending the initial
   probes to avoid generating large bursts of traffic during exploration. For 16 address pairs, 
   this delay causes the connection to time out, which stops the experiment. In some cases, SHIM6 
   removes the context without finding a new address pair. We believe that packet losses push 
   the exploration process into a second round, which causes much longer 
   delays and makes SHIM6 stop exploration and remove the context. </t>
      </section>
      <section anchor="simul" title="Large scale simulation">   
   <t>To study the behaviour of REAP in a very large scale network (e.g. an enterprise network),
   we built a simulation model of REAP and conducted some experiments which simulated a link
   failure event in a network with 10,000 simultaneously active SHIM6 monitored communications.
   The aim of the experiments was to see how 
   REAP reacts to path failures in a large SHIM6-enabled multihomed network. In our practical 
   tests, nine address pairs seemed to be the limit, but we included larger numbers 
   in our simulations to obtain a clearer view of REAP's behaviour. </t>

   <t>We focused on REAP recovery time and probe traffic as two important performance parameters. REAP 
   recovery time is the time that REAP takes to detect the failure and find a new working address 
   pair. REAP traffic is the traffic which is generated by REAP itself during its exploration process. </t>

   <t>We measured average and total REAP recovery time for different numbers of address pairs for 
   10,000 instances of REAP. We define total REAP recovery time as the recovery time for the 
   whole site, i.e., the time between failure occurrence and recovering the last context. In 
   other words, it shows the recovery time for the last context that is recovered. The average 
   recovery time is calculated by dividing the sum of recovery times for REAP instances by the 
   number of REAP instances. It should be noted that recovery time includes failure detection and 
   address exploration times. </t>
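   <t>The two definitions can be restated as a small sketch. The sample values
   are made up for illustration and are not measured results; the function name
   is hypothetical. </t>
   <figure><artwork><![CDATA[
```python
def recovery_metrics(recovery_times):
    """Return (average, total) recovery time in seconds.

    average: sum of per-context times / number of contexts
    total:   time at which the last context has recovered
    """
    if not recovery_times:
        raise ValueError("need at least one recovery time")
    average = sum(recovery_times) / len(recovery_times)
    total = max(recovery_times)
    return average, total


# Made-up per-context recovery times, not measured results.
sample = [9.5, 10.0, 12.5, 40.0]
average, total = recovery_metrics(sample)
print(average, total)
```
]]></artwork></figure>
   <t>A single slow context dominates the total while moving the average much
   less, which is why both figures are reported. </t>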
   
   <t>A typical average recovery time for 4 address pairs is 10 to 12 seconds.
   The results show that both the average and the maximum recovery time increase with the number of
   address pairs. The relationship is not linear because REAP uses an exponential
   backoff algorithm to increase the time interval between probes. As a result, REAP performs
   poorly when the number of address pairs exceeds 9, for example taking more than 100 seconds
   to recover with 16 address pairs. </t>
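   <t>The non-linearity can be made concrete with a simplified timing sketch.
   It is not the SAN model used for the results above. It assumes four initial
   probes sent 0.5 seconds apart, then a doubling interval capped at 60 seconds
   (values we take to be the defaults suggested in <xref target="RFC5534"/>;
   treat them as assumptions), one address pair tried per probe, no probe loss,
   and no failure detection delay. </t>
   <figure><artwork><![CDATA[
```python
def probe_send_times(n_probes, n_initial=4,
                     initial_timeout=0.5, max_timeout=60.0):
    """Send time (s) of each probe after exploration starts.

    Simplified model: fixed spacing for the initial probes,
    then the spacing doubles up to max_timeout.
    """
    times = []
    t = 0.0
    timeout = initial_timeout
    for i in range(n_probes):
        times.append(t)
        if i + 1 >= n_initial:
            # Initial probes are used up: back off.
            timeout = min(timeout * 2, max_timeout)
        t += timeout
    return times


times = probe_send_times(16)
for position in (4, 9, 16):
    print(position, times[position - 1])
```
]]></artwork></figure>
   <t>Under these assumptions the delay before a given position in the list is
   probed grows much faster than the position itself, which is the qualitative
   behaviour described above. </t>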
   
   <t>We also measured the average and total number of probes sent during the address exploration 
   process in the experiments. The results show a linear relationship between the number 
   of address pairs and the number of probes sent. They also show that a large proportion of the probes are 
   sent at the start of exploration. For example, in the case of four address pairs, 93% of the probes, 
   and in the case of 25 address pairs 34% of the probes, are sent during the first 10 seconds. The reason 
   is that all contexts detect the failure within 10 seconds and start exploration by sending the initial probes 
   (the first four probes, which are sent within two seconds). After that, there are some intervals when very 
   few probes are sent. This can be seen more clearly in the experiments with more address pairs, e.g. 
   16 or 25 address pairs. It means that for some SHIM6 contexts the time interval between probes is 
   large because of the exponential backoff, so REAP instances have to wait for a long time before 
   probing the next address pair. Some connections might be dropped by the transport or application 
   layer before REAP can recover them. For example, in the case of 25 address pairs, 50% of contexts 
   need more than five minutes to recover. </t>
   
   <t>Although the peak of the REAP traffic is generated in the first 10 seconds (before employing the exponential
   backoff algorithm), our results show that this traffic is small compared to normal traffic for 
   a large network, and cannot cause a major problem. For example, in the case of 25 address pairs, about 
   4800 probes per second are sent during the first 10 seconds of the exploration process, which is the 
   peak of the traffic. Every probe in the first 10 seconds carries at most seven address pairs: four 
   initial address pairs and three more after employing exponential backoff. Each probe needs 72 bytes
   for the fixed part and 40 bytes for each address pair, so the average probe 
   size in the first 10 seconds is 232 bytes, corresponding to an average of four address pairs per probe.
   As a result, a load of 4800 probes per second occupies only about 1.1 MB/s (4800 x 232 bytes per second)
   of the site's available link capacity. Large sites usually have high bandwidth links to the Internet 
   and this amount of traffic does not cause a significant problem for them. In any case this traffic
   will occur at a time when normal traffic from the same sessions has been interrupted. </t>
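   <t>The arithmetic behind these figures can be checked with a short sketch.
   The byte counts and the probe rate are taken from the text above; the
   assumption that probes carrying one to seven address pairs are equally
   frequent is ours, made only to show one way of arriving at the 232-byte average. </t>
   <figure><artwork><![CDATA[
```python
FIXED_BYTES = 72      # fixed part of a probe (from the text)
PER_PAIR_BYTES = 40   # per address pair carried (from the text)


def probe_size(n_pairs):
    """Size in bytes of a probe carrying n_pairs address pairs."""
    return FIXED_BYTES + PER_PAIR_BYTES * n_pairs


def average_probe_size(max_pairs):
    """Mean size, assuming 1..max_pairs are equally likely."""
    sizes = [probe_size(n) for n in range(1, max_pairs + 1)]
    return sum(sizes) / len(sizes)


def load_bytes_per_second(probes_per_second, avg_size):
    """Aggregate probe load on the site's links."""
    return probes_per_second * avg_size


avg = average_probe_size(7)
load = load_bytes_per_second(4800, avg)
print(avg, load, load * 8 / 1e6)  # bytes, bytes/s, Mbit/s
```
]]></artwork></figure>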
   
   <t>We also tried two changes to REAP to improve recovery time: increasing the number of initial probes,
   and sending initial probes in parallel. In both cases, we also measured the probe traffic.
   The results showed that those 
   modifications improved recovery time while their effect on the traffic was small. For example, in 
   the case of nine address pairs, increasing the number of initial probes from four to five caused an increase of about 6.5% 
   in traffic in the first 10 seconds of the recovery process, a 22% decrease in average recovery 
   time and a 34% decrease in maximum recovery time. Sending initial probes in parallel, in the case of nine 
   address pairs, caused an 11% decrease in average recovery time, a 4.5% decrease in maximum recovery time, 
   and an 8.2% increase in traffic. In both cases, the modifications increased traffic, but not to a 
   level that could not be handled in a large network. </t>
   
      </section>
   
    </section> <!-- shim6res -->

    <section anchor="mptcp" title="Results for MPTCP">
      <t>MPTCP does not use any specific mechanism for probing paths. In fact, every
      subflow runs as a TCP flow and it is the TCP congestion control
      mechanism which monitors the used path. When congestion is detected, the
      load from the congested path is transferred to other available
      paths, if they present less congestion. The MPTCP congestion control
      algorithm, known as SEMICOUPLED, reacts to congestion reports
      from subflows and adjusts the load on the used paths to achieve performance
      and fairness. TCP never sets the congestion window
      for a subflow to less than 1. Therefore, even on a highly congested path or
      a broken path, it performs the equivalent of probing by setting the congestion
      window size to 1, so that any improvements in the path can be detected.
      Expiration of the TCP retransmission timer for the subflow on a broken path
      triggers the occasional sending of a segment, which acts as a probe and ensures
      that a recovery of the path can be detected. How fast this mechanism can detect an
      improvement in a broken path depends on the value of the retransmission
      time-out (RTO). The minimum value is usually set to 1 second, and
      each consecutive expiration, as occurs on a broken path, backs off the timer
      by doubling the RTO. The traffic generated by this mechanism
      in this case is low and may be handled easily, even in a large network. </t>
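      <t>As a rough sketch of this implicit probing, the retransmission times on
      a broken subflow can be listed under simple assumptions: the RTO starts
      at the 1 second minimum mentioned above, doubles at each expiration, and is
      capped at 60 seconds. The cap and the function names are assumptions for
      illustration; real stacks differ. </t>
      <figure><artwork><![CDATA[
```python
def rto_probe_times(n_probes, min_rto=1.0, max_rto=60.0):
    """Times (s after failure) of successive retransmissions.

    min_rto follows the text; max_rto is an assumed cap.
    """
    times = []
    t = 0.0
    rto = min_rto
    for _ in range(n_probes):
        t += rto
        times.append(t)
        rto = min(rto * 2, max_rto)
    return times


def detection_delay(repair_time, times):
    """Delay between path repair and the next probe, if any."""
    for t in times:
        if t >= repair_time:
            return t - repair_time
    return None


times = rto_probe_times(8)
print(times)
print(detection_delay(10.0, times))
```
]]></artwork></figure>
      <t>In this model a path repaired 10 seconds after the failure is noticed
      at the next retransmission, and the longer the outage lasts, the longer
      the wait for that next segment can be. </t>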
 
      <t>We simulated MPTCP with up to 8 paths and with RTTs between 80 and 150 ms,
      observing the expected behaviour with the load in the steady state spread
      across the paths. When the loss rate of a path is higher, the throughput of
      that path is lower. For a given loss rate, a smaller RTT increases throughput
      on that path. However, total throughput increases sublinearly with more paths,
      due to the way SEMICOUPLED links the congestion windows of the various subflows.
      For example, we simulated a scenario in which the steady state throughput
      for 8 paths was only about 25% greater than for a single path (Figure 5.10
      in <xref target="Naderi14a"/>). This suggests that a scenario with as many
      as 8 paths is of limited value in a reasonably reliable network. </t>

      <t>We simulated a permanent failure of a single path in a scenario with four paths
      in operation. As may be deduced from the previous point, the throughput
      recovered in the steady state to within a small percentage of its previous value.
      This recovery took about 6 seconds (Figure 5.15 in <xref target="Naderi14a"/>),
      which is significantly faster than observed with SHIM6 due to MPTCP's effectively
      continuous probing. Simulations of temporary path failures showed that returning
      to the original steady state using all paths took a similar time. </t>

      <t>Finally we simulated the effect of variable loss rates on MPTCP performance
      with two paths operating. We observed that for loss rates varying randomly in the range
      up to 1%, MPTCP effectively maintains its steady state throughput. </t>
      

    </section> <!-- mptcp -->

    <section anchor="ops" title="Operational issues">
      <t>Many if not most site border firewalls today drop packets containing the Shim6
         extension header. In our Internet experiments we had to bypass the site firewall
         at both ends. This issue is discussed in <xref target="RFC7045"/>. </t>

      <t>Source Address Dependent Routing (SADR) is necessary for effective use of multiple paths.
         Without it, packets may be sent to the wrong exit router, or to an ISP that will
         immediately discard them due to ingress filtering. With ingress filtering
         in place, packets with a given source address may only be sent via an ISP that
         accepts packets from that source address. If this is not taken correctly into account
         by the source host and by the local routing configuration, the host will waste resources
         trying to explore paths that are certain to fail. </t>
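      <t>The forwarding decision that SADR adds can be sketched as a lookup keyed
      on the source address. The prefixes and exit names below are hypothetical
      and the code illustrates the idea only; it is not a router configuration. </t>
      <figure><artwork><![CDATA[
```python
import ipaddress

# Hypothetical prefixes and exit names for a dual-homed site.
EXITS = {
    ipaddress.ip_network("2001:db8:a::/48"): "isp-a",
    ipaddress.ip_network("2001:db8:b::/48"): "isp-b",
}


def select_exit(source):
    """Pick the exit whose ISP assigned the source prefix."""
    addr = ipaddress.ip_address(source)
    for prefix, exit_name in EXITS.items():
        if addr in prefix:
            return exit_name
    return None  # no ISP would accept this source address


def passes_ingress_filter(source, exit_name):
    """True if the given exit accepts this source address."""
    return select_exit(source) == exit_name


print(select_exit("2001:db8:b::10"))
print(passes_ingress_filter("2001:db8:a::1", "isp-b"))
```
]]></artwork></figure>
      <t>A host that knows, or can learn, which exits accept which source
      prefixes can apply the same check to skip address pairs that are certain
      to be filtered. </t>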
   
    </section> <!-- ops -->

    <section anchor="future" title="Implications for future designs">
      <t>We suggest several conclusions from the above results that should be relevant
      to the design of any probing mechanism for exploiting alternative paths between two
      hosts: </t>
      <t><list style="symbols">
      <t>The interaction between round-trip time, the transport layer acknowledgement
      mechanism, and the failure detection mechanism significantly affects the
      time taken to start recovery after a failure. </t>
      <t>When probing is linked to congestion control, packet loss rates may also
      affect recovery times. </t>
      <t>Probe traffic is unlikely to cause overload, especially since normal traffic
      stops during recovery from failure. </t>
      <t>Exponential backoff leads to significantly slower recovery time, and (due
      to the previous point) is probably unnecessary. </t>
      <t>Probing all alternative paths in parallel leads to significantly faster recovery times
      with only a minor increase in the intensity of probe traffic, although this
      does occur on the paths that are still carrying normal traffic. However, full sized probe
      packets (as used by MPTCP, because they are normal data packets) have more impact
      than short probe packets (as used by SHIM6). </t>
      <t>The probe packets should resemble normal data packets as much as possible, in order
      to avoid being treated specially or dropped by middleboxes such as firewalls or load
      balancers.</t>
      <t>If Source Address Dependent Routing (SADR) is unavailable, it is better to avoid
      probing address pairs that will fail as a result. (Probing all paths in parallel
      would in fact mask this problem.)  </t>
      <t>There is little to be gained by having more than two or three alternative paths. </t>
      </list></t>
   
    </section> <!-- future -->


    <section anchor="security" title="Security Considerations">
      <t>Apart from the need for Shim6 to bypass firewalls, no security issues are raised by this work. </t>
   
    </section> <!-- security -->

    <section anchor="iana" title="IANA Considerations">
      <t>This document requests no action by IANA.</t>
    </section> <!-- iana -->

    <section anchor="ack" title="Acknowledgements">
      <t>This document was produced using the xml2rfc tool <xref target="RFC2629"/>.</t>

      <t>Some text was adapted from <xref target="Naderi14a"/>. </t>

      <t>John Ronan from the Telecommunications Software and Systems
      Group, Waterford Institute of Technology, and the University of Auckland Information
      Technology Services (ITS) helped to run the SHIM6 experiments over the Internet
      between Auckland and Dublin.</t>

    </section> <!-- ack -->

    <section anchor="changes" title="Change log [RFC Editor: Please remove]">

      <t>draft-naderi-ipv6-probing-00: original version, 2014-10-21.</t>
    </section> <!-- changes -->

  </middle>

  <back>
    <!-- <references title="Normative References">

    </references> -->

    <references title="Informative References">
      &RFC2460;
      &RFC5533;
      &RFC5534;
      &RFC6555;
      &RFC7045;
      &RFC4960;
      &RFC2629;
      &RFC6824;
      &RFC2827;
      &RFC3704;

<reference anchor="Barre08">
<front>
<title>LinShim6 - implementation of the Shim6 protocol</title>
<author initials="S." surname="Barre" fullname="S Barre"/> 
<date year="2008" month="February"/>
</front>
<seriesInfo name="Technical Report, Universite catholique de Louvain" value=" "/>
</reference>

<reference anchor="Moebius">
<front>
<title>The Moebius framework and its implementation</title>
<author initials="D.D." surname="Deavours" fullname="D. D. Deavours"/> 
<author initials="G." surname="Clark" fullname="G. Clark"/>
<author initials="T." surname="Courtney" fullname="T. Courtney"/>
<author initials="D." surname="Daly" fullname="D. Daly"/>
<author initials="S." surname="Derisavi" fullname="S. Derisavi"/>
<author initials="J. M." surname="Doyle" fullname="J. M. Doyle"/>
<author initials="W. H." surname="Sanders" fullname="W. H. Sanders"/>
<author initials="P. G." surname="Webster" fullname="P. G. Webster"/>
<date year="2002" month="October"/>
</front>
<seriesInfo name="IEEE Transactions on Software Engineering" value="28(10):956-969"/>
</reference>
      
<reference anchor="Naderi10">
<front>
<title>A Performance Study on REAchability Protocol in Large Scale IPv6 Networks</title>
<author initials="H." surname="Naderi" fullname="Habib Naderi"/> 
<author initials="B. E." surname="Carpenter" fullname="Brian E. Carpenter"/>
<date year="2010" month="April"/>
</front>
<seriesInfo name="Second International Conference on Computer and Network Technology 
(ICCNT 2010), Bangkok" value="28-32"/>
</reference>

<reference anchor="Naderi14a">
<front>
<title>Evaluating and Improving SHIM6 and MPTCP: Two Solutions for IPv6 Multihoming</title>
<author initials="H." surname="Naderi" fullname="Habib Naderi"/> 
<date year="2014" month="July"/>
</front>
<seriesInfo name="Ph.D. Thesis, The University of Auckland" value=""/>
</reference>

<reference anchor="Naderi14b">
<front>
<title>Putting SHIM6 into Practice</title>
<author initials="H." surname="Naderi" fullname="Habib Naderi"/> 
<author initials="B. E." surname="Carpenter" fullname="Brian E. Carpenter"/>
<date year="2014" month="November"/>
</front>
<seriesInfo name="Australasian Telecommunication Networks and Applications Conference (ATNAC 2014),
Melbourne" value=""/>
</reference>

    </references>
  </back>
</rfc>
