<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
     There has to be one entity for each item to be referenced. 
     An alternate method (rfc include) is described in the references. -->

<!ENTITY RFC6824 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6824.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
     (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="exp" docName="draft-paasch-mptcp-loadbalancer-00" ipr="trust200902">
  <!-- category values: std, bcp, info, exp, and historic
     ipr values: full3667, noModification3667, noDerivatives3667
     you can add the attributes updates="NNNN" and obsoletes="NNNN" 
     they will automatically be output with "(if approved)" -->

  <front>
    <title abbrev="Multipath TCP loadbalancers">Multipath TCP behind Layer-4 loadbalancers</title>

    <!-- add 'role="editor"' below for the editors if appropriate -->

    <!-- Another author who claims to be an editor -->
    <author fullname="Christoph Paasch" initials="C.P." surname="Paasch">
      <organization>Apple, Inc.</organization>
      <address>
        <postal>
          <street></street>
          <city>Cupertino</city>
          <region></region>
          <code></code>
          <country>US</country>
        </postal>
        <email>cpaasch@apple.com</email>
      </address>
    </author>

    <author fullname="Greg Greenway" initials="G.G." surname="Greenway">
      <organization>Apple, Inc.</organization>
      <address>
        <postal>
          <street></street>
          <city>Cupertino</city>
          <region></region>
          <code></code>
          <country>US</country>
        </postal>
        <email>ggreenway@apple.com</email>
      </address>
    </author>

    <author fullname="Alan Ford" initials="A.F." surname="Ford">
      <organization>Pexip</organization>
      <address>
        <email>alan.ford@gmail.com</email>
      </address>
    </author>


    <date year="2015" />
    <!-- If the month and year are both specified and are the current ones, xml2rfc will fill 
         in the current day for you. If only the current year is specified, xml2rfc will fill 
	 in the current day and month for you. If the year is not the current one, it is 
	 necessary to specify at least a month (xml2rfc assumes day="1" if not specified for the 
	 purpose of calculating the expiry date).  With drafts it is normally sufficient to 
	 specify just the year. -->

    <area>General</area>

    <workgroup>MPTCP Working Group</workgroup>

    <!-- WG name at the upperleft corner of the doc,
         IETF is fine for individual submissions.  
	 If this element is not present, the default is "Network Working Group",
         which is used by the RFC Editor as a nod to the history of the IETF. -->

    <!-- Keywords will be incorporated into HTML output
         files in a meta tag but they have no effect on text or nroff
         output. If you submit your draft to the RFC Editor, the
         keywords will be used for the search engine. -->

<abstract>
<t>
Large webserver farms consist of thousands of frontend proxies that serve as
endpoints for the TCP and TLS connection and relay traffic to the (sometimes distant)
backend servers. Load-balancing across
those servers is done by layer-4 loadbalancers that ensure that a TCP flow will
always reach the same server.
</t>
<t>
Multipath TCP's use of multiple TCP subflows for the transmission of the data stream
requires those loadbalancers to be aware of MPTCP to ensure that all subflows
belonging to the same MPTCP connection reach the same frontend proxy.
In this document we analyze the challenges related to this and suggest a simple
modification to the generation of the MPTCP-token to overcome those challenges.
</t>
</abstract>
  </front>

<middle>
<section title="Introduction">
<t>
Internet services rely on large server farms to deliver content to the end-user.
In order to cope with the load on those server farms they rely on a large,
distributed load-balancing architecture at different layers. Backend servers
are serving the content from within the data center to the frontend proxies.
These frontend proxies are the ones terminating the TCP connections from the clients.
A server farm relies on a large number of these frontend proxies to provide sufficient
capacity. In order to balance the load on those frontend proxies, layer-4 loadbalancers
are installed in front of these. Those loadbalancers ensure that a TCP-flow will
always be routed to the same frontend proxy. For resilience and capacity reasons
the data center typically deploys several of these loadbalancers <xref target="Shuff13"/> <xref target="Patel13"/>.
</t>

<t>
These layer-4 loadbalancers rely on consistent hashing algorithms 
to ensure that a TCP-flow is routed to the appropriate frontend
proxy. The consistent hashing algorithm avoids state-synchronization
across the loadbalancers, making sure that in case a TCP-flow gets routed to
a different loadbalancer (e.g., due to a change in routing) the TCP-flow will
still be sent to the appropriate frontend proxy <xref target="Greenberg13"/>.
</t>
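
<t>
As a rough illustration of why consistent hashing removes the need for
state synchronization, the following sketch builds a small hash ring.
It is not taken from any of the cited systems: the class name, the proxy
names, the use of SHA-256 and the number of virtual nodes are all
assumptions made for the example.
</t>

<figure>
<artwork><![CDATA[
```python
import bisect
import hashlib


def h32(data: bytes) -> int:
    """32-bit hash taken from SHA-256 (an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


class HashRing:
    """Hypothetical consistent-hash ring: flow 4-tuple -> proxy."""

    def __init__(self, proxies, vnodes=64):
        # Each proxy owns several points on the ring.
        self.points = sorted(
            (h32(f"{p}#{i}".encode()), p)
            for p in proxies
            for i in range(vnodes)
        )
        self.keys = [k for k, _ in self.points]

    def lookup(self, four_tuple):
        # First ring point at or after the flow's hash, wrapping.
        k = h32(repr(four_tuple).encode())
        i = bisect.bisect_left(self.keys, k) % len(self.keys)
        return self.points[i][1]


proxies = ["proxy-a", "proxy-b", "proxy-c"]
# Two loadbalancers configured identically, sharing no state.
lb1, lb2 = HashRing(proxies), HashRing(proxies)

flow = ("198.51.100.7", 40000, "203.0.113.1", 443)
same_flow_agrees = lb1.lookup(flow) == lb2.lookup(flow)
```
]]></artwork>
</figure>

<t>
Both instances pick the same proxy for the same 4-tuple because the
decision depends only on the shared proxy list and the packet headers.
A second subflow with a different 4-tuple hashes independently, so
nothing in this scheme ties it to the proxy chosen for the first subflow.
</t>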

<t>
Multipath TCP uses different TCP flows and spreads the application's data stream
across these <xref target="RFC6824"/>. These TCP subflows use a different 4-tuple in order to be routed
on a different path on the Internet. However, legacy layer-4 loadbalancers
are not aware that these different TCP flows actually belong to the same
MPTCP connection. 
</t>

<t>
The remainder of this document explains the issues that arise
due to this and suggests a possible change to MPTCP's token-generation algorithm
to overcome these issues.
</t>

</section>

<section anchor="problem" title="Problem statement">
<t>
In an architecture with a single layer-4 loadbalancer but multiple frontend proxies,
the layer-4 loadbalancer will have to make sure that the different TCP subflows
that belong to the same MPTCP connection are routed to the same frontend proxy.
In order to achieve this, the loadbalancer has to be made "MPTCP-aware", tracking
the keys exchanged in the MP_CAPABLE handshake. This state-tracking allows the
loadbalancer to also calculate the token associated with the MPTCP-connection. The
loadbalancer thus creates a mapping (token, frontend proxy), stored in memory
for the lifetime of the MPTCP connection. As new TCP subflows are being created 
by the client, the token included in the SYN+MP_JOIN message allows the loadbalancer 
to ensure that this subflow is being routed to the appropriate frontend proxy.
</t>

<t>
However, as soon as the data center employs several of these layer-4 loadbalancers,
it may happen that TCP subflows that belong to the same MPTCP connection are
routed to different loadbalancers. This implies that each loadbalancer needs to
share the mapping state it created for its MPTCP connections with all other
loadbalancers to ensure that all loadbalancers route the subflows of an MPTCP connection
to the same frontend proxy. This is substantially more complicated to implement, and would 
suffer from latency issues.
</t>
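
<t>
The state-tracking described above can be sketched as follows. The token
derivation (the most significant 32 bits of the SHA-1 hash of the key)
follows <xref target="RFC6824"/>. The class and method names are
hypothetical, and the sketch assumes the loadbalancer observes the
server's key during the MP_CAPABLE handshake.
</t>

<figure>
<artwork><![CDATA[
```python
import hashlib


def rfc6824_token(key: bytes) -> int:
    """Token = most significant 32 bits of SHA-1(key)."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")


class MptcpAwareLoadBalancer:
    """Hypothetical stateful loadbalancer."""

    def __init__(self):
        self.token_to_proxy = {}  # kept for the connection lifetime

    def on_mp_capable(self, server_key: bytes, proxy: str):
        # Remember which proxy terminates this MPTCP connection.
        self.token_to_proxy[rfc6824_token(server_key)] = proxy

    def on_mp_join(self, token: int):
        # None: this instance never saw the MP_CAPABLE handshake.
        return self.token_to_proxy.get(token)


key_b = bytes.fromhex("0123456789abcdef")  # example server key
lb1, lb2 = MptcpAwareLoadBalancer(), MptcpAwareLoadBalancer()
lb1.on_mp_capable(key_b, "proxy-a")

token = rfc6824_token(key_b)
hit = lb1.on_mp_join(token)    # the instance that saw the handshake
miss = lb2.on_mp_join(token)   # an instance that did not
```
]]></artwork>
</figure>

<t>
The second instance has no entry for the token, which is exactly the
situation that forces state sharing between loadbalancers.
</t>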

<t>
Another issue when MPTCP is being used in a large server farm is that the different
frontend proxies may generate the same token for different MPTCP connections. This may
happen because the token is a truncated hash of the key, and hash collisions
may occur. A server farm handling millions of MPTCP connections actually has
a very high chance of generating such token collisions. A loadbalancer will then
no longer be able to accurately send the SYN+MP_JOIN to the correct frontend proxy
in case a token collision happened for this MPTCP connection.
</t>
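
<t>
To give a feeling for the scale of the collision problem, the sketch
below evaluates the standard birthday approximation for 32-bit tokens.
It assumes that tokens are uniformly distributed and independent, which
is an idealization. The function name is chosen for this example only.
</t>

<figure>
<artwork><![CDATA[
```python
import math


def collision_probability(n: int, bits: int = 32) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


for n in (10_000, 100_000, 1_000_000):
    print(n, collision_probability(n))
```
]]></artwork>
</figure>

<t>
Under these assumptions the probability of at least one collision is
about one percent at ten thousand concurrent connections, exceeds two
thirds at one hundred thousand, and is practically certain at one million.
</t>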

</section>

<section anchor="proposal" title="Proposals">

<t>
The issues described in <xref target="problem"/> originate from the
non-deterministic nature of the token generation. Indeed, if it becomes possible
for the loadbalancer to infer the frontend proxy to forward a flow to, MPTCP
becomes deployable in such environments.
</t>

<t>
The suggested solutions have their basis in a token from which a loadbalancer
can glean routing information in a stateless manner. To allow
the loadbalancer to infer the proxy based on the token, each proxy needs
to be assigned a unique range of integers. When the token falls within a certain
range, the loadbalancer knows to which proxy to forward the subflow.
However, using the integers of a contiguous range directly as tokens makes the
frontend very vulnerable to attackers.
Thus, a reversible function is needed that makes the token random-looking. A 32-bit
block-cipher (e.g., RC5) provides this random-looking reversible function. Thus, for both
proposals we assume that the frontend proxies and the layer-4 loadbalancer share
a local secret Y, of size 32 bits. This secret is only known to the server-side
data center infrastructure. If X is an integer from within the range associated
with the proxy, the proxy will generate the token by encrypting X with secret Y.
The loadbalancer will simply decrypt the token with the secret Y, which provides
it with the value of X, allowing it to forward the TCP flow to the appropriate proxy.
</t>
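
<t>
A minimal sketch of this scheme is given below. It does not use RC5:
a toy 32-bit Feistel network built on SHA-256 stands in for the block
cipher purely so that the example is self-contained, and it makes no
security claim. The range size (RANGE_BITS), the value of Y and all
function names are assumptions made for illustration.
</t>

<figure>
<artwork><![CDATA[
```python
import hashlib

MASK16 = 0xFFFF


def _round(half: int, key: bytes, rnd: int) -> int:
    # Toy round function; a stand-in, not RC5.
    data = key + bytes([rnd]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def encrypt32(x: int, key: bytes, rounds: int = 8) -> int:
    left, right = x >> 16, x & MASK16
    for r in range(rounds):
        left, right = right, left ^ _round(right, key, r)
    return (left << 16) | right


def decrypt32(y: int, key: bytes, rounds: int = 8) -> int:
    left, right = y >> 16, y & MASK16
    for r in reversed(range(rounds)):
        left, right = right ^ _round(left, key, r), left
    return (left << 16) | right


RANGE_BITS = 20                  # hypothetical: 2^20 values per proxy
Y = bytes.fromhex("a1b2c3d4")    # example 32-bit local secret


def make_token(proxy_index: int, counter: int) -> int:
    # X lies in the range [proxy_index << 20, (proxy_index+1) << 20).
    x = (proxy_index << RANGE_BITS) | (counter % (1 << RANGE_BITS))
    return encrypt32(x, Y)


def route(token: int) -> int:
    # Stateless: decrypt, then read the range the value falls in.
    return decrypt32(token, Y) >> RANGE_BITS


token = make_token(proxy_index=5, counter=42)
routed_to = route(token)
```
]]></artwork>
</figure>

<t>
Because encryption under a fixed key is a permutation of the 32-bit
values, two distinct values of X never produce the same token, and the
loadbalancer needs only Y and the range assignment to route a subflow.
</t>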

<t>
This approach also ensures that the tokens generated by different servers are
unique to each server, eliminating the token-collision issue outlined in the
previous section.
</t>

<t>
In the following we outline two different ways to handle the above-described
problems, both building on this approach. The two proposals provide different
ways of communicating the token to the peer during the MP_CAPABLE handshake.
We would like these proposals to serve as a discussion basis for the
design of the definitive solution.
</t>

<section anchor="token" title="Explicitly announcing the token">

<t>
One way of communicating the token is to simply announce it in plaintext within the MP_CAPABLE handshake.
In order to allow this, however, the wire-format of the MP_CAPABLE handshake needs to change.
</t>

<t>
One solution would be to simply increase the size of the MP_CAPABLE by 4 bytes,
giving space for the token to be included in the SYN and SYN/ACK as well as adding
it to the third ACK. However, due to the scarce TCP-option space this solution
would suffer deployment difficulties.
</t>

<t>
If the solution proposed in <xref target="I-D.paasch-mptcp-syncookies"/> is
deployed, the MP_CAPABLE option in the SYN-segment is reduced to 4 bytes.
This frees option space in the SYN-segment, which
allows the client to announce its token within the SYN-segment.
To allow the server to announce its token in the SYN/ACK without bumping the
option-size up to 16 bytes, we reduce the size of the server's key down to 32 bits,
which gives space for the server's token.
To avoid introducing security risks by reducing the size of the server's key,
we suggest to bump the client's key up to 96 bits. This still provides a
total of 128 bits of entropy for the HMAC computation. The suggested
handshake is outlined in <xref target="explicit_fig"/>.
</t>

<figure align="center" anchor="explicit_fig">
<artwork align="center"><![CDATA[
        SYN + MP_CAPABLE_SYN (Token_A)
    ------------------------------------->
      (the client announces the 4-byte locally
       unique token to the server in the
       SYN-segment).


       SYN/ACK + MP_CAPABLE_SYNACK (Token_B, Key_B)
    <-------------------------------------
      (the server replies with a SYN/ACK announcing
       as well a 4-byte locally unique token and a 4-byte key)


       ACK + MP_CAPABLE_ACK (Key_A, Key_B)
    -------------------------------------->
       (third ack, the client replies with a 12-byte Key_A
        and echoes the 4-byte Key_B as well).
]]></artwork>
<postamble>The suggested handshake explicitly announces the token.</postamble>
</figure>
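
<t>
The byte budget implied by this handshake can be checked with a few
lines of arithmetic. The sketch assumes that every MP_CAPABLE option
keeps the 4-byte header (kind, length, subtype/version, flags) of
<xref target="RFC6824"/>. The exact wire format is not defined in this
document, so the layout below is only an assumption.
</t>

<figure>
<artwork><![CDATA[
```python
HEADER = 4                       # kind, length, subtype/version, flags
TOKEN, KEY_A, KEY_B = 4, 12, 4   # field sizes in bytes, per the figure

sizes = {
    "SYN": HEADER + TOKEN,
    "SYN/ACK": HEADER + TOKEN + KEY_B,
    "ACK": HEADER + KEY_A + KEY_B,
}

# MP_CAPABLE option sizes in RFC 6824, for comparison.
rfc6824_sizes = {"SYN": 12, "SYN/ACK": 12, "ACK": 20}

key_bits = 8 * (KEY_A + KEY_B)   # key material fed into the HMAC

for name, size in sizes.items():
    print(name, size, "bytes; RFC 6824:", rfc6824_sizes[name])
print("key material:", key_bits, "bits")
```
]]></artwork>
</figure>

<t>
Under this assumption no segment carries a larger MP_CAPABLE option than
in <xref target="RFC6824"/>, and the combined key material stays at 128 bits.
</t>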

<t>
Reducing the size of the server's key down to 32 bits might be considered a security risk.
However, one might argue that neither party involved in the handshake (client or server)
has an interest in compromising the connection. Thus, the server can have confidence
that the client is going to generate a 96-bit key with sufficient entropy, and
the server can therefore safely reduce its key size down to 32 bits.
</t>

<t>
However, this would require the server to act statefully in the SYN exchange if it wanted to
be able to open connections back to the client, since the token never appears again in the
handshake.
</t>

</section>

<section anchor="block" title="Changing the token generation">
<t>
Another suggestion is based on a less drastic change to the MP_CAPABLE handshake.
We suggest to infer the token based on the key provided by the host. However,
in contrast to <xref target="RFC6824"/>, the token is not a truncated hash of the
key. Instead, the token generation uses the following scheme:
If we define Z as the 32 high-order bits and K as the
32 low-order bits of the MPTCP-key generated by a host, we suggest to generate
the token as the encryption of Z with key K by using a 32-bit block-cipher (the
block-cipher may for example be RC5; it remains to be defined by the working group which
block-cipher is appropriate for this case). The size of the MPTCP-key
remains unchanged and is actually the concatenation of Z with K.
Both K and Z are different for each and every connection, thus the MPTCP-key
still provides 64 bits of randomness.
</t>

<t>
Using this approach, a frontend proxy can make sure that a loadbalancer can
derive the identity of the frontend proxy solely through the token in the SYN-segment
of the MP_JOIN exchange, without the need to track any MPTCP-related state.
To achieve this, the frontend proxy needs to generate K and Z in a specific way.
Basically, the proxy derives the token through the method described at the
beginning of <xref target="proposal"/>. This gives us the following relation:
</t>

<t>
token = block_cipher(proxy_id, Y) (Y is the local secret)
</t>

<t>
However, as described above, at the same time we enforce:
</t>

<t>
token = block_cipher(Z, K)
</t>

<t>
Thus, the proxy simply generates a random number K, and can then generate Z by decrypting the token with key K.
It is TBD what number of bits of a token could be used for conveying routing information. Excluding those bits, 
the token would be random, and the key K is random as well, so Z will be random as well. An attacker eavesdropping 
on the token cannot infer anything about Z or K. However, prolonged gathering of token data could lead to building
up some data about the key K.
</t>
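
<t>
The two relations above can be combined as in the following sketch.
As RC5 is only named as a candidate, a toy 32-bit Feistel network built
on SHA-256 stands in for the block cipher. It is not RC5 and carries no
security claim. The value of Y, the example proxy_id and the function
names are assumptions made for illustration.
</t>

<figure>
<artwork><![CDATA[
```python
import hashlib
import secrets

MASK16 = 0xFFFF


def _round(half: int, key: bytes, rnd: int) -> int:
    # Toy round function; a stand-in, not RC5.
    data = key + bytes([rnd]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def encrypt32(x: int, key: bytes, rounds: int = 8) -> int:
    left, right = x >> 16, x & MASK16
    for r in range(rounds):
        left, right = right, left ^ _round(right, key, r)
    return (left << 16) | right


def decrypt32(y: int, key: bytes, rounds: int = 8) -> int:
    left, right = y >> 16, y & MASK16
    for r in reversed(range(rounds)):
        left, right = right ^ _round(left, key, r), left
    return (left << 16) | right


Y = bytes.fromhex("a1b2c3d4")    # example local secret


def make_mptcp_key(proxy_id: int) -> bytes:
    token = encrypt32(proxy_id, Y)   # token = block_cipher(proxy_id, Y)
    k = secrets.token_bytes(4)       # fresh random K per connection
    z = decrypt32(token, k)          # so block_cipher(Z, K) == token
    return z.to_bytes(4, "big") + k  # MPTCP-key = Z || K (64 bits)


def token_from_key(mptcp_key: bytes) -> int:
    # What the peer computes from the key it received.
    z = int.from_bytes(mptcp_key[:4], "big")
    return encrypt32(z, mptcp_key[4:])


key = make_mptcp_key(proxy_id=1234)
recovered = decrypt32(token_from_key(key), Y)  # loadbalancer side
```
]]></artwork>
</figure>

<t>
The peer needs no knowledge of Y: it derives the token from the 64-bit
key alone, while the loadbalancer recovers the routing value from the
token using Y and never sees the key.
</t>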

</section>
</section>

<section anchor="concl" title="Conclusion">
<t>
In order to be deployable at a large scale, Multipath TCP has to evolve to accommodate
the use-case of distributed layer-4 loadbalancers. In this document we explained
the different problems that arise when one wants to deploy MPTCP in a large
server farm. We followed up with two possible approaches to solve the issues
around the non-deterministic nature of the token. We argue that it is important
that the working group considers this problem and strives to find a solution.
</t>
</section>

<section anchor="IANA" title="IANA Considerations">
<t>
No IANA considerations.
</t>
</section>

<!---section anchor="security" title="Security Considerations">
<t>
</t>
</section-->

</middle>

  <back>
    <!-- References split into informative and normative -->

    <!-- There are 2 ways to insert reference entries from the citation libraries:
     1. define an ENTITY at the top, and use "ampersand character"RFC2629; here (as shown)
     2. simply use a PI "less than character"?rfc include="reference.RFC.2119.xml"?> here
        (for I-Ds: include="reference.I-D.narten-iana-considerations-rfc2434bis.xml")

     Both are cited textually in the same manner: by using xref elements.
     If you use the PI option, xml2rfc will, by default, try to find included files in the same
     directory as the including file. You can also define the XML_LIBRARY environment variable
     with a value containing a set of directories to search.  These can be either in the local
     filing system or remote ones accessed by http (http://domain/dir/... ).-->

    <references title="Normative References">
    &RFC6824;
    <?rfc include='reference.I-D.paasch-mptcp-syncookies.xml'?>
    </references>

    <references title="Informative References">
      <reference anchor="Shuff13"
                 target="https://www.youtube.com/watch?v=MKgJeqF1DHw">
        <front>
          <title>Building A Billion User Load Balancer</title>
          <author fullname="Patrick Shuff" initials="P.S." surname="Shuff"/>
          <date year="2013" />
	  <workgroup>@Scale x13</workgroup>
        </front>
      </reference>
      <reference anchor="Patel13"
                 target="http://dl.acm.org/citation.cfm?id=2486026">
        <front>
          <title>Ananta: Cloud Scale Load Balancing</title>
          <author fullname="Parveen Patel" initials="P.P." surname="Patel"/>
          <author fullname="Deepak Bansal" initials="D.B." surname="Bansal"/>
	  <author fullname="Lihua Yuan" initials="L.Y." surname="Yuan"/>
	  <author fullname="Ashwin Murthy" initials="A.M." surname="Murthy"/>
	  <author fullname="David A. Maltz" initials="D.M." surname="Maltz"/>
	  <author fullname="Randy Kern" initials="R.K." surname="Kern"/>
	  <author fullname="Hemant Kumar" initials="H.K." surname="Kumar"/>
	  <author fullname="Marios Zikos" initials="M.Z." surname="Zikos"/>
	  <author fullname="Hongyu Wu" initials="H.W." surname="Wu"/>
	  <author fullname="Changhoon Kim" initials="C.K." surname="Kim"/>
	  <author fullname="Naveen Karri" initials="N.K." surname="Karri"/>
          <date year="2013"/>
	  <workgroup>ACM SIGCOMM</workgroup>
        </front>
      </reference>
      <reference anchor="Greenberg13"
                 target="http://dl.acm.org/citation.cfm?id=1397732">
        <front>
          <title>Towards a Next Generation Data Center Architecture: Scalability and Commoditization</title>
	  <author fullname="Albert Greenberg" initials="A.G." surname="Greenberg"/>
	  <author fullname="Parantap Lahiri" initials="P.L." surname="Lahiri"/>
	  <author fullname="David A. Maltz" initials="D.M." surname="Maltz"/>
          <author fullname="Parveen Patel" initials="P.P." surname="Patel"/>
	  <author fullname="Sudipta Sengupta" initials="S.S." surname="Sengupta"/>
          <date year="2008"/>
	  <workgroup>PRESTO'08 ACM workshop</workgroup>
        </front>
      </reference>
    </references>
  </back>
</rfc>
