<?xml version='1.0' encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd' [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc1323 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.1323.xml">
<!ENTITY rfc5925 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5925.xml">
<!ENTITY rfc1948 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.1948.xml">
<!ENTITY rfc4987 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4987.xml">
<!ENTITY I-D.ietf-intarea-shared-addressing-issues SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-intarea-shared-addressing-issues">
<!ENTITY I-D.ietf-tcpm-tcp-timestamps SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-tcpm-tcp-timestamps">
<!ENTITY I-D.ietf-tsvwg-port-randomization SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-tsvwg-port-randomization">
]>

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc='yes'?>
<?rfc rfcprocack="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<?rfc sortrefs="yes" ?>
<?rfc colonspace='yes' ?>
<?rfc tocindent='yes' ?>

<rfc category="std" docName="draft-yourtchenko-nat-reveal-hash-00" ipr="trust200902">
<front>
<title abbrev='Revealing the hosts behind NAPT'>NAT confessions: revealing the hosts behind the translator</title>

<author initials='A.' surname='Yourtchenko' fullname='Andrew Yourtchenko'>
<organization>cisco</organization>

<address>
<postal>
<street>6a de Kleetlaan</street>
<city>Diegem</city><code>1831</code>
<country>BE</country>
</postal>

<phone>+32 2 704 5494</phone>
<email>ayourtch@cisco.com</email>
</address>
</author>

<author initials='D.' surname='Wing' fullname='Dan Wing'>
<organization>cisco</organization>

<address>
<postal>
<street>170 West Tasman Drive</street>
<city>San Jose</city><code>CA 95134</code>
<country>USA</country>
</postal>

<email>dwing@cisco.com</email>
</address>
</author>


<date year='2010' />

<workgroup></workgroup>


<abstract>
<t>
When an IP address is shared among several subscribers, it is impossible to determine which subscriber initiated a given TCP connection.  This memo describes a technique for sharing the identity of the subscriber that initiated a TCP connection with the TCP server. The proposed method avoids altering the application-level payload and works well with SSL-protected connections.
</t>
</abstract>
</front>
<middle>

<section title='Introduction'>
<t>
There are several scenarios where it is valuable to know the identity of a TCP client, including geolocation, DoS blocking, and spam blacklists. Today, this is done by equating the IPv4 address with 'identity'.  However, the identity of a TCP client is obscured when an IP address is shared <xref target="I-D.ietf-intarea-shared-addressing-issues">I-D.ietf-intarea-shared-addressing-issues</xref>.  IP address sharing is done both by network address and port translators (NAPT) and by application-layer proxies (e.g., HTTP or FTP proxies).
</t>
<t>
The current state of the art requires the address-sharing device to alter the application-level payload and include the identity of the internal host -- usually the internal host's private IP address.  This incurs several drawbacks:<list style="symbols">
<t>adjustment of TCP sequence numbers and acknowledgement numbers for the duration of the TCP session</t><t>risk of false-positive application matching (e.g., accidentally inserting an HTTP header into a non-HTTP payload)</t><t>interference with the application payload by increasing the packet size (e.g., beyond the MTU)</t></list>
 With SSL-protected applications, the current state of the art requires breaking the end-to-end encrypted connection. This results in several undesirable consequences:<list style="symbols">
<t>necessity for the translator to break the end-to-end encryption, typically by installing an additional Certificate Authority on the client's CA trust list</t><t>noticeable increase in the processing power required on the address-sharing device to decrypt and re-encrypt the application payload</t></list>

</t>
<t>
This specification avoids the problems described above and defines a method of communicating the TCP client's identity to the TCP server by overloading the TCP timestamp field and the IP Identifier field of the initial TCP SYN.
</t>
<t>
This extension is necessary because IP address sharing, deployed by NAT64 devices, will allow malicious users to connect to IPv4-capable servers.  Thus, until a server is only accessible via IPv6 (and inaccessible via IPv4), the IPv4-capable server will suffer from an inability to identify individual TCP clients as discussed in <xref target="I-D.ietf-intarea-shared-addressing-issues">I-D.ietf-intarea-shared-addressing-issues</xref>.
</t>
</section>
<section title='Notational Conventions'>
<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in <xref target="RFC2119">RFC2119</xref>.
</t>
</section>
<section title='Description'>
<t>
This proposal leverages the common deployment of TCP timestamps and the fact that a timestamp-aware TCP server will echo the timestamp.
</t>
<t>
The caveat with the above is that the remote peer must know in advance whether the TCP client implements this technique -- the timestamp on the server side looks just the same either way. This could be resolved by manual configuration, but that is impractical, so an automatic detection mechanism is proposed. The automatic mechanism calculates a hash over the values of interest and places the result into another field. The receiver can then perform the same operation and verify the result. If the received and computed values match, then the received TCP timestamp does contain the encoded internal address. The verifier value is computed as a hash function over the mapped value encoded into the timestamp, the address after translation, and the TCP initial sequence number, i.e., the sequence number within the SYN segment. Including the TCP initial sequence number prevents the verifier value from being almost always the same. The reason for doing so is to satisfy the protocol constraints of the field that is used to convey this value.
</t>
<t>
In order to find a place to store this verification value, we make another observation: TCP SYN segments are generally rather small, and the minimum MTU on IPv4 is 576. Typical stacks send the TCP SYN with DF=1. Therefore, these segments would never be fragmented. This means we could use the 16-bit IP ID field to carry the verifier value. The verifier depends on the initial sequence number (ISN), which should have some randomness properties as described in <xref target="RFC1948">RFC1948</xref>; therefore, the IP ID will vary enough to still serve its purpose even in the extremely unlikely case that the TCP SYN is fragmented.
</t>
<t>
Using a 16-bit value as a verifier gives a 1 in 65536 (about 0.0015%) probability of erroneously judging that the timestamp contains the encoded internal address. This may be insufficient assurance for some scenarios. Therefore, we calculate the verifier (referred to as the VFY value) as a 32-bit integer and store 16 or more bits of this value, at the expense of storing fewer bits of the Internal Address Mapping (iAM). However, we expect that the range of iAM for a single public translation would be relatively small, so no information will be lost in this process.
</t>
</section>
<section title='Calculating the Internal Address Mapping'>
<t>
The main useful property of iAM is that it MUST stay the same for the same internal address unless the configuration on the translator has changed. Since the goal is to provide a stable mapping, rather than to fully reveal the internal address, any method that has this property is acceptable, and the choice is left to the implementors of the translator. If the addresses to be translated are configured as a prefix, then the iAM can be obtained simply by taking the host bits of the address within the prefix. If these addresses are assigned on an individual basis, then simple enumeration might be used. If the internal addresses are assigned to the pool as a set of subnets, then a combination of the two methods above (the host bits in the least significant part, and the enumeration in the most significant part) will give good results. This also encourages allocation of the internal addresses in equal-sized chunks, which should make the maintenance of the network easier.
</t>
<t>
As a result, the calculation of the iAM on the outgoing SYN segment MUST return two values:
</t>
<t>
<list style="symbols">
<t>iAM = Internal Address Mapping: a 32-bit unsigned integer</t><t>siAM = Size of Internal Address Mapping, in bits: integer, allowed range 9..24 - this is the number of significant bits within the iAM.</t></list>

</t>
<t>
The minimum value of siAM being 9 was chosen based on the following logic:<list style="symbols">
<t>a space of 512 possible hosts lets the iAM remain unchanged across small configuration changes when the pool is made up of individual hosts.</t><t>the range 9..24 has exactly 16 possible values, which will be useful for encoding.</t></list>

</t>
<t>
By encoding only the significant bits of the internal address mapping, the operator of the translator can minimize the probability of error: all the unused bits are allocated to the value used to "fingerprint" the presence of the internal identifier. The more bits this "Verifier" value contains, the lower the chance of an accidental match, and thus of erroneously recording an internal identifier when there is none.
</t>
<t>
The range from 9 bits to 24 bits allows between 512 and 16777216 internal identifiers to be encoded for a single public IP address.
</t>
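<t>
The prefix-based method above can be sketched as follows. This is only an illustration, assuming an IPv4 internal prefix; the function name iam_from_prefix and its error handling are hypothetical and not part of this specification.
</t>
<figure><artwork><![CDATA[
```python
import ipaddress

SIAM_MIN, SIAM_MAX = 9, 24  # allowed range of siAM, in bits


def iam_from_prefix(internal_addr: str,
                    internal_prefix: str) -> tuple[int, int]:
    """Return (iAM, siAM) for an address inside a configured prefix.

    iAM is the host part of the address. siAM is the number of host
    bits, raised to the minimum of 9 for prefixes longer than /23.
    Hypothetical helper: the draft leaves the method to implementors.
    """
    net = ipaddress.ip_network(internal_prefix)
    addr = ipaddress.ip_address(internal_addr)
    if addr not in net:
        raise ValueError(f"{addr} is not inside {net}")
    host_bits = net.max_prefixlen - net.prefixlen
    if host_bits > SIAM_MAX:
        raise ValueError("prefix has more than 24 host bits")
    iam = int(addr) & int(net.hostmask)
    siam = max(host_bits, SIAM_MIN)
    return iam, siam
```
]]></artwork></figure>
<t>
For example, under these assumptions the address 10.1.2.3 within 10.1.0.0/16 yields iAM = 515 with siAM = 16, and 192.168.0.5 within 192.168.0.0/24 yields iAM = 5 with siAM = 9.
</t>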
</section>
<section title='Calculating the Verifier'>
<t>
The verifier is calculated as the 32-bit result of a hash function. This hash function is not expected to be cryptographically strong (the 'Security Considerations' section explains why); however, it should have good distribution, good collision resistance, good avalanche behavior, and be fast and cheap to compute. These properties are satisfied by the <xref target="URL.Murmur-hash">Murmur hash</xref> function; therefore, it is the hash that we will use.
</t>
<t>
The calculation of the VFY is performed as follows:
</t>
<t>
VFY = murmur(iAM | AddrPub | siAM, TCP-ISN)
</t>
<t>
<list style="symbols">
<t>iAM is included in the calculation as a 32-bit word.</t><t>siAM is included in the hash calculation as a single byte. (TBD: the 'selector' referenced below might be a more natural number to check against, instead of siAM?)</t></list>

</t>
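<t>
A minimal sketch of the VFY calculation follows. Several details are assumptions rather than part of this specification: the Murmur variant (MurmurHash3, x86 32-bit, is used here because this document does not name one), the use of the TCP ISN as the hash seed, and the byte layout of the key (iAM as 4 bytes in network order, then the 4-byte public IPv4 address, then siAM as one byte). The function names are hypothetical.
</t>
<figure><artwork><![CDATA[
```python
import ipaddress
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k(k: int) -> int:
    """Scramble one 32-bit block before folding it into the state."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_32(data: bytes, seed: int) -> int:
    """MurmurHash3 x86 32-bit (assumed variant, see lead-in)."""
    h = seed & MASK32
    n = len(data)
    body_len = n - (n % 4)
    for i in range(0, body_len, 4):
        h ^= _mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    if body_len < n:  # 1 to 3 trailing bytes
        h ^= _mix_k(int.from_bytes(data[body_len:], "little"))
    # Finalization: fold in the length and force avalanche.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def compute_vfy(iam: int, siam: int, addr_pub: str, tcp_isn: int) -> int:
    """VFY = murmur(iAM | AddrPub | siAM, TCP-ISN), layout assumed."""
    pub = ipaddress.IPv4Address(addr_pub).packed
    key = struct.pack("!I4sB", iam, pub, siam)
    return murmur3_32(key, tcp_isn)
```
]]></artwork></figure>
<t>
Because the ISN is used as the seed in this sketch, two connections from the same internal host through the same public address will generally produce different VFY values, which is the property the Description section relies on.
</t>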
</section>
<section title='Encoding of the VFY into the packet: IP ID encoding'>
<t>
The low 16 bits of the VFY are encoded in network order into the IP ID of the packet after translation. The remaining 16 bits form the "VFYhi" value, which we attempt to fit into the TSval along with the other information.
</t>
</section>
<section title='Encoding of the VFY into the packet: TSval encoding'>
<t>
The TCP timestamp field encodes the iAM and VFYhi as follows:<figure><artwork>
 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|E E E E|S S S S| iAM MSB ... iAM LSB  | VFYhi MSB .. VFYhi LSB |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</t>
<t>
The range of siAM gives 16 possible ways to store iAM (along with the same number of degrees of assurance for the detection). In order to distinguish between those, we introduce the encoding selector (S) field, which determines how the lower 24 bits are split between the iAM and the upper 16 bits of VFY. Note that, because the smallest value of siAM is 9, we will never be able to store the most significant bit of VFY.
</t>
<t>
The value of S is the number of zero-fill right-shift operations it would take on the low 24 bits in order to "normalize" the iAM; in other words, it is the number of bits of VFYhi stored within the timestamp.
</t>
<t>
Best practices in <xref target="I-D.ietf-tcpm-tcp-timestamps">I-D.ietf-tcpm-tcp-timestamps</xref> mention that, to reduce the TIME-WAIT state, the timestamp value should be monotonically increasing across connections with the same 5-tuple. To give the translators an opportunity to achieve this property, we reserve several most significant bits within the timestamp to signify the "Epoch" (E). This would require storing some additional state per 5-tuple, and the implementation of such a mechanism is outside the scope of this document. Implementations that do not implement monotonically increasing timestamps MUST keep the Epoch bits intact from the original value of the timestamp.
</t>
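<t>
The field layout above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the sketch assumes that S = 24 - siAM and that, when VFYhi does not fit entirely, its low-order S bits are the ones stored.
</t>
<figure><artwork><![CDATA[
```python
def encode_fields(epoch: int, iam: int, siam: int,
                  vfy: int) -> tuple[int, int]:
    """Return (TSval, IP ID) carrying iAM and the 32-bit VFY."""
    if not 9 <= siam <= 24:
        raise ValueError("siAM must be in the range 9..24")
    if not 0 <= iam < (1 << siam):
        raise ValueError("iAM does not fit into siAM bits")
    if not 0 <= epoch < 16:
        raise ValueError("Epoch is a 4-bit field")
    s = 24 - siam                        # bits left over for VFYhi
    vfy_hi = (vfy >> 16) & 0xFFFF
    stored_hi = vfy_hi & ((1 << s) - 1)  # assumed: keep the low S bits
    tsval = (epoch << 28) | (s << 24) | (iam << s) | stored_hi
    ip_id = vfy & 0xFFFF                 # low 16 bits of VFY
    return tsval, ip_id


def decode_tsval(tsval: int) -> tuple[int, int, int, int]:
    """Return (epoch, S, iAM, stored VFYhi bits) from a TSval."""
    epoch = (tsval >> 28) & 0xF
    s = (tsval >> 24) & 0xF
    low24 = tsval & 0xFFFFFF
    iam = low24 >> s                     # S zero-fill right shifts
    stored_hi = low24 & ((1 << s) - 1)
    return epoch, s, iam, stored_hi
```
]]></artwork></figure>
<t>
Under these assumptions, the verifier carries 16 + S bits in total: 31 bits when siAM is 9, and 16 bits when siAM is 24.
</t>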
</section>
<section title='Operation of the mechanism'>
<t>
This section outlines the use of this mechanism by the translators and servers.
</t>

<section title='Translator Operation'>
<t>
The translator is involved in processing the initial SYN segment (calculating the new values of the TCP timestamp and IP ID), as well as the SYN-ACK segments (restoring the original value of the TCP timestamp within the TSecr field).
</t>
</section>
<section title='Server Operation'>
<t>
The server would operate on every SYN that is of interest for logging. It would extract the candidate iAM, and calculate the VFY value based on the public address and TCP ISN within the received SYN segment. Then it would compare the VFY against the corresponding bits in the TSval and IP ID fields. If there is a match, it means (with a reasonable probability) that the iAM was a valid one calculated by the translator in between.  This information is stored for later access by the application listening on that socket (e.g., stored in the TCB).
</t>
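<t>
The server-side check described above might look like the following sketch. The names are hypothetical, the key layout (iAM as 4 bytes in network order, the 4-byte public IPv4 address, siAM as one byte) and the choice of which VFYhi bits are stored are assumptions, and the default hash is a CRC-32 stand-in used only to keep the example self-contained; a real deployment would pass the Murmur hash agreed with the translator.
</t>
<figure><artwork><![CDATA[
```python
import ipaddress
import struct
import zlib
from typing import Callable, Optional

Hash32 = Callable[[bytes, int], int]


def stand_in_hash(key: bytes, seed: int) -> int:
    """Placeholder 32-bit hash (CRC-32), NOT the Murmur hash."""
    return zlib.crc32(key, seed) & 0xFFFFFFFF


def check_syn(tsval: int, ip_id: int, addr_pub: str, tcp_isn: int,
              hash32: Hash32 = stand_in_hash) -> Optional[int]:
    """Return the iAM if the SYN carries a valid verifier, else None."""
    s = (tsval >> 24) & 0xF              # number of VFYhi bits stored
    low24 = tsval & 0xFFFFFF
    iam = low24 >> s                     # candidate iAM
    stored_hi = low24 & ((1 << s) - 1)
    siam = 24 - s
    pub = ipaddress.IPv4Address(addr_pub).packed
    key = struct.pack("!I4sB", iam, pub, siam)
    vfy = hash32(key, tcp_isn) & 0xFFFFFFFF
    if (vfy & 0xFFFF) != ip_id:          # low half lives in the IP ID
        return None
    if ((vfy >> 16) & ((1 << s) - 1)) != stored_hi:
        return None
    return iam
```
]]></artwork></figure>
<t>
A return value of None only means that no valid verifier was found; the SYN is then treated as an ordinary one whose timestamp carries no encoded internal address.
</t>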
</section></section>
<section title='Interaction with TCP SYN cookies'>
<t>
TCP SYN cookies are commonly deployed to mitigate TCP SYN attacks <xref target="RFC4987">RFC4987</xref>. The mechanism described in this document requires the server to store extra information that arrives in the TCP SYN, which increases the TCP server's attack surface.  To mitigate this, the translator should apply a similar algorithm to the timestamp of the ACK segment that is sent by the initiator of the connection in response to the server's SYN-ACK. The authors considered having the server side use the TSval in its SYN-ACK segment; however, this would interfere with Extended syncookies. This section needs further discussion.
</t>
</section>
<section title='Other Mechanisms to Encode Client Identifier'>
<t>
This section outlines other mechanisms that we considered and the reasons we consider them not applicable.
</t>

<section title='Defining a new TCP option to store the address'>
<t>
This would be the cleanest and simplest approach, and is discussed in [I-D.wing-reveal-address].
</t>
</section>
<section title='Using TSecr in TCP SYN'>
<t>
This value is set to zero and is effectively unused, so it looks like a convenient place. However, this violates <xref target="RFC1323">RFC1323</xref>, and it would require much more thorough testing and an update to <xref target="RFC1323">RFC1323</xref>.
</t>
</section>
<section title='Reserving the different port ranges per client'>
<t>
This approach is appealing due to its simplicity, but it would be specific to each NAPT device operated by each service provider.  That is, there is no way to identify the device or know the source port range assigned to a TCP client without contacting the administrator of the NAPT device.  Restricting clients to a specific range also exposes the clients to some security risk <xref target="I-D.ietf-tsvwg-port-randomization">I-D.ietf-tsvwg-port-randomization</xref>.
</t>
</section></section>
<section title='Security Considerations'>
<t>
Connections that happen today without a NAPT necessarily reveal the source address of the TCP client, so revealing the identity of the client should not be a concern except for installations that attempt to use NAPT for "privacy" reasons. If such an installation exists, it is easy to see that any 1:1 remapping of, e.g., the IP ID would cause the validation algorithm to fail, thereby "protecting the identity".
</t>
<t>
Therefore, if an organization has more than one level of NAPT and wants to ensure that the internal translators do not disclose information about the internal addresses, it can alter any of the elements used for the calculations, e.g., randomize the ISN or remap the IP ID.
</t>
<t>
An attacker might use this functionality to appear as if IP address sharing is occurring, in the hope that a naive server will allow additional attack traffic. TCP servers and applications SHOULD NOT assume that the mere presence of the functionality described in this document indicates there are other (benign) users sharing the same IP address.
</t>
<t>
The modification of the TSval option value will break TCP-AO <xref target="RFC5925">RFC5925</xref>, which provides integrity protection of the TCP SYN (including TCP options).  However, TCP-AO is already known not to survive address sharing (through a NAPT or through an application proxy).
</t>
</section>
<section title='IANA considerations'>
<t>
None.
</t>
</section>
<section title='Acknowledgements'>
<t>
Thanks to Nicholas Leavy for the review.
</t>
</section>
</middle>
<back>
<references title="Normative References">
&rfc2119;
&rfc1323;
&rfc5925;
</references>
<references title="Informative References">
&rfc1948;
&rfc4987;
&I-D.ietf-intarea-shared-addressing-issues;
&I-D.ietf-tcpm-tcp-timestamps;
&I-D.ietf-tsvwg-port-randomization;
<reference anchor="URL.Murmur-hash" target="http://sites.google.com/site/murmurhash/">
  <front>
    <title>Murmur hash</title>
    <author/>
    <date/>
  </front>
</reference>
</references>

</back>
</rfc>

