<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
     There has to be one entity for each item to be referenced. 
     An alternate method (rfc include) is described in the references. -->

<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
     (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-weaver-alto-edge-caches-00" ipr="trust200902">
  <!-- category values: std, bcp, info, exp, and historic
     ipr values: full3667, noModification3667, noDerivatives3667
     you can add the attributes updates="NNNN" and obsoletes="NNNN" 
     they will automatically be output with "(if approved)" -->

  <!-- ***** FRONT MATTER ***** -->

  <front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
         full title is longer than 39 characters -->

    <title abbrev="P2P Localization and Edge Caches">Peer-to-Peer Localization Services and Edge Caches</title>

    <!-- add 'role="editor"' below for the editors if appropriate -->

    <!-- Another author who claims to be an editor -->

    <author fullname="Nicholas Weaver" initials="N." 
            surname="Weaver">
      <organization>International Computer Science Institute</organization>

      <address>
        <postal>
          <street>1947 Center Street suite 600</street>

          <!-- Reorder these if your country does things differently -->

          <city>Berkeley</city>

          <region>CA</region>

          <code>94704</code>

          <country>USA</country>
        </postal>

        <phone>+1 510 666 2903</phone>

        <email>nweaver@icsi.berkeley.edu</email>

        <!-- uri and facsimile elements may also be added -->
      </address>
    </author>

    <date year="2009" />

    <!-- Meta-data Declarations -->

    <area>General</area>

    <workgroup>Application Layer Traffic Optimization (ALTO) Working Group</workgroup>

    <keyword>P2P</keyword>
    <keyword>caches</keyword>
    <keyword>localization</keyword>

    <!-- Keywords will be incorporated into HTML output
         files in a meta tag but they have no effect on text or nroff
         output. If you submit your draft to the RFC Editor, the
         keywords will be used for the search engine. -->

    <abstract>
<t>
Without caches in the infrastructure, the primary effect of
peer-to-peer (P2P) content delivery is cost shifting rather than cost
savings.  Even with perfect localization, depending on the relative
cost of last-mile uplink bandwidth versus transport bandwidth, P2P may
substantially increase aggregate cost.  Yet the addition of edge
caches, caches located within the ISP's network near its customers,
radically changes the economics of P2P content delivery.  Edge caches
interact very strongly with localization services for P2P content
delivery, and any localization service must be tightly integrated
into edge-cache operation.
</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>When compared with conventional content delivery,
      peer-to-peer (P2P) content delivery of bulk data is effective at
      shifting costs from the content provider to the ISPs, but it can
      often significantly increase the aggregate cost of delivery.
      Depending on the particular costs to an ISP, even perfect
      localization (restriction of P2P activity to within the ISP's
      network) may still result in significantly higher aggregate
      costs than conventional content delivery, although localization
      does reduce transit costs.
	</t>

      <t>However, if edge-caches are introduced into the architecture,
      the economics can change radically.  Rather than increasing
      transport costs, P2P with ISP-provided edge caches reduces
      transport costs for all parties, achieving cost reductions for
      the ISP analogous to those seen with edge-based HTTP servers
      such as <xref target="akamai">Akamai</xref>.  Yet unlike
      edge-based web servers, edge-caches for P2P are
      failure-transparent: when they fail, or do not have the right
      data, the failure does not impact correct operation of the P2P
      system.
	</t>

      <t>It is critical that ALTO or other localization services for
      bulk-data P2P both be edge-cache aware and assist edge-caches in
      their operation.  Localization without edge-caches may not
      produce significant cost savings for the ISPs or performance
      benefits for the customers, while edge-caches need localization
      services both to ease client discovery and to provide the
      topological information necessary for edge-cache operation.
	</t>

      <t>This document begins with a brief discussion
      of <xref target="caches">edge caches for P2P</xref>, then
      outlines a <xref target="economics">simple cost model of content
      delivery</xref>, which argues why both localization and
      edge-caches are necessary for cost-effective content delivery.
      It then discusses <xref target="reliance">how localization and
      edge-caches should interact</xref>, before a
      brief <xref target="conclusions">conclusions section</xref>.
      </t>

        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
        "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described
        in <xref target="RFC2119">RFC 2119</xref>.</t>
    </section>

    <section anchor="caches" title="The Design of Edge Caches">
      <t>An edge-cache is simply a special P2P node that resides in the
      ISP's network close to, but not at, the final recipients.  Thus
      it incurs no transit cost in communicating with ISP-local peers,
      is close to them in latency, and has a high-bandwidth connection
      into the ISP's internal network.</t>
      
      <t>The role of an edge cache is to coordinate transfers between
      local peers and the rest of the Internet, as well as to cache
      data for subsequent use, within the existing or modified P2P
      protocol.  For example, a BitTorrent edge cache can participate
      in a swarm as follows: once it has a complete file, it offers
      data only to ISP-local peers; before it has obtained the entire
      file, it neither seeds to nor freely leeches from peers outside
      the ISP, and exchanges data with them only on a tit-for-tat
      basis.
      </t>

      <t>One feature of an edge-cache is that it can be unreliable.
      Since, from the point of view of the other peers, it is simply
      another P2P participant, the P2P system will still work properly
      if the edge-cache lacks a block or a file, or fails altogether.
      This is in sharp contrast to edge-based HTTP caches or CDNs,
      where a failure in the node may result in failures visible to
      the user.</t>

      <t>A side consequence of unreliability is that an edge-cache can
      therefore be inexpensive.  For example, a 1U server (based on a
      Mini-ITX motherboard) capable of holding 4 SATA disks might cost
      less than $800.  With a price of $130 for a 1.5 TB drive, an edge
      cache costing less than $1400 could cache over 5 TB of data.
      Such a low-cost system might suffer significantly higher
      transient failure rates than a higher-quality server,
      necessitating a reboot, reimage, and disabling of bad disks, but
      as failures are low-consequence, such caches can be cheap to
      deploy.</t>

      <t>Finally, a P2P edge-cache doesn't require changing existing
      P2P protocols.  As long as local peers can find the edge cache,
      or the edge-cache can find the local peers, edge-caches can be
      introduced into existing protocols without change.  In
      particular, BitTorrent is highly amenable to edge-caches without
      requiring client changes.</t>

      <section anchor="incentives" title="Safe Incentives for Edge Caches">
	<t>The biggest impediment to building edge-caches is not
	technical but legal.  Given a P2P swarm, a single edge cache
	or collection of caches should be able to monitor the swarm
	and find participants.  But an edge cache needs to be notified
	both about a particular P2P swarm and that it is acceptable to
	cache the swarm.</t>
	<t>A detailed discussion is outside the scope of this document,
	but many possibilities exist, such as P2P content providers
	(for example, distributors of Linux ISO images) registering
	their content, users of the ISP asserting that a swarm is
	legitimate (and consenting to be identified if a copyright
	holder objects), and agreements with third party data providers
	(such as Amazon S3) which support BitTorrent and other P2P
	content distribution.</t>
	</section>
      </section>

    <section anchor="economics" title="An Economic Model for Delivery Costs">
      <t>For purposes of this discussion, we assume that different
      portions of the network have different costs to transmit or
      receive one unit of data.  Although costs really vary by time of
      day and network conditions (for example, the cost to an ISP of
      traffic on an uncongested uplink on the last mile is effectively
      0, but can be huge if there is congestion, or peering
      arrangements may make the cost of uplink transit negative), for
      simplicity we will ignore these effects for now.</t>

      <t>CP: This is the cost for the content provider to send one
      unit of data.</t>

      <t>CDN: This is the cost for the content provider to send one
      unit of data through a third party, edge-based CDN.</t>

      <t>CT: This is the cost for the ISP to receive one unit of data
      from the general Internet.</t>

      <t>CTU: This is the cost for the ISP to send one unit of data to
      the general Internet.</t>

      <t>CL: This is the cost for the ISP to send one unit of data to
      the end customer across the last mile.</t>

      <t>CLU: This is the cost for the ISP to receive one unit of data
      from an end customer across the last mile.</t>

      <t>With such a basic cost model, it becomes possible to estimate
      the costs for different content delivery mechanisms, where N is
      the number of the ISP's customers requesting the content.</t>

      <t>Central (conventional) HTTP traffic: For such traffic, the
      content provider pays N*CP, while the ISP pays N*(CT+CL).  Both
      costs increase linearly with the number of requests.</t>

      <t>Edge-located HTTP content delivery networks (such as Akamai):
      For such traffic, the content provider pays N*CDN, while the ISP
      pays N*CL.  This is obviously the best case for the ISP, but the
      cost of the CDN may not be favorable to the content
      provider.</t>

      <t>Conventional P2P without localization: If we assume the P2P
      system is highly efficient, the content provider pays only CP
      regardless of the number of users.  The ISP will need to pay
      N*(CL + CLU) for all users on the last mile, and some value less
      than N*(CT + CTU) for transit.</t>

      <t>Conventional P2P with perfect localization: If the P2P system
      is perfect, including localizing the traffic completely within
      the ISP, the content provider pays only CP, while the ISP will
      need to pay N*(CL + CLU) but only (CT + CTU) for transit.</t>

      <t>Conventional P2P with perfect localization and perfect edge
      caches: Adding in edge-caches changes the situation.  Now the
      content provider pays only CP, while the ISP pays N*CL + CT +
      CTU.</t>
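
      <t>As a rough illustration of the five cases above, the
      following Python sketch tabulates who pays what under each
      delivery mechanism.  The class and function names are
      assumptions made for illustration, as are all numeric values in
      the note that follows; none come from measurements.  The figure
      for P2P without localization is only an upper bound, since the
      model gives the ISP's transit cost as some value less than
      N*(CT + CTU).</t>

      <figure><artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCosts:
    """Per-unit costs, named after the symbols in the model."""
    cp: float   # CP:  content provider sends one unit
    cdn: float  # CDN: content provider sends one unit via a CDN
    ct: float   # CT:  ISP receives one unit from the Internet
    ctu: float  # CTU: ISP sends one unit to the Internet
    cl: float   # CL:  ISP sends one unit down the last mile
    clu: float  # CLU: ISP receives one unit up the last mile


def delivery_costs(n: int, c: UnitCosts) -> dict:
    """Map mechanism -> (content provider cost, ISP cost)."""
    last_mile_p2p = n * (c.cl + c.clu)
    return {
        "central_http": (n * c.cp, n * (c.ct + c.cl)),
        "edge_cdn": (n * c.cdn, n * c.cl),
        # Upper bound only: real transit is "some value less".
        "p2p_unlocalized_upper_bound": (
            c.cp, last_mile_p2p + n * (c.ct + c.ctu)),
        "p2p_localized": (
            c.cp, last_mile_p2p + (c.ct + c.ctu)),
        "p2p_localized_cached": (
            c.cp, n * c.cl + c.ct + c.ctu),
    }
```
]]></artwork></figure>

      <t>For instance, with the arbitrary values CP=1, CDN=2, CT=1,
      CTU=1, CL=1, CLU=3, and N=100, this sketch gives an ISP cost of
      200 for central HTTP, 402 for perfectly localized P2P, and 102
      once perfect edge caches are added.</t>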

      <section anchor="limits" title="The Limits of Localization">
	<t>Such a simple cost model illustrates the major limitation
	of localization.  If CLU, the cost of the last mile uplink, is
	more than CT, the cost of the transit downlink, P2P can
	significantly increase the costs to the ISP over conventional
	HTTP delivery, even with perfect localization and perfect
	operation.  For some networks, such as DOCSIS cable modems,
	this is often the case, as increasing network capacity on the
	shared last mile may require new infrastructure or repurposing
	bandwidth otherwise used for higher-value services such as
	television channels.</t>
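	<t>The condition can be made explicit by subtracting the ISP's
	cost for central HTTP delivery from its cost for perfectly
	localized P2P, using the expressions given earlier in this
	section.  The labels C_HTTP and C_P2P are introduced here
	only for this illustration.</t>

	<figure><artwork><![CDATA[
```latex
\begin{align*}
C_{\mathrm{HTTP}} &= N\,(\mathit{CT} + \mathit{CL}) \\
C_{\mathrm{P2P}}  &= N\,(\mathit{CL} + \mathit{CLU})
                     + (\mathit{CT} + \mathit{CTU}) \\
C_{\mathrm{P2P}} - C_{\mathrm{HTTP}}
                  &= N\,(\mathit{CLU} - \mathit{CT})
                     + (\mathit{CT} + \mathit{CTU})
\end{align*}
```
]]></artwork></figure>

	<t>If CLU is at least CT, the difference is positive for every
	N (assuming CT + CTU is positive), so perfectly localized P2P
	always costs the ISP more than central HTTP.  If CLU is smaller
	than CT, localized P2P becomes cheaper for the ISP only once N
	exceeds (CT + CTU)/(CT - CLU).</t>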
	<t>Yet the model also shows that if edge-caches are added into
	the system, everybody sees cost savings: both the content provider and
	the ISP benefit from lower cost, but without the reliability
	concerns present in edge-based HTTP CDNs.  Thus edge-caches
	represent the best of both worlds: for a content provider,
	edge-caches in the P2P system have the same low cost as a
	conventional P2P system, but for the ISP, the edge-caches have
	the same low cost as an edge-located CDN.</t>
	</section>
      </section>

    <section anchor="reliance" title="Edge-Cache Interactions with Localization">
      <t>Since edge-caches are critical to realize the true potential
      of P2P to create an aggregate cost savings, they need to be
      considered when developing other portions of a common P2P
      infrastructure.  In particular, edge-caches both interact with
      and benefit from localization services, and thus it is critical
      that localization and edge-caching be co-designed to
      interoperate.  The following edge-cache concerns relate
      directly to localization.</t>
      
      <t>Edge-cache discovery: Any localization service which supports
      the discovery of "preferable" nodes should give preference to
      any relevant edge-caches in the system.  Thus the localization
      service will drive traffic towards the relevant edge caches,
      resulting in greater performance and lower cost-of-delivery.</t>

      <t>Edge-cache content notification: Any localization service
      should also act as content notification, notifying the
      edge-cache about a user's desire to fetch a particular piece of
      content.  The edge-cache may use this information, along with
      other constraints and heuristics, to determine whether it should
      participate in this distribution system.  For example, a
      particular ISP's edge-cache for BitTorrent could be configured
      to cache torrents requested from Amazon S3 or other sources
      based on a contractual relationship, but reject torrents hosted
      elsewhere.</t>
      
      <t>Peer-access control: The edge-cache, when contacted by a
      peer, needs to know whether the peer is local to its network.
      Thus the localization service should support queries from the
      edge cache as to whether a peer would be considered local to the
      ISP.
	</t>
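
      <t>One way an edge cache might apply the answer to such a
      locality query is sketched below in Python.  The static prefix
      list is a stand-in for whatever the localization service would
      actually return; the prefixes shown are documentation-only
      address ranges, and the function name is hypothetical.</t>

      <figure><artwork><![CDATA[
```python
import ipaddress

# Hypothetical ISP prefixes (documentation-only address ranges).
ISP_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_local_peer(peer_ip: str, prefixes=ISP_PREFIXES) -> bool:
    """Return True if peer_ip lies inside any ISP-local prefix."""
    addr = ipaddress.ip_address(peer_ip)
    # An address is never "in" a network of the other IP version,
    # so a mixed IPv4/IPv6 prefix list is safe to scan.
    return any(addr in net for net in prefixes)
```
]]></artwork></figure>

      <t>An edge cache using such a check would serve cached data to
      a peer only when the check succeeds, and would otherwise fall
      back to its policy for non-local peers.</t>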

      <t>Support for file descriptors: In order for both the
      localization service and the edge-cache to track files as they
      are requested, ALTO requests from peers should include both a
      per-file unique ID and a variable length field containing the
      protocol's representation of the file requested (e.g., for
      BitTorrent, the .torrent file).  This has some minor privacy
      implications, but greatly enhances both the ability of
      localization to know which peers are involved in a particular
      transfer and the ability of edge-caches to determine which data
      to fetch.
	</t>
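
      <t>To make the shape of such a request concrete, the following
      Python sketch pairs a per-file unique ID with a variable-length
      descriptor.  The 16-byte ID, the 4-byte length prefix, and all
      names are assumptions made purely for illustration; this is
      not a proposed ALTO encoding.</t>

      <figure><artwork><![CDATA[
```python
import struct
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class FileQuery:
    """Hypothetical file-descriptor part of a localization query."""
    file_id: uuid.UUID  # per-file unique ID
    descriptor: bytes   # e.g. raw .torrent contents for BitTorrent

    def encode(self) -> bytes:
        # 16-byte ID, 4-byte big-endian length, then descriptor.
        length = struct.pack("!I", len(self.descriptor))
        return self.file_id.bytes + length + self.descriptor

    @classmethod
    def decode(cls, data: bytes) -> "FileQuery":
        if len(data) < 20:
            raise ValueError("truncated header")
        (length,) = struct.unpack("!I", data[16:20])
        body = data[20:20 + length]
        if len(body) != length:
            raise ValueError("truncated descriptor")
        return cls(uuid.UUID(bytes=data[:16]), body)
```
]]></artwork></figure>

      <t>In this sketch, the fixed-size ID lets the localization
      service and the edge-cache match requests for the same file
      cheaply, while the opaque descriptor keeps the format
      independent of any one P2P protocol.</t>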

      </section>

    <section anchor="conclusions" title="Conclusions">
    <t>Edge-caches are critical if P2P is to achieve the promised
    aggregate cost savings.  Without an edge-cache, localization's
    benefits are limited, as even perfect localization is unable to
    reduce the transfers over the last-mile uplink.  Yet edge-caches
    also need to rely on localization to drive traffic to the
    edge cache, to discover new content, and to determine which peers
    are allowed to access the edge-cache.  Thus localization protocols
    should include edge-caches in their focus, and edge-caches will
    need to use localization protocols.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>Grant info here.  All opinions are those of the author, not the funding institution.</t>

      <t>The author thanks Richard Woundy, Jason Livingood, Vern
      Paxson, Christian Kreibich, and others for feedback on the
      general concept and economic models for P2P edge caches.</t>
    </section>

    <!-- Possibly a 'Contributors' section ... -->

    <section anchor="IANA" title="IANA Considerations">
      <t>None</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>The privacy concerns of edge-caches and localization are only
      mild to moderate.  It is already possible for P2P nodes to
      observe what other nodes are downloading or making available,
      and an edge-cache simply represents another such node in the
      system.  Any P2P system which wishes to avoid this problem will
      not want to use localization (because of the impacts on traffic
      analysis), and ISPs will not want to cache such data (because
      most of the data will represent illegal content).
	</t>

      <t>This is also why localization services such as ALTO should
      have a query interface that doesn't just give a list of IP
      addresses to rank, but also has query modes which present ALTO
      with a UUID and a content identifier, so a localization
      system can keep track of other systems which have already
      requested the same content.</t>

    </section>
  </middle>

  <!--  *****BACK MATTER ***** -->

  <back>
    <!-- References split into informative and normative -->

    <!-- There are 2 ways to insert reference entries from the citation libraries:
     1. define an ENTITY at the top, and use "ampersand character"RFC2629; here (as shown)
     2. simply use a PI "less than character"?rfc include="reference.RFC.2119.xml"?> here
        (for I-Ds: include="reference.I-D.narten-iana-considerations-rfc2434bis.xml")

     Both are cited textually in the same manner: by using xref elements.
     If you use the PI option, xml2rfc will, by default, try to find included files in the same
     directory as the including file. You can also define the XML_LIBRARY environment variable
     with a value containing a set of directories to search.  These can be either in the local
     filing system or remote ones accessed by http (http://domain/dir/... ).-->

    <references title="Normative References">
      <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?-->
      &RFC2119;

    </references>

    <references title="Informative References">
      <!-- Here we use entities that we defined at the beginning. -->



      <!-- A reference written by an organization not a person. -->

      <reference anchor="akamai" target="http://www.akamai.com">
        <!-- the following is the minimum to make xml2rfc happy -->
        <front>
          <title>The Akamai CDN</title>
          <author>
            <organization>Akamai Inc</organization>
          </author>
          <date year="2008" />
        </front>
      </reference>


    </references>


    <!-- Change Log

v00 2006-03-15  EBD   Initial version

v01 2006-04-03  EBD   Moved PI location back to position 1 -
                      v3.1 of XMLmind is better with them at this location.
v02 2007-03-07  AH    removed extraneous nested_list attribute,
                      other minor corrections
v03 2007-03-09  EBD   Added comments on null IANA sections and fixed heading capitalization.
                      Modified comments around figure to reflect non-implementation of
                      figure indent control.  Put in reference using anchor="DOMINATION".
                      Fixed up the date specification comments to reflect current truth.
v04 2007-03-09 AH     Major changes: shortened discussion of PIs,
                      added discussion of rfc include.
v05 2007-03-10 EBD    Added preamble to C program example to tell about ABNF and alternative 
                      images. Removed meta-characters from comments (causes problems).  -->
  </back>
</rfc>
