<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
There has to be one entity for each item to be referenced.
An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs),
please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
(Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space
(using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-weaver-alto-edge-caches-00" ipr="trust200902">
<!-- category values: std, bcp, info, exp, and historic
ipr values: full3667, noModification3667, noDerivatives3667
you can add the attributes updates="NNNN" and obsoletes="NNNN"
they will automatically be output with "(if approved)" -->
<!-- ***** FRONT MATTER ***** -->
<front>
<!-- The abbreviated title is used in the page header - it is only necessary if the
full title is longer than 39 characters -->
<title abbrev="P2P Localization and Edge Caches">Peer to Peer Localization Services and Edge Caches</title>
<!-- add 'role="editor"' below for the editors if appropriate -->
<author fullname="Nicholas Weaver" initials="N."
surname="Weaver">
<organization>International Computer Science Institute</organization>
<address>
<postal>
<street>1947 Center Street suite 600</street>
<!-- Reorder these if your country does things differently -->
<city>Berkeley</city>
<region>CA</region>
<code>94704</code>
<country>USA</country>
</postal>
<phone>+1 510 666 2903</phone>
<email>nweaver@icsi.berkeley.edu</email>
<!-- uri and facsimile elements may also be added -->
</address>
</author>
<date year="2009" />
<!-- Meta-data Declarations -->
<area>General</area>
<workgroup>Application Layer Traffic Optimization (ALTO) Working Group</workgroup>
<keyword>P2P</keyword>
<keyword>caches</keyword>
<keyword>localization</keyword>
<!-- Keywords will be incorporated into HTML output
files in a meta tag but they have no effect on text or nroff
output. If you submit your draft to the RFC Editor, the
keywords will be used for the search engine. -->
<abstract>
<t>
Without caches in the infrastructure, the primary effect of
peer-to-peer (P2P) content delivery is cost shifting rather than
cost savings. Even with perfect localization, depending on the
relative cost of last-mile uplink bandwidth versus transit
bandwidth, P2P may substantially increase aggregate cost. Yet the
addition of edge caches (caches located in the ISP's network near
the customers) radically changes the economics of P2P content
delivery. Edge caches interact very strongly with localization
services for P2P content delivery, and any localization service
must be tightly integrated into edge-cache operation.
</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>When compared with conventional content delivery, peer-to-peer
delivery of bulk data is effective at shifting costs from the
content provider to the ISPs, but it can also significantly
magnify the aggregate cost of delivery. Depending on the
particular costs to an ISP, even perfect localization
(restricting P2P activity to the ISP's own network) may
still result in significantly higher aggregate costs than
conventional content delivery, although localization does reduce
transit costs.
</t>
<t>However, if edge-caches are introduced into the architecture,
the economics can change radically. Rather than increasing
transport costs, P2P with ISP-provided edge caches reduces
transport costs for all parties, achieving cost reductions for
the ISP analogous to those seen with edge-based HTTP servers
such as <xref target="akamai">Akamai</xref>. Yet unlike
edge-based web servers, edge-caches for P2P are
failure-transparent: when they fail, or do not have the right
data, the failure does not impact correct operation of the P2P
system.
</t>
<t>It is critical that ALTO or other localization services for
bulk-data P2P be edge-cache aware and assist edge-caches in
their operation. Localization without edge-caches may not
produce significant cost savings to the ISPs or performance
benefits to the customers, while edge-caches need localization
services both to ease client discovery and to provide necessary
topological information for edge-cache operation.
</t>
<t>This document begins with a brief discussion
of <xref target="caches">edge caches for P2P</xref>, then
outlines a <xref target="economics">simple cost model of content
delivery</xref>, which explains why both localization and
edge-caches are necessary for cost-effective content delivery.
It then discusses <xref target="reliance">how localization and
edge-caches should interact</xref>, before a
brief <xref target="conclusions">conclusions section</xref>.
</t>
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in <xref target="RFC2119">RFC 2119</xref>.</t>
</section>
<section anchor="caches" title="The Design of Edge Caches">
<t>An edge-cache is simply a special P2P node that lives in the
ISP's network close to, but not at, the final recipients. It
therefore incurs no transit cost when communicating with
ISP-local peers, has low latency to them, and has a
high-bandwidth connection into the ISP's internal network.</t>
<t>The role of an edge cache is to coordinate transfers between
local peers and the rest of the Internet, and to cache data for
subsequent use, within the existing or a modified P2P protocol.
For example, a BitTorrent edge cache can participate in a swarm,
offering data only to ISP-local peers once it has a complete
file. Before it has obtained the entire file, it neither seeds
to nor leeches from peers outside the ISP, exchanging data with
them only on a tit-for-tat basis.
</t>
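<t>The following Python sketch shows one way such a BitTorrent
edge cache might decide whether to upload to a given peer. It is a
simplification under stated assumptions: the Peer record and the
may_upload function are hypothetical, locality is assumed to come
from the localization service, ISP-local peers are assumed to be
served even before the file is complete, and tit-for-tat is
reduced to a byte balance instead of BitTorrent's periodic,
rate-based choking.</t>
<figure><artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Peer:
    is_local: bool   # inside the ISP, per the localization service
    sent_to_us: int  # bytes this peer has uploaded to the cache
    sent_by_us: int  # bytes the cache has uploaded to this peer


def may_upload(peer: Peer, have_complete_file: bool) -> bool:
    """Return True if the cache should upload to (unchoke) peer."""
    if peer.is_local:
        # Assumption: ISP-local peers are always served. The text
        # only requires that they be served once the file is complete.
        return True
    if have_complete_file:
        # A complete cache never seeds to peers outside the ISP.
        return False
    # While still fetching, trade with remote peers tit-for-tat:
    # upload only while the peer has sent us more than we have sent
    # it, keeping the exchange roughly balanced.
    return peer.sent_to_us > peer.sent_by_us
```
]]></artwork></figure>
<t>A cache holding a complete file therefore refuses every remote
peer regardless of balance, while a remote peer that has uploaded
100 bytes and received 50 is still unchoked during the fetch.</t>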
<t>One feature of an edge-cache is that it can be unreliable.
Since, from the point of view of the other peers, it is simply
another P2P participant, the P2P system will still work properly
if the edge-cache lacks a block or a file, or fails altogether.
This is in sharp contrast to edge-based HTTP caches or CDNs,
where a failure in the node may result in failures visible to
the user.</t>
<t>Because it is allowed to be unreliable, an edge-cache can also
be inexpensive. For example, a 1U server (based on a Mini-ITX
motherboard) capable of holding 4 SATA disks might cost less than
$800. With a price of $130 for a 1.5 TB drive, an edge cache
costing less than $1400 could cache over 5 TB of data. Such a
low-cost system might suffer significantly higher transient
failure rates than a higher-quality server, occasionally
requiring a reboot, a reimage, or the disabling of bad disks, but
as failures are low-consequence, such caches can be cheap to
deploy.</t>
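<t>The hardware arithmetic above can be checked with a few lines of
Python. The sketch uses the upper-bound prices quoted in the text,
and its raw capacity ignores filesystem overhead.</t>
<figure><artwork><![CDATA[
```python
# Upper-bound prices quoted in the text.
SERVER_USD = 800  # 1U Mini-ITX chassis with 4 SATA bays
DRIVE_USD = 130   # one 1.5 TB SATA drive
DRIVE_TB = 1.5
BAYS = 4

total_usd = SERVER_USD + BAYS * DRIVE_USD  # 800 + 4 * 130
raw_tb = BAYS * DRIVE_TB                   # 4 * 1.5
usd_per_tb = total_usd / raw_tb

print(f"${total_usd} for {raw_tb} TB raw, about ${usd_per_tb:.0f}/TB")
```
]]></artwork></figure>
<t>This prints "$1320 for 6.0 TB raw, about $220/TB". That result is
consistent with a cache costing less than $1400 and holding over
5 TB of data.</t>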
<t>Finally, a P2P edge-cache does not require changing existing
P2P protocols. As long as local peers can find the edge cache,
or the edge-cache can find the local peers, edge-caches can be
introduced into existing protocols without change. In
particular, BitTorrent is highly amenable to edge-caches without
requiring client changes.</t>
<section anchor="incentives" title="Safe Incentives for Edge Caches">
<t>The biggest impediment to building edge-caches is not
technical but legal. Given a P2P swarm, a single edge cache
or collection of caches should be able to monitor the swarm
and find participants. But an edge cache needs to be notified
both about a particular P2P swarm and that it is acceptable to
cache the swarm.</t>
<t>A detailed discussion is outside the scope of this document,
but many possibilities exist. For example, P2P content providers
(such as distributors of Linux ISO images) could register their
content. Users of the ISP could assert that a swarm is legitimate
(and consent to be identified if a copyright holder objects).
ISPs could also make agreements with third-party data providers
(such as Amazon S3) that support BitTorrent and other P2P content
distribution.</t>
</section>
</section>
<section anchor="economics" title="An Economic Model for Delivery Costs">
<t>For purposes of this discussion, we assume that different
portions of the network have different costs to transmit or
receive one unit of data. In reality, costs vary by time of day
and network conditions. For example, the cost to an ISP of
traffic on an uncongested last-mile uplink is effectively 0, but
it can be huge under congestion. Peering arrangements may even
make the cost of uplink transit negative. For simplicity, we
ignore these effects for now.</t>
<t><list style="hanging">
<t hangText="CP:">The cost for the content provider to send one
unit of data.</t>
<t hangText="CDN:">The cost for the content provider to send one
unit of data through a third-party, edge-based CDN.</t>
<t hangText="CT:">The cost for the ISP to receive one unit of data
from the general Internet.</t>
<t hangText="CTU:">The cost for the ISP to send one unit of data to
the general Internet.</t>
<t hangText="CL:">The cost for the ISP to send one unit of data to
the end customer across the last mile.</t>
<t hangText="CLU:">The cost for the ISP to receive one unit of data
from an end customer across the last mile.</t>
</list></t>
<t>With such a basic cost model, it becomes possible to estimate
the costs of different content delivery mechanisms for N
requests, each for one unit of data.</t>
<t>Central (conventional) HTTP traffic: For such traffic, the
content provider pays N*CP, while the ISP pays N*(CT+CL). Both
costs increase linearly with the number of requests.</t>
<t>Edge-located HTTP content delivery networks (such as Akamai):
For such traffic, the content provider pays N*CDN, while the ISP
pays N*CL. This is obviously the best case for the ISP, but the
cost of the CDN may not be favorable to the content
provider.</t>
<t>Conventional P2P without localization: If we assume the P2P
system is highly efficient, the content provider pays only CP
regardless of the number of users. The ISP will need to pay
N*(CL + CLU) for all users on the last mile, and some value less
than N*(CT + CTU) for transit.</t>
<t>Conventional P2P with perfect localization: If the P2P system
is perfect, including localizing the traffic completely within
the ISP, the content provider pays only CP, while the ISP will
need to pay N*(CL + CLU) but only (CT + CTU) for transit.</t>
<t>Conventional P2P with perfect localization and perfect edge
caches: Adding in edge-caches changes the situation. Now the
content provider pays only CP, while the ISP pays N*CL + CT +
CTU.</t>
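<t>The six unit costs and five delivery formulas above can be
collected into a small Python model for experimenting with
different cost assumptions. This is a sketch: the Costs and
delivery_costs names are hypothetical. Each request is assumed to
fetch one unit of data, and the remote parameter stands in for the
unspecified share of the worst-case transit, N*(CT + CTU), that
unlocalized P2P incurs.</t>
<figure><artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Costs:
    CP: float   # content provider sends one unit
    CDN: float  # content provider sends one unit via an edge CDN
    CT: float   # ISP receives one unit from the Internet
    CTU: float  # ISP sends one unit to the Internet
    CL: float   # ISP sends one unit to a customer (last mile)
    CLU: float  # ISP receives one unit from a customer (last mile)


def delivery_costs(c: Costs, n: int, remote: float = 1.0) -> dict:
    """Map each mechanism to (provider cost, ISP cost) for n copies.

    `remote` is a hypothetical knob in [0, 1]: the share of the
    worst-case n * (CT + CTU) transit an unlocalized swarm incurs.
    """
    last_mile_p2p = n * (c.CL + c.CLU)
    return {
        "central_http": (n * c.CP, n * (c.CT + c.CL)),
        "edge_cdn": (n * c.CDN, n * c.CL),
        "p2p": (c.CP, last_mile_p2p + remote * n * (c.CT + c.CTU)),
        "p2p_localized": (c.CP, last_mile_p2p + c.CT + c.CTU),
        "p2p_edge_cache": (c.CP, n * c.CL + c.CT + c.CTU),
    }


# Hypothetical unit costs where the last-mile uplink (CLU) costs
# more than transit downlink (CT), e.g. a congested DOCSIS uplink.
example = Costs(CP=1, CDN=2, CT=1, CTU=0, CL=1, CLU=2)
for name, (cp, isp) in delivery_costs(example, n=100).items():
    print(f"{name:15} provider={cp:6} isp={isp:6}")
```
]]></artwork></figure>
<t>With these hypothetical numbers and N = 100, the ISP pays 200
units for central HTTP, 301 for perfectly localized P2P, and 101
with edge caches. This matches the ordering argued in the
following subsection.</t>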
<section anchor="limits" title="The Limits of Localization">
<t>Such a simple cost model illustrates the major limitation
of localization. If CLU, the cost of the last mile uplink, is
more than CT, the cost of the transit downlink, P2P can
significantly increase the costs to the ISP over conventional
HTTP delivery, even with perfect localization and perfect
operation. For some networks, such as DOCSIS cable modems,
this is often the case, as increasing network capacity on the
shared last mile may require new infrastructure or repurposing
bandwidth otherwise used for higher-value services such as
television channels.</t>
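<t>The condition can be made explicit by subtracting the ISP's cost
for central HTTP delivery, N*(CT+CL), from its cost under each P2P
scenario in the model above. N is the number of requests. Delta_local
is the difference for perfectly localized P2P, and Delta_edge is the
difference for P2P with perfect edge caches.</t>
<figure><artwork><![CDATA[
```latex
\begin{aligned}
\Delta_{\mathrm{local}}
  &= \bigl[N(\mathrm{CL}+\mathrm{CLU})+\mathrm{CT}+\mathrm{CTU}\bigr]
     - N(\mathrm{CT}+\mathrm{CL}) \\
  &= N(\mathrm{CLU}-\mathrm{CT}) + \mathrm{CT} + \mathrm{CTU} \\[4pt]
\Delta_{\mathrm{edge}}
  &= \bigl[N\,\mathrm{CL}+\mathrm{CT}+\mathrm{CTU}\bigr]
     - N(\mathrm{CT}+\mathrm{CL}) \\
  &= \mathrm{CTU} - (N-1)\,\mathrm{CT}
\end{aligned}
```
]]></artwork></figure>
<t>Since CT + CTU is a one-time term, the sign of Delta_local for
large N is set by CLU - CT. Localized P2P therefore costs the ISP
more than central HTTP whenever CLU exceeds CT. Delta_edge, by
contrast, is negative for any N above 1 + CTU/CT (assuming CT is
positive).</t>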
<t>Yet the model also shows that if edge-caches are added into the
system, everybody sees cost savings: both the content provider
and the ISP benefit from lower costs, without the reliability
concerns present in edge-based HTTP CDNs. Thus edge-caches
represent the best of both worlds: for a content provider,
edge-caches in the P2P system have the same low cost as a
conventional P2P system, but for the ISP, the edge-caches have
the same low cost as an edge-located CDN.</t>
</section>
</section>
<section anchor="reliance" title="Edge-Cache Interactions with Localization">
<t>Since edge-caches are critical to realizing the true potential
of P2P to create aggregate cost savings, they need to be
considered when developing other portions of a common P2P
infrastructure. In particular, edge-caches both interact with
and benefit from localization services, so it is critical
that localization and edge-caching be co-designed to
interoperate. The following edge-cache concerns relate directly
to localization.</t>
<t>Edge-cache discovery: Any localization service which supports
the discovery of "preferable" nodes should give preference to
any relevant edge-caches in the system. Thus the localization
service will drive traffic towards the relevant edge caches,
resulting in greater performance and lower cost-of-delivery.</t>
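<t>A minimal Python sketch of such a preference follows. The
function name and its inputs are hypothetical. It assumes that the
localization service already knows which edge caches serve the
requested content, and that it holds an ISP-assigned cost for each
candidate address. Edge caches are ranked ahead of every other
peer, and the remaining peers are ordered by cost.</t>
<figure><artwork><![CDATA[
```python
def rank_peers(candidates, edge_caches, path_cost):
    """Order candidate peers, preferring relevant edge caches.

    candidates:  peer addresses (strings) to be ranked
    edge_caches: addresses of edge caches serving this content
    path_cost:   ISP-assigned cost per address (lower is better);
                 addresses without a cost sort after those with one
    """
    def key(addr):
        return (
            addr not in edge_caches,  # False (a cache) sorts first
            path_cost.get(addr, float("inf")),
            addr,  # deterministic tie-break
        )

    return sorted(candidates, key=key)


ranked = rank_peers(
    ["203.0.113.7", "198.51.100.5", "198.51.100.9"],
    edge_caches={"198.51.100.9"},
    path_cost={"198.51.100.5": 1, "203.0.113.7": 10,
               "198.51.100.9": 2},
)
print(ranked)
```
]]></artwork></figure>
<t>This prints the edge cache 198.51.100.9 first, then the
low-cost peer 198.51.100.5, and then 203.0.113.7. Making the cache
preference absolute, instead of folding it into the cost as a
discount, is a design choice. An ISP might prefer a discount so
that an overloaded cache does not receive every request.</t>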
<t>Edge-cache content notification: Any localization service
should also act as a content-notification channel, informing the
edge-cache of a user's desire to fetch a particular piece of
content. The edge-cache may use this information, along with
other constraints and heuristics, to determine whether it should
participate in this distribution system. For example, a
particular ISP's edge-cache for BitTorrent could be configured
to cache torrents requested from Amazon S3 or other sources
based on a contractual relationship, but reject torrents hosted
elsewhere.</t>
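<t>The following Python sketch illustrates the cache side of
content notification. The class, its fields, and the demand
threshold are hypothetical. The cache ignores content whose source
is not covered by an agreement, and it joins a swarm only after a
minimum number of local requests. The host names are reserved
example domains.</t>
<figure><artwork><![CDATA[
```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AdmissionPolicy:
    allowed_sources: set       # e.g. hosts the ISP has contracts with
    min_requests: int = 2      # hypothetical local-demand threshold
    seen: Counter = field(default_factory=Counter)

    def notify(self, content_id: str, source: str) -> bool:
        """Record one local request; return True to join the swarm."""
        if source not in self.allowed_sources:
            return False  # not covered by any agreement: ignore it
        self.seen[content_id] += 1
        return self.seen[content_id] >= self.min_requests


policy = AdmissionPolicy(allowed_sources={"torrents.example.com"})
print(policy.notify("abc123", "elsewhere.example.net"))  # False
print(policy.notify("abc123", "torrents.example.com"))   # False
print(policy.notify("abc123", "torrents.example.com"))   # True
```
]]></artwork></figure>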
<t>Peer-access control: The edge-cache, when contacted by a
peer, needs to know whether the peer is local to its network.
Thus the localization service should support queries from the
edge cache as to whether a peer would be considered local to the
ISP.
</t>
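<t>A Python sketch of the access check follows. For brevity, it
answers locality queries from a prefix list using the standard
ipaddress module. The LocalityOracle class is hypothetical; in the
architecture described here, the cache would instead ask the ISP's
localization service. The prefixes are documentation ranges.</t>
<figure><artwork><![CDATA[
```python
import ipaddress


class LocalityOracle:
    """Stand-in for the localization service's "is this peer local?"
    query. Holding a prefix list locally is a simplification; the
    prefixes below are documentation ranges, not a real ISP's.
    """

    def __init__(self, prefixes):
        self.nets = [ipaddress.ip_network(p) for p in prefixes]

    def is_local(self, addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        # Membership is False when the IP versions differ.
        return any(ip in net for net in self.nets)


oracle = LocalityOracle(["198.51.100.0/24", "2001:db8::/32"])
for peer in ["198.51.100.42", "203.0.113.1", "2001:db8::1"]:
    verdict = "serve" if oracle.is_local(peer) else "refuse"
    print(peer, verdict)
```
]]></artwork></figure>
<t>The three addresses are reported as serve, refuse, and serve,
respectively.</t>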
<t>Support for file descriptors: In order for both the
localization service and the edge-cache to track files as they
are requested, ALTO requests from peers should include both a
per-file unique ID and a variable length field containing the
protocol's representation of the file requested (e.g., for
BitTorrent, the .torrent file). This has some minor privacy
implications, but greatly enhances both the ability of
localization to know which peers are involved in a particular
transfer and the ability of edge-caches to determine which data
to fetch.
</t>
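<t>As an illustration, the following Python sketch encodes such a
request as JSON. The field names and the JSON encoding are
hypothetical and are not part of any ALTO specification. The
per-file ID is assumed here to be a SHA-256 digest of the
descriptor. For BitTorrent, the infohash (the SHA-1 digest of the
bencoded info dictionary) would be a natural alternative.</t>
<figure><artwork><![CDATA[
```python
import base64
import hashlib
import json


def build_content_query(peers, descriptor: bytes,
                        protocol: str) -> str:
    """Build a hypothetical query that carries a file descriptor.

    peers:      candidate peer addresses to be ranked
    descriptor: the protocol's own description of the content,
                e.g. the raw bytes of a .torrent file
    """
    return json.dumps(
        {
            "protocol": protocol,
            # Assumed per-file unique ID: SHA-256 of the descriptor.
            "content-id": hashlib.sha256(descriptor).hexdigest(),
            # Variable-length field; base64 keeps it JSON-safe.
            "descriptor": base64.b64encode(descriptor).decode("ascii"),
            "peers": list(peers),
        },
        sort_keys=True,
    )


torrent = b"d4:name8:demo.isoe"  # toy bencoded dict, not a torrent
print(build_content_query(["198.51.100.5"], torrent, "bittorrent"))
```
]]></artwork></figure>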
</section>
<section anchor="conclusions" title="Conclusions">
<t>Edge-caches are critical if P2P is to achieve the promised
aggregate cost savings. Without an edge-cache, localization's
benefits are limited, as even perfect localization is unable to
reduce the transfers over the last-mile uplink. Yet edge-caches
also need to rely on localization: to drive traffic to the
edge cache, to discover new content, and to determine which peers
are allowed to access the edge-cache. Thus localization protocols
should include edge-caches in their focus, and edge-caches will
need to use localization protocols.</t>
</section>
<section anchor="Acknowledgements" title="Acknowledgements">
<t>Grant info here. All opinions are those of the author, not the funding institution.</t>
<t>The author thanks Richard Woundy, Jason Livingood, Vern Paxson,
Christian Kreibich, and others for feedback on the general concept
and economic models for P2P edge caches.</t>
</section>
<!-- Possibly a 'Contributors' section ... -->
<section anchor="IANA" title="IANA Considerations">
<t>None</t>
</section>
<section anchor="Security" title="Security Considerations">
<t>The privacy concerns of edge-caches and localization are only
mild to moderate. It is already possible for P2P nodes to
observe what other nodes are downloading or making available,
and an edge-cache simply represents another such node in the
system. Any P2P system which wishes to avoid this problem will
not want to use localization (because of the impacts on traffic
analysis), and ISPs will not want to cache such data (because
most of the data will represent illegal content).
</t>
<t>This is also why localization services such as ALTO should
have a query interface that does not just accept a list of IP
addresses to rank, but also has query modes which present ALTO
with a UUID and a content identifier, so that a localization
system can keep track of other systems which have already
requested the same content.</t>
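<t>The server-side bookkeeping behind such a query mode might look
like the following Python sketch. The ContentRegistry class and its
query method are hypothetical. The sketch keeps all state in memory
and never expires entries.</t>
<figure><artwork><![CDATA[
```python
import uuid
from collections import defaultdict


class ContentRegistry:
    """Remembers which clients have asked about which content."""

    def __init__(self):
        # content identifier -> {client UUID: client address}
        self._requesters = defaultdict(dict)

    def query(self, client: uuid.UUID, addr: str, content_id: str):
        """Record this request; return addresses of other clients
        that have already requested the same content."""
        seen = self._requesters[content_id]
        others = [a for c, a in seen.items() if c != client]
        seen[client] = addr
        return others


reg = ContentRegistry()
alice, bob = uuid.uuid4(), uuid.uuid4()
print(reg.query(alice, "198.51.100.5", "content-x"))  # []
print(reg.query(bob, "198.51.100.6", "content-x"))
# ['198.51.100.5']
```
]]></artwork></figure>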
</section>
</middle>
<!-- *****BACK MATTER ***** -->
<back>
<!-- References split into informative and normative -->
<!-- There are 2 ways to insert reference entries from the citation libraries:
1. define an ENTITY at the top, and use "ampersand character"RFC2629; here (as shown)
2. simply use a PI "less than character"?rfc include="reference.RFC.2119.xml"?> here
(for I-Ds: include="reference.I-D.narten-iana-considerations-rfc2434bis.xml")
Both are cited textually in the same manner: by using xref elements.
If you use the PI option, xml2rfc will, by default, try to find included files in the same
directory as the including file. You can also define the XML_LIBRARY environment variable
with a value containing a set of directories to search. These can be either in the local
filing system or remote ones accessed by http (http://domain/dir/... ).-->
<references title="Normative References">
<!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?-->
&RFC2119;
</references>
<references title="Informative References">
<!-- Here we use entities that we defined at the beginning. -->
<!-- A reference written by an organization, not a person. -->
<reference anchor="akamai" target="http://www.akamai.com">
<!-- the following is the minimum to make xml2rfc happy -->
<front>
<title>The Akamai CDN</title>
<author>
<organization>Akamai Inc</organization>
</author>
<date year="2008" />
</front>
</reference>
</references>
</back>
</rfc>