http://stupid.domain.name/ietf/

One document matched: draft-ietf-ppsp-peer-protocol-05.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
     There has to be one entity for each item to be referenced. 
     An alternate method (rfc include) is described in the references. -->

<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
<!ENTITY RFC3552 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3552.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
     (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="std" docName="draft-ietf-ppsp-peer-protocol-05" ipr="trust200902">
  <!-- category values: std, bcp, info, exp, and historic
     ipr values: full3667, noModification3667, noDerivatives3667
     you can add the attributes updates="NNNN" and obsoletes="NNNN" 
     they will automatically be output with "(if approved)" -->

<!-- ***** FRONT MATTER ***** -->

<front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
         full title is longer than 39 characters -->

<title abbrev="PPSP Peer Protocol">Peer-to-Peer Streaming Peer Protocol (PPSPP)</title>

    <!-- add 'role="editor"' below for the editors if appropriate -->

    <!-- Another author who claims to be an editor -->

<author fullname="Arno Bakker" initials="A." surname="Bakker">
  <organization>Technische Universiteit Delft</organization>

   <address>
     <postal>
       <street>Mekelweg 4</street>

       <!-- Reorder these if your country does things differently -->

       <code>2628CD</code>

       <city>Delft</city>

       <region></region>

       <country>The Netherlands</country>
     </postal>

     <phone></phone>

     <email>arno@cs.vu.nl</email>

     <!-- uri and facsimile elements may also be added -->
   </address>
</author>

<author fullname="Riccardo Petrocco" initials="R." surname="Petrocco">
  <organization>Technische Universiteit Delft</organization>

   <address>
     <postal>
       <street>Mekelweg 4</street>

       <!-- Reorder these if your country does things differently -->

       <code>2628CD</code>

       <city>Delft</city>

       <region></region>

       <country>The Netherlands</country>
     </postal>

     <phone></phone>

     <email>r.petrocco@gmail.com</email>

     <!-- uri and facsimile elements may also be added -->
   </address>
</author>

<author fullname="Victor Grishchenko" initials="V." surname="Grishchenko">
  <organization>Technische Universiteit Delft</organization>

   <address>
     <postal>
       <street>Mekelweg 4</street>

       <!-- Reorder these if your country does things differently -->

       <code>2628CD</code>

       <city>Delft</city>

       <region></region>

       <country>The Netherlands</country>
     </postal>

     <phone></phone>

     <email>victor.grishchenko@gmail.com</email>

     <!-- uri and facsimile elements may also be added -->
   </address>
</author>


<date month="January" year="2013" />

    <!-- If the month and year are both specified and are the current ones, xml2rfc will fill 
         in the current day for you. If only the current year is specified, xml2rfc will fill 
	 in the current day and month for you. If the year is not the current one, it is 
	 necessary to specify at least a month (xml2rfc assumes day="1" if not specified for the 
	 purpose of calculating the expiry date).  With drafts it is normally sufficient to 
	 specify just the year. -->

    <!-- Meta-data Declarations -->

<area>Transport</area>

<workgroup>PPSP</workgroup>

    <!-- WG name at the upperleft corner of the doc,
         IETF is fine for individual submissions.  
	 If this element is not present, the default is "Network Working Group",
         which is used by the RFC Editor as a nod to the history of the IETF. -->

<keyword></keyword>

    <!-- Keywords will be incorporated into HTML output
         files in a meta tag but they have no effect on text or nroff
         output. If you submit your draft to the RFC Editor, the
         keywords will be used for the search engine. -->

<abstract><t>
The Peer-to-Peer Streaming Peer Protocol (PPSPP) is a transport protocol for 
disseminating the same content to a group of interested parties in a streaming 
fashion. PPSPP supports streaming of both pre-recorded (on-demand) and live 
audio/video content. It is based on the peer-to-peer paradigm, where clients 
consuming the content are put on equal footing with the servers initially 
providing the content, to create a system where everyone can potentially 
provide upload bandwidth. It has been designed to provide short time-till-playback 
for the end user, and to prevent disruption of the streams by malicious peers. 
PPSPP has also been designed to be flexible and extensible. It can use different 
mechanisms to optimize peer uploading, prevent freeriding, and work with 
different peer discovery schemes (centralized trackers or Distributed Hash 
Tables). It supports multiple methods for content integrity protection and 
chunk addressing. Designed as a generic protocol that can run on top of various
transport protocols, it currently runs on top of UDP using LEDBAT for
congestion control. </t>
</abstract>

</front>
<middle>


<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Introduction" anchor="sec_intro">


<!-- %% -->
<section title="Purpose" anchor="sec_intro_purp">

<t>
This document describes the Peer-to-Peer Streaming Peer Protocol (PPSPP), 
designed for disseminating the same content to a group of interested parties in
a streaming fashion. PPSPP supports streaming of both pre-recorded (on-demand) 
and live audio/video content. It is based on the peer-to-peer paradigm where 
clients consuming the content are put on equal footing with the servers 
initially providing the content, to create a system where everyone can 
potentially provide upload bandwidth.
</t>  

<t>
PPSPP has been designed to provide short time-till-playback for the end user,
and to prevent disruption of the streams by malicious peers. Central in this
design is a simple method of identifying content based on self-certification.
In particular, content in PPSPP is identified by a single cryptographic hash
that is the root hash in a Merkle hash tree calculated recursively from the
content <xref target="MERKLE"/><xref target="ABMRKL"/>.  This self-certifying
hash tree allows every peer to directly detect when a malicious peer tries to
distribute fake content. The tree can be used for both static and live content.
Moreover, it ensures only a small amount of information is needed to start a
download and to verify incoming chunks of content, thus ensuring short start-up
times.
</t>  
<t>
PPSPP has also been designed to be extensible for different transports and 
use cases. Hence, PPSPP is a generic protocol which can run directly on top of 
UDP, TCP, or other protocols. As such, PPSPP defines a common set of messages 
that make up the protocol, which can have different representations on the wire 
depending on the lower-level protocol used. When the lower-level transport 
allows, PPSPP can also use different congestion control algorithms. 
</t>
<t>
At present, PPSPP is set to run on top of UDP using LEDBAT for congestion
control <xref target="RFC6817"/>. Using LEDBAT enables PPSPP
to serve the content after playback (seeding) without disrupting the user who
may have moved to different tasks that use its network connection. LEDBAT may be
replaced with a different algorithm when the work of the IETF working group on
RTP Media Congestion Avoidance Techniques (RMCAT) <xref target="RMCATCHART"/>
matures.
</t>  
<t>
PPSPP is also flexible and extensible in the mechanisms it uses to promote
client contribution and prevent freeriding, that is, how to deal with peers that
only download content but never upload to others. It also allows different
schemes for chunk addressing and content integrity protection, if the defaults are
not fit for a particular use case. In addition, it can work with different peer
discovery schemes, such as centralized trackers or fast Distributed Hash Tables
<xref target="JIM11"/>. Finally, in this default setup, PPSPP maintains only a 
small amount of state per peer. A reference implementation of PPSPP over UDP is 
available <xref target="SWIFTIMPL"/>.
</t>
</section>


<!-- %% -->
<section title="Requirements Language" anchor="sec_intro_req">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119">RFC 2119</xref>.</t>
      </section>

<!-- %% -->
<section title="Terminology" anchor="sec_intro_term">
<t>
<list style="hanging" hangIndent="4">

<t hangText="message"><vspace blankLines="0"/>
The basic unit of PPSPP communication. A message will have different 
representations on the wire depending on the transport protocol used. Messages 
are typically multiplexed into a datagram for transmission.
</t>

<t hangText="datagram"><vspace blankLines="0"/>
A sequence of messages that is offered as a unit to the underlying transport 
protocol (UDP, etc.). The datagram is PPSPP's Protocol Data Unit (PDU).
</t>

<t hangText="content"><vspace blankLines="0"/>
Either a live transmission, a pre-recorded multimedia asset, or a file.
</t>

<t hangText="chunk"><vspace blankLines="0"/>
The basic unit in which the content is divided. E.g. a block of N 
kilobyte.
</t>

<t hangText="chunk ID"><vspace blankLines="0"/>
Unique identifier for a chunk of content (e.g. an integer). Its type
depends on the chunk addressing scheme used.
</t>

<t hangText="chunk specification"><vspace blankLines="0"/>
An expression that denotes one or more chunk IDs.
</t>

<t hangText="chunk addressing scheme"><vspace blankLines="0"/>
Scheme for identifying chunks and expressing the chunk availability map
of a peer in a compact fashion. 
</t>

<t hangText="chunk availability map"><vspace blankLines="0"/>
The set of chunks a peer has successfully downloaded and checked the integrity 
of.
</t>

<t hangText="bin"><vspace blankLines="0"/>
A number denoting a specific binary interval of the content (i.e., one 
or more consecutive chunks) in the bin numbers chunk addressing scheme 
(see <xref target="sec_chunkaddr"/>).
</t>

<t hangText="content integrity protection scheme"><vspace blankLines="0"/>
Scheme for protecting the integrity of the content while it is being distributed 
via the peer-to-peer network. I.e. methods for receiving peers to detect 
whether a requested chunk has been maliciously modified by the sending peer.
</t>

<t hangText="hash"><vspace blankLines="0"/>
The result of applying a cryptographic hash function, more specifically 
a modification detection code (MDC) <xref target="HAC01" />, such as SHA-1 
<xref target="FIPS180-3" />, to a piece of data. 
</t>

<t hangText="Merkle hash tree"><vspace blankLines="0"/>
A tree of hashes whose base is formed by the hashes of the chunks of content,
and its higher nodes are calculated by recursively computing the hash of 
the concatenation of the two child hashes 
(see <xref target="sec_intprot_merkle"/>). 
</t>

<t hangText="root hash"><vspace blankLines="0"/>
The root in a Merkle hash tree calculated recursively from the content 
(see <xref target="sec_intprot_merkle"/>).
</t>

<t hangText="swarm"><vspace blankLines="0"/>
A group of peers participating in the distribution of the same content.
</t>

<t hangText="swarm ID"><vspace blankLines="0"/>
Unique identifier for a swarm of peers, in PPSPP a sequence of bytes. When 
Merkle hash trees are used for content integrity protection, the identifier 
is the so-called root hash of the content (video-on-demand). For live 
streaming, the swarm ID is a public key.
</t>

<t hangText="tracker"><vspace blankLines="0"/>
An entity that records the addresses of peers participating in a swarm,
usually for a set of swarms, and makes this membership information 
available to other peers on request.
</t>

<t hangText="choking"><vspace blankLines="0"/>
When a peer A is choking peer B it means that A is currently not 
willing to accept requests for content from B.
</t>

<t hangText="seeding/leeching"><vspace blankLines="0"/>
When a peer A is seeding it means that A has downloaded a static
content asset completely and is now offering it for others to download.
If peer A does not yet have all content or is not offering it for download,  
A is said to be leeching.
</t>

</list>
</t>
</section>

</section>




<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Overall Operation" anchor="sec_over">


<t>
The basic unit of communication in PPSPP is the message. Multiple messages are
multiplexed into a single datagram for transmission. A datagram (and hence the 
messages it contains) will have different representations on the wire depending 
on the transport protocol used (see <xref target="sec_encap_udp"/>).
</t>

<t>
The overall operation of PPSPP is illustrated in the following examples. The 
examples assume that UDP is used for transport, the recommended method for 
content integrity protection (Merkle hash trees) is used, and that a specific 
policy is used for selecting which chunks to download.
</t>

<!-- %% -->
<section title="Joining a Swarm" anchor="sec_over_join">
<t>
Consider a peer A that wants to download a certain content asset.
To commence a PPSPP download, peer A must have the swarm ID of the content 
and a list of one or more tracker contact points (e.g. host+port). The list 
of trackers is optional in the presence of a decentralized tracking mechanism. 
</t>

<t>
Peer A now registers with the tracker following e.g. the PPSP tracker
protocol <xref target="I-D.ietf-ppsp-reqs"/> and receives the IP address 
and port of peers already in the swarm, say B, C, and D. Peer A now sends a 
datagram containing a HANDSHAKE message to B, C, and D. This message conveys 
protocol options, in particular, peer A includes the ID of the swarm as 
the destination peers can listen for multiple swarms on the same network 
address.
</t>

<t>
Peer B and C respond with datagrams containing a HANDSHAKE message and one or 
more HAVE messages. A HAVE message conveys (part of) the chunk availability of 
a peer and thus contains a chunk specification that denotes what chunks of the 
content peer B, resp. C have. Peer D sends a datagram with a HANDSHAKE and 
HAVE messages, but also with a CHOKE message. The latter indicates that D
is not willing to upload chunks to A at present.
</t>
</section>

<!-- %% -->
<section title="Exchanging Chunks" anchor="sec_over_exch">
<t>
In response to B and C, A sends new datagrams to B and C containing REQUEST 
messages. A REQUEST message indicates the chunks that a peer wants to
download, and thus contains a chunk specification. The REQUEST messages to B and C 
refer to disjunct sets of chunks.  B and C respond with datagrams containing
HAVE, DATA and, in this example, INTEGRITY messages. In the Merkle hash tree 
content protection scheme (see <xref target="sec_intprot_merkle"/>), the 
INTEGRITY messages contain all cryptographic hashes that peer A needs to verify 
the integrity of the content chunk sent in the DATA message. Using these hashes 
peer A verifies that the chunks received  from B and C are correct. It also 
updates the chunk  availability of B and C  using the information in the 
received HAVE messages. 
</t>

<t>
After processing, A sends a datagram containing HAVE messages for the chunks
it just received to all its peers. In the datagram to B and C it includes an
ACK message acknowledging the receipt of the chunks, and adds REQUEST messages
for new chunks. ACK messages are not used when a reliable transport protocol
is used. When e.g. C finds that A obtained a chunk (from B) that C did not
yet have, C's next datagram includes a REQUEST for that chunk.
</t>

<t>
Peer D also sends HAVE messages to A when it downloads chunks from other peers.
When D is willing to accept REQUESTs from A, D sends a datagram with
an UNCHOKE message to inform A. If B or C decide to choke A they sending a CHOKE
message and A should then re-request from other peers. B and C may continue to 
send HAVE, REQUEST, or periodic KEEPALIVE messages such that A keeps sending 
them HAVE messages.
</t>

<t>
Once peer A has received all content (video-on-demand use case) it stops
sending messages to all other peers that have all content (a.k.a. seeders).
Peer A MAY also contact the tracker or another source again to obtain more 
peer addresses.
</t>
</section>

<!-- %% -->
<section title="Leaving a Swarm" anchor="sec_over_leave">
<t>
To leave a swarm in a graceful way, peer A sends a "close-channel" datagram 
to all its peers and deregisters from the tracker following the (PPSP) 
tracker protocol. Peers receiving the datagram should remove A
from their current peer list. If A crashes ungracefully, peers should
remove A from their peer list when they detect it no longer sends messages.
</t>
</section>

</section>


<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Messages" anchor="sec_msgs">
<t>
In general, no error codes or responses are used in the protocol; absence 
of any response indicates an error. Invalid messages are discarded, and
further communication with the peer SHOULD be stopped. The rationale is that
it is sufficient to classify peers as either good (i.e., responding with chunks)
or bad and only use the good ones. This behavior allows a peer to deal with 
slow, crashed and (silent) malicious peers.
</t>

<t>
For the sake of simplicity, one swarm of peers deals with one content 
asset (e.g. file) only. Retrieval of a collections of files can be done 
either by using multiple swarms or by using an external storage mapping from 
the linear byte space of a single swarm to different files, transparent to the 
protocol, as described in <xref target="sec_msgs_storindep"/>.
</t>


<!-- %% -->
<section title="HANDSHAKE" anchor="sec_msgs_HANDSHAKE">
<t>
The initiating peer and the addressed peer MUST send a HANDSHAKE message as the
first message in the first datagrams they exchange. The payload of the HANDSHAKE
message is a sequence of protocol options. Example options are the content 
integrity protection scheme used and an option to specify the swarm identifier. 
The initiating peer's HANDSHAKE SHOULD contain the latter option as peers 
SHOULD listen for messages for multiple swarms on the same network address 
in the chosen UDP encapsulation. Protocol options are specified in 
 <xref target="sec_protopt"/>.
</t>

<t>
After the handshakes are exchanged, the initiator knows that the peer really
responds. Hence, the second datagram the initiator sends MAY already contain
some heavy payload, e.g. DATA messages. To minimize the number of initialization 
round-trips, the first two datagrams exchanged MAY also contain some minor 
payload, e.g. HAVE messages to indicate the current progress of a peer or a 
REQUEST (see <xref target="sec_msgs_REQUEST"/>).
</t>
</section>


<!-- %% -->
<section title="HAVE" anchor="sec_msgs_HAVE">
<t>
The HAVE message is used to convey which chunks a peer has available for 
download. The set of chunks it has available may be expressed using different 
chunk addressing and availability map compression schemes, described in 
<xref target="sec_chunkaddr"/>. HAVE messages can be used both for sending 
a complete overview of a peer's chunk  availability as well as for updates to 
that set.
</t>

<t>
In particular, whenever a receiving peer has successfully checked the integrity 
of a chunk or interval of chunks it MUST send a HAVE message to all peers it 
wants to interact with in the near future. The latter confinement allows the 
HAVE message to be used as a method of choking. The HAVE message MUST contain 
the chunk specification of the received chunks. A receiving peer MUST NOT
send a HAVE message to peers for which the handshake procedure is still 
incomplete, see <xref target="sec_sec_handshake"/>.
</t>
</section>


<!-- %% -->
<section title="DATA" anchor="sec_msgs_DATA">
<t>
The DATA message is used to transfer chunks of content. The DATA message MUST 
contain the chunk ID of the chunk and chunk itself. A peer MAY send the 
DATA messages for multiple chunks in the same datagram. The DATA message MAY
contain additional information if needed by the specific congestion control 
mechanism used. At present PPSPP uses LEDBAT <xref target="RFC6817"/>
for congestion control, which requires the current system time to be sent along 
with the DATA message, so the current system time MUST be included.
</t>
</section>


<!-- %% -->
<section title="ACK" anchor="sec_msgs_ACK">
<t>
When PPSPP is run over an unreliable transport protocol, an implementation MAY
choose to use ACK messages to acknowledge received data. When used, a receiving 
peer that has successfully checked the integrity of a chunk or interval of chunks
C it MUST send an ACK message containing a chunk specification for C. As LEDBAT 
is used, an ACK message MUST contain the one-way delay, computed from the peer's 
current system time received in the DATA message. A peer MAY delay sending
ACK messages as defined in the LEDBAT specification.
</t>
</section>


<!-- %% -->
<section title="INTEGRITY" anchor="sec_msgs_INTEGRITY">
<t>
The INTEGRITY message carries information required by the receiver to verify
the integrity of a chunk. Its payload depends on the content integrity protection
scheme used. When the recommended method of Merkle hash trees is used, 
the datagram carrying the DATA message MUST include the cryptographic hashes 
that are necessary for a receiver to check the integrity of the chunk in 
the form of INTEGRITY messages. What are the necessary hashes is explained in 
<xref target="sec_intprot_merkle_atomic"/>.
</t>
</section>


<!-- %% -->
<section title="SIGNED_INTEGRITY" anchor="sec_msgs_SIGNED_INTEGRITY">
<t>
The SIGNED_INTEGRITY message carries digitally signed information required by 
the receiver to verify the integrity of a chunk in live streaming. It logically
contains a chunk specification and a digital signature. Its exact payload 
depends on the live content integrity protection scheme used, see 
<xref target="sec_live_auth"/>.
</t>
</section>


<!-- %% -->
<section title="REQUEST" anchor="sec_msgs_REQUEST">
<t>
While bulk download protocols normally do explicit requests for certain ranges 
of data (i.e., use a pull model, for example, BitTorrent 
<xref target="BITTORRENT"/>), live streaming protocols quite often use a 
request-less push model to save round  trips. PPSPP supports both models of 
operation. 
</t>

<t>
A peer MAY send a REQUEST message that MUST contain the specification of the 
chunks it  wants to download. A peer receiving a REQUEST message MAY send out 
the requested chunks. When peer Q receives multiple REQUESTs from the same peer P  
peer Q SHOULD process the REQUESTs sequentially. Multiple REQUEST messages 
MAY be sent in one datagram, for example, when a peer wants to request 
several rare chunks at once. 
</t>

<t>
When live streaming via a push model, a peer receiving REQUESTs also MAY send 
some other chunks in case it runs out of requests or for some other reason. 
In that case the only purpose of REQUEST messages is to provide hints and 
coordinate peers to avoid unnecessary data retransmission. 
</t>
</section>


<!-- %% -->
<section title="CANCEL" anchor="sec_msgs_CANCEL">
<t>
When downloading on demand or live streaming content, a peer MAY request urgent 
data from multiple peers to increase the probability of it being delivered on time. 
In particular, when the specific chunk picking algorithm (see <xref target="sec_ext_cpa"/>), 
detects that a request for urgent data might not be served on time, a request 
for the same data MAY be sent to a different peer. 
When a peer P decides to request urgent data from a peer Q, peer P SHOULD send a 
CANCEL message to all the peers to which the data has been previously requested. 
The CANCEL message contains the specification of the chunks P no longer wants
to request. In addition, when peer Q receives a HAVE message for the urgent data 
from peer P, peer Q MUST also cancel the previous REQUEST(s) from P. In other
words, the HAVE message acts as an implicit CANCEL.
</t>
</section>


<!-- %% -->
<section title="CHOKE and UNCHOKE" anchor="sec_msgs_CHOKE">
<t>
Peer A MAY send a CHOKE message to peer B to signal it will no longer be 
responding to REQUEST messages from B, for example, because A's upload capacity 
is exhausted. Peer A MAY send a subsequent UNCHOKE message to signal that it 
will respond to new REQUESTs from B again (A SHOULD discard old requests).
When peer B receives a CHOKE message from A it MUST NOT send new REQUEST 
messages and SHOULD NOT expect answers to any outstanding ones. The CHOKE and
UNCHOKE messages are informational as a peer is not required to respond to 
REQUESTs, see <xref target="sec_msgs_REQUEST"/>.
</t>
</section>


<!-- %% -->
<section title="Peer Address Exchange and NAT Hole Punching" anchor="sec_msgs_PEX">

<!-- %%% -->
<section title="PEX_REQ and PEX_RES Messages" anchor="sec_msgs_PEX_msgs">
<t>
Peer address exchange messages (or PEX messages for short) are common in 
many peer-to-peer protocols. By exchanging peer addresses in gossip fashion, 
peers relieve central coordinating entities (the trackers) from unnecessary 
work. PPSPP optionally features two types of PEX messages: PEX_REQ and PEX_RES.
A peer that wants to retrieve some peer addresses MUST send a PEX_REQ message.
The receiving peer MAY respond with a PEX_RES message containing the (potentially 
signed) addresses of several peers. The addresses MUST be of peers it has 
exchanged messages with in the last 60 seconds to guarantee liveliness. 
Alternatively, the receiving peer MAY ignore PEX messages if uninterested 
in obtaining new peers or because of security considerations (rate limiting) 
or any other reason. The PEX messages can be used to construct a dedicated 
tracker peer.
</t>

<t>
As peer-address exchange enables a number of attacks it should not be used 
outside a benign environment unless extra security measures are in place.
These security measures, which involve exchanging addresses in cryptographically
signed swarm-membership certificates, are described in 
<xref target="sec_sec_pex"/>.
</t>
</section>

<!-- %%% -->
<section title="Hole Punching via PPSPP Messages" anchor="sec_msgs_PEX_nat">
<t>
PPSPP can be used in combination with STUN <xref target="RFC5389"/>. In addition, 
the native PEX messages can be used to do simple NAT hole punching 
<xref target="SNP" />, as follows. When peer A introduces peer B to peer 
C by sending a PEX_RES message to C, it SHOULD also send a PEX_RES message to B 
introducing C. These messages SHOULD be within 2 seconds from each other, but 
MAY not be simultaneous, instead leaving a gap of twice the "typical" RTT, i.e.
300-600 ms. As a result, the peers are supposed to initiate handshakes to each
other thus forming a simple NAT hole punching pattern where the
introducing peer effectively acts as a STUN server. Note that the PEX_RES 
message is sent without a prior PEX_REQ in this case. Also note the PEX_RES
from A to B is likely to arrive because recent communication between A and B
is a prerequisite for A introducing B to C, see previous section.
</t>
</section>
</section>


<!-- %% -->
<section title="Keep Alive Signalling" anchor="sec_msgs_keepalive">
<t>
A peer MUST send a "keep alive" message periodically to each peer it wants to 
interact with in the future, but has no other messages to send them at present. 
PPSPP does not define an explicit message type for "keep alive" messages. In the 
PPSP-over-UDP encapsulation they are implemented as simple datagrams consisting 
of a  4-byte channel number only, see <xref target="sec_encap_udp_channels"/>
and <xref target="sec_encap_udp_HANDSHAKE"/>. 
</t>
</section>


<!-- %% -->
<section title="Storage Independence" anchor="sec_msgs_storindep">
<t>
Note PPSPP does not prescribe how chunks are stored. This also allows users of 
PPSPP to map different files into a single swarm as in BitTorrent's multi-file 
torrents <xref target="BITTORRENT"/>, and more innovative storage solutions when 
variable-sized chunks are used.  
</t>
</section>
</section>




<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Chunk Addressing Schemes" anchor="sec_chunkaddr">

<t>
PPSPP can use different methods of chunk addressing, that is, support different 
ways of identifying chunks and different ways of expressing the chunk 
availability map of a peer in a compact fashion. 
</t>

<!-- %% -->
<section title="Start-End Ranges" anchor="sec_chunkaddr_range">
<t>
A chunk specification consists of a single (start specification,end 
specification) pair that identifies a range of chunks (end inclusive). 
The start and end specifications can use one of multiple addressing schemes. 
Two schemes are currently defined, chunk ranges and byte ranges.
</t>

<!-- %%% -->
<section title="Chunk Ranges" anchor="sec_chunkaddr_range_chunk">
<t>
The start and end specification are both chunk identifiers. A PPSPP peer 
MUST support this scheme. 
</t>
</section>


<!-- %%% -->
<section title="Byte Ranges" anchor="sec_chunkaddr_range_byte">
<t>
The start and end specification are byte offsets in the content. A PPSPP peer 
MAY support this scheme. 
</t>
</section>

</section>

<!-- %% -->
<section title="Bin Numbers" anchor="sec_chunkaddr_bin">
<t>
PPSPP introduces a novel method of addressing chunks of content called "bin
numbers" (or "bins" for short). Bin numbers allow the addressing of a binary 
interval of data using a single integer. This reduces the amount of state that 
needs to be recorded per peer and the space needed to denote intervals on the 
wire, making the protocol light-weight. In general, this numbering system allows 
PPSPP to work with simpler data structures, e.g. to use arrays instead of 
binary trees, thus reducing complexity. A PPSPP peer MAY support this scheme.
</t>  
<t>
In bin addressing, the smallest binary interval is a single chunk (e.g. a block
of bytes which may be of variable size), the largest interval is a complete 
range of 2**63 chunks. In a novel addition to the classical scheme, 
these intervals are numbered in a way which lays them out into a vector nicely,
which is called bin numbering, as follows. Consider an chunk interval of width 
W. To derive the bin numbers of the complete interval and the subintervals, a 
minimal balanced binary tree is built that is at least W chunks wide at the 
base. The leaves from left-to-right correspond to the chunks 0..W-1 in the 
interval, and have bin number I*2 where I is the index of the chunk (counting 
beyond W-1 to balance the tree). The bin number of higher level nodes P in the
tree is calculated as follows: 
</t>
<t>
<list style="hanging" hangIndent="4">
<t>
    binP = (binL + binR) / 2
</t>
</list>
</t>
<t>
where binL is the bin of node P's left-hand child and binR is the bin of node
P's right-hand child. Given that each node in the tree represents a
subinterval of the original interval, each such subinterval now is
addressable by a bin number, a single integer. The bin number tree of an
interval of width W=8 looks like this:
</t>

<figure align="center" anchor="fig_chunkaddr_bin">
<artwork align="center"><![CDATA[

              7
             / \
           /     \
         /         \
       /             \
      3                11
     / \              / \
    /   \            /   \
   /     \          /     \
  1       5        9       13 
 / \     / \      / \      / \
0   2   4   6    8   10  12   14

C0  C1  C2  C3   C4  C5  C6   C7
]]></artwork>
<postamble>The bin number tree of an interval of width W=8</postamble>
</figure>

<t>
So bin 7 represents the complete interval, bin 3 represents the interval of
chunk 0..3, bin 1 represents the interval of chunks 0 and 1, and 
bin 2 represents chunk C1. The special numbers 0xFFFFFFFF (32-bit) or 
0xFFFFFFFFFFFFFFFF (64-bit) stands for an empty interval, and 0x7FFF...FFF 
stands for "everything". 
</t>
<t>
When bin numbering is used, the ID of a chunk is its corresponding (leaf) bin 
number in the tree and the chunk specification in HAVE and ACK messages is equal 
to a single bin number, as follows.
</t>

</section>


<!-- %% -->
<section title="In Messages" anchor="sec_chunkaddr_msgs">

<!-- %%% -->
<section title="In HAVE Messages" anchor="sec_chunkaddr_have">
<t>
When a receiving peer has successfully checked the integrity of a chunk or
interval of chunks it MUST send a HAVE message to all peers it wants to 
interact with. The latter allows the HAVE message to be used as a method of 
choking. The HAVE message MUST contain the chunk specification of the biggest
complete interval of all chunks the receiver has received and checked so far
that fully includes the interval of chunks just received. So the chunk 
specification MUST denote at least the interval received, but the receiver is 
supposed to aggregate and acknowledge bigger intervals, when possible.
</t>
<t>
As a result, every single chunk is acknowledged a logarithmic number of times.
That provides some necessary redundancy of acknowledgments and sufficiently 
compensates for unreliable transport protocols.
</t>
<t>
Implementation note:
<list style="hanging" hangIndent="4">
<t>
To record which chunks a peer has in the state that an implementation keeps
for each peer, an implementation MAY use the efficient "binmap" data structure, 
which is a hybrid of a bitmap and a binary tree, discussed in detail in 
<xref target="BINMAP"/>.
</t>
</list>
</t>

</section>

<!-- %%% -->
<section title="In ACK Messages" anchor="sec_chunkaddr_ack">
<t>
When PPSPP is run over an unreliable transport protocol, an implementation MAY
choose to use ACK messages to acknowledge received data. When a receiving 
peer has successfully checked the integrity of a chunk or interval of chunks
C it MUST send a ACK message containing the chunk specification of its biggest,
complete, interval covering C to the sending peer (see HAVE).
</t> 
</section>

</section>

<!-- %% -->
<section title="Compatibility" anchor="sec_chunkaddr_compat">
<t>
In principle, peers using range addressing and peers using bin numbering 
can interact, with some limitations. Alternatively, a peer A MAY refuse
to interact with a peer B using a different addressing scheme. In that
case, A MUST respond to B'S HANDSHAKE message by sending an explicit close
(see <xref target="sec_encap_udp_HANDSHAKE"/>). PPSPP presently supports only 
interaction between willing peers when fixed sized chunks are used, as follows:
</t>
<t>
When a bin peer sends a message containing a chunk specification to a byte-range 
peer it MUST translate its internal bin numbers to byte ranges. When a byte 
range peer sends a message with a chunk specification message to a bin peer, 
it MUST round its internal byte ranges to 1 or more bins. For the latter 
translation, the byte-range peer MUST know the fixed chunk size used (which it 
should receive along with the swarm identifier). When a range translates to
multiple bins, the byte-range peer should send multiple 
e.g. HAVE messages. Note that the bin peer may not be able to request all 
content the byte-range peer has if it does not have an integral number of 
chunks.  
</t><t>
Aside: Translation from bytes to bins is possible for variable sized chunks 
only when the byte-range peer has extra information. In particular, it will
need to know the individual sizes of the chunks from the start of the content
till the byte range it wants to convey to the bin peer. 
</t>    
<t>
A similar translation MUST be done for translating between bins and chunk ranges.
Chunk ranges are directly translatable to bins. Assuming ranges are intervals 
of a list of chunks numbered 0...N, for a given bin number "bin" and bitwise
operations AND and OR:
<list style="hanging" hangIndent="4">
<t> startrange = (bin AND (bin + 1))/2  </t>
<t> endrange = ((bin OR (bin + 1)) - 1)/2  </t>
</list>
</t>
<t>
The reverse translation may require a chunk range to be rounded to the largest
binary interval it covers, or for a range be translated to a series of bin 
numbers that should be sent using multiple (e.g. HAVE) messages.
</t>
<t>Finally, byte-range peers can interact with chunk-range peers, by
using the direct translation from chunks into bytes and by rounding byte ranges
into chunk ranges. The latter requires the byte-range peer to know the fixed
chunk size. 
</t>
</section>

</section>


<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Content Integrity Protection" anchor="sec_intprot">

<t>
PPSPP can use different methods for protecting the integrity of the content 
while it is being distributed via the peer-to-peer network. More specifically, 
PPSPP can use different methods for receiving peers to detect whether a 
requested chunk has been maliciously modified by the sending peer.
</t>
<t>  
This section describes the recommended method for bad content detection, 
the Merkle Hash Tree scheme, which SHOULD be implemented for 
protecting static content. It can also be efficiently used in protecting 
live streams, as explained below and in <xref target="sec_live_auth"/>.
</t>
<t>
The Merkle hash tree scheme can use different chunk addressing schemes. All it
requires is the ability to address a range of chunks. In the following 
description abstract node IDs are used to identify nodes in the tree. On the 
wire these are translated to the corresponding range of chunks in the chosen
chunk addressing scheme. When bin numbering is used, node IDs correspond 
directly to bin numbers in the INTEGRITY message, see below.
</t>

<!-- %% -->
<section title="Merkle Hash Tree Scheme" anchor="sec_intprot_merkle">
<t>
PPSPP uses a method of naming content based on self-certification. In particular, 
content in PPSPP is identified by a single cryptographic hash that is the 
root hash in a Merkle hash tree calculated recursively from the content 
<xref target="ABMRKL"/>. This self-certifying hash tree allows every peer to 
directly detect when a malicious peer tries to distribute fake content. It also 
ensures only a small the amount of information is needed to start a download 
(the root hash and some peer addresses). For live streaming a dynamic tree 
and a public key are used, see below.
</t>

<t>
The Merkle hash tree of a content asset that is divided into N chunks
is constructed as follows. Note the construction does not assume chunks
of content to be fixed size. Given a cryptographic hash function, more 
specifically  a modification detection code (MDC) 
<xref target="HAC01"/> , such as SHA1, the hashes of all the 
chunks of the content are calculated. Next, a binary tree of sufficient height 
is created. Sufficient height means that the lowest level in the tree has 
enough nodes to hold all chunk hashes in the set, as with bin numbering. 
The figure below shows the tree for a content asset consisting of 7 chunks. 
As before with the content addressing scheme, the leaves of the tree correspond 
to a chunk and in this case are assigned the hash of that chunk, starting at 
the left-most leaf. As the base of the tree may be wider than the number of 
chunks, any remaining leaves in the tree are assigned an empty hash value of 
all zeros. Finally, the hash values of the higher levels in the tree are 
calculated, by concatenating the hash values of the two children (again left 
to right) and computing the hash of that aggregate. If the two children are
empty hashes, the parent is an empty all zeros hash as well 
(to save computation). This process ends in a hash value for the root node, 
which is called the "root hash". Note the root hash only depends on the content 
and any modification of the content will result in a different root hash. 
</t>

<figure align="center" anchor="fig_intprot_merkle">
<artwork align="center"><![CDATA[

               7 = root hash
              / \
            /     \
          /         \
        /             \
      3*               11
     / \              / \
    /   \            /   \
   /     \          /     \
  1       5        9       13* = uncle hash
 / \     / \      / \      / \
0   2   4   6    8   10* 12   14
        
C0  C1  C2  C3   C4  C5  C6   E
=chunk index     ^^           = empty hash

]]></artwork>
<postamble>The Merkle hash tree of an interval of width W=8</postamble>
</figure>

</section>
                           

<!-- %% -->
<section title="Content Integrity Verification" anchor="sec_intprot_merkle_verify">
<t>
Assuming a peer receives the root hash of the content it wants to download
from a trusted source, it can check the integrity of any chunk of that 
content it receives as follows. It first calculates the hash of the chunk 
it received, for example chunk C4 in the previous figure. Along with this 
chunk it MUST receive the hashes required to check the integrity of that 
chunk. In principle, these are the hash of the chunk's sibling (C5) and 
that of its "uncles". A chunk's uncles are the sibling Y of its parent X, 
and the uncle of that Y, recursively until the root is reached. For chunk C4 
its uncles are nodes 13 and 3, marked with * in the figure. Using this information 
the peer recalculates the root hash of the tree, and compares it to the 
root hash it received from the trusted source. If they match the chunk of 
content has been positively verified to be the requested part of the content. 
Otherwise, the sending peer either sent the wrong content or the wrong 
sibling or uncle hashes. For simplicity, the set of sibling and uncles 
hashes is collectively referred to as the "uncle hashes". 
</t>

<t>
In the case of live streaming the tree of chunks grows dynamically and the root 
hash is undefined or, more precisely, transient, as long as new data is 
generated by the live source. <xref target="sec_live_merkle"/> defines
a method for content integrity verification for live streams that works
with such a dynamic tree. Although the tree is dynamic, content verification 
works the same for both live and predefined content, resulting in a unified
method for both types of streaming. 
</t>
</section>

<!-- %% -->
<section title="The Atomic Datagram Principle" anchor="sec_intprot_merkle_atomic">
<t>
As explained above, a datagram consists of a sequence of messages. Ideally,
every datagram sent must be independent of other datagrams, so each 
datagram SHOULD be processed separately and a loss of one datagram MUST NOT
disrupt the flow.  Thus, as a datagram carries zero or more messages,
neither messages nor message interdependencies SHOULD span over multiple
datagrams. 
</t>

<t>
This principle implies that as any chunk is verified using its uncle 
hashes the necessary hashes SHOULD be put into the same datagram as the 
chunk's data. If this is not possible because of a limitation on datagram
size, the necessary hashes MUST be sent first in one or more datagrams. 
As a general rule, if some additional data is still missing to process a 
message within a datagram, the message SHOULD be dropped. 
</t>

<t>
The hashes necessary to verify a chunk are in principle its sibling's hash
and all its uncle hashes, but the set of hashes to send can be optimized. 
Before sending a packet of data to the receiver, the sender inspects the 
receiver's previous acknowledgments (HAVE or ACK) to derive which hashes the
receiver already has for sure. Suppose, the receiver had acknowledged chunks C0 
and C1 (first two chunks of the file), then it must already have uncle hashes 5,
11 and so on. That is because those hashes are necessary to check C0 and C1
against the root hash. Then, hashes 3, 7 and so on must be also known as they 
are calculated in the process of checking the uncle hash chain. Hence, to send 
chunk C7, the sender needs to include just the hashes for nodes 14 and 9, which 
let the data be checked against hash 11 which is already known to the receiver. 
</t>

<t>
The sender MAY optimistically skip hashes which were sent out in previous,
still unacknowledged datagrams. It is an optimization trade-off between
redundant hash transmission and possibility of collateral data loss in the
case some necessary hashes were lost in the network so some delivered data
cannot be verified and thus has to be dropped. In either case, the receiver
builds the Merkle tree on-demand, incrementally, starting from the root
hash, and uses it for data validation.
</t>

<t>
In short, the sender MUST put into the datagram the missing hashes necessary 
for the receiver to verify the chunk. The receiver MUST remember all the
hashes it needs to verify missing chunks that it still wants to download. 
Note that the latter implies that a hardware-limited receiver MAY forget 
some hashes if it does not plan to announce possession of these chunks to others 
(i.e., does not plan to send HAVE messages.)
</t>
</section>


<!-- %% -->
<section title="INTEGRITY Messages" anchor="sec_intprot_merkle_msg">
<t>
Concretely, a peer that wants to send a chunk of content creates a datagram 
that MUST consist of a list of INTEGRITY messages followed by a DATA message.
 If the INTEGRITY messages and DATA message cannot be put into a single datagram
because of a limitation on datagram size, the INTEGRITY messages MUST be sent 
first in one or more datagrams. The list of INTEGRITY messages sent MUST contain 
a INTEGRITY message for each hash the receiver misses for integrity checking. 
A INTEGRITY message for a hash MUST contain the chunk specification 
corresponding to the node ID of the hash and the hash data itself. The chunk 
specification corresponding to a node ID is defined as the range of 
chunks formed by the leaves of the subtree rooted at the node. For example,
node 3 in <xref target="fig_intprot_merkle"/> denotes chunks 0,2,4,6, 
so the chunk specification should denote that interval. The list of INTEGRITY 
messages MUST be sorted in order of tree height of the nodes, descending. 
The DATA message MUST contain the chunk specification of the chunk and chunk 
itself. A peer MAY send the required messages for multiple chunks in the same 
datagram, depending on the encapsulation. 
</t>
</section>


<!-- %% -->
<section title="Discussion and Overhead" anchor="sec_intprot_merkle_overhead">

<t>
The current method for protecting content integrity in BitTorrent 
<xref target="BITTORRENT"/> is not suited for streaming. It involves providing 
clients with the hashes of the content's chunks before the download commences 
by means of metadata files (called .torrent files in BitTorrent.) However, 
when chunks are small as in the current UDP encapsulation of PPSPP this implies 
having to download a large number of hashes before content download can begin. 
This, in turn, increases time-till-playback for end users, making 
this method unsuited for streaming.
</t>
   
<t>
The overhead of using Merkle hash trees is limited. The size of the hash tree
expressed as the total number of nodes depends on the number of chunks the 
content is divided (and hence the size of chunks) following this formula:
<list style="hanging" hangIndent="4">
<t>
	nnodes = math.pow(2,math.log(nchunks,2)+1)
</t>
</list>
In principle, the hash values of all these nodes will have to be sent to a peer
once for it to verify all chunks. Hence the maximum on-the-wire overhead is
hashsize * nnodes. However, the actual number of hashes transmitted can be 
optimized as described in <xref target="sec_intprot_merkle_atomic"/>. 
To see a peer can verify all chunks whilst receiving not all hashes, consider 
the example tree in <xref target="sec_intprot_merkle"/>. 
</t>

<t>
In case of a simple progressive download, of chunks 0,2,4,6, etc. the sending
peer will send the following hashes:
</t>

<texttable anchor="tab_merkle_overhead" title="Overhead for the example tree">
<preamble></preamble>
<ttcol align="center">Chunk</ttcol>
<ttcol align="left">Node IDs of hashes sent</ttcol>
<c>0</c>		<c>2,5,11</c>
<c>2</c>		<c>- (receiver already knows all)</c>
<c>4</c>		<c>6</c>
<c>6</c>		<c>-</c>
<c>8</c>		<c>10,13 (hash 3 can be calculated from 0,2,5)</c>
<c>10</c>		<c>-</c>
<c>12</c>		<c>14</c>
<c>14</c>		<c>-</c>
<c>Total</c>	<c># hashes	7</c>
<postamble></postamble>
</texttable>

<t>
So the number of hashes sent in total (7) is less than the total number 
of hashes in the tree (16), as a peer does not need to send hashes that are 
calculated and verified as part of earlier chunks.
</t>

</section>



<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Automatic Detection of Content Size" anchor="sec_intprot_autosize">
<t>
In PPSPP, the root hash of a static content asset, such as a video file,
along with some peer addresses is sufficient to start a download.
In addition, PPSPP can reliably and automatically derive the size
of such content from information received from the network when fixed
sized chunks are used. As a result, it is not necessary to include 
the size of the content asset as the metadata of the content,
in addition to the root hash. Implementations of PPSPP MAY use 
this automatic detection feature. Note this feature is the only feature of 
PPSPP that requires that a fixed-sized chunk is used.   
</t>

<!-- %% -->
<section title="Peak Hashes" anchor="sec_intprot_autosize_peak">
<t>
The ability for a newcomer peer to detect the size of the content 
depends heavily on the concept of peak hashes. Peak hashes,
in general, enable two cornerstone features of PPSPP: reliable file 
size detection and download/live streaming unification (see  
<xref target="sec_live"/>). 
The concept of peak hashes depends on the concepts of filled and
incomplete nodes. Recall that when constructing the binary trees
for content verification and addressing the base of the tree may
have more leaves than the number of chunks in the content. In the
Merkle hash tree these leaves were assigned empty all-zero 
hashes to be able to calculate the higher level hashes. A filled
node is now defined as a node that corresponds to an interval
of leaves that consists only of hashes of content chunks, not
empty hashes. Reversely, an incomplete (not filled) node 
corresponds to an interval that contains also empty hashes,
typically an interval that extends past the end of the file.
In the following figure nodes 7, 11, 13 and 14 are incomplete
the rest is filled.
</t>

<t>
Formally, a peak hash is the hash of a filled node in the Merkle tree, 
whose sibling is an incomplete node. Practically, suppose a file is 7162 bytes 
long and a chunk is 1 kilobyte. That file fits into 7 chunks, the tail chunk 
being 1018 bytes long. The Merkle tree for that file is shown in 
<xref target="fig_autosize_peak"/>. Following the definition the peak hashes of 
this file are in nodes 3, 9 and 12, denoted with a *. E denotes an empty hash.
</t>

<figure align="center" anchor="fig_autosize_peak">
<artwork align="center"><![CDATA[

               7
              / \
            /     \
          /         \
        /             \
      3*               11
     / \              / \
    /   \            /   \
   /     \          /     \
  1       5        9*      13 
 / \     / \      / \      / \
0   2   4   6    8   10  12*  14

C0  C1  C2  C3   C4  C5  C6   E
                         = 1018 bytes
]]></artwork>
<postamble>Peak hashes in a Merkle hash tree.</postamble>
</figure>

<t>
Peak hashes can be explained by the binary representation of the 
number of chunks the file occupies. The binary representation for 
7 is 111. Every "1" in binary representation of the file's packet 
length corresponds to a peak hash. For this particular file there
are indeed three peaks, nodes 3, 9, 12. The number of peak
hashes for a file is therefore also at most logarithmic with its 
size.
</t>

<t>
A peer knowing which nodes contain the peak hashes for the file 
can therefore calculate the number of chunks it consists of, and 
thus get an estimate of the file size (given all chunks but the last 
are fixed size). Which nodes are the peaks can  be securely communicated 
from one (untrusted) peer A to another B by letting A send the 
peak hashes and their node IDs to B. It can be shown that 
the root hash that B obtained from a trusted source is sufficient 
to verify that these are indeed the right peak hashes, as follows.
</t>  
<t>
Lemma: Peak hashes can be checked against the root hash.
</t>
<t>
Proof: (a) Any peak hash is always the left sibling. Otherwise, be
it the right sibling, its left neighbor/sibling must also be
a filled node, because of the way chunks are laid
out in the leaves, contradiction. (b) For the rightmost 
peak hash, its right sibling is zero. (c) For any peak hash, 
its right sibling might be calculated using peak hashes to the 
left and zeros for empty nodes. (d) Once the right sibling of 
the leftmost peak hash is calculated, its parent might be 
calculated. (e) Once that parent is calculated, we might 
trivially get to the root hash by concatenating the hash with 
zeros and hashing it repeatedly.
</t>

<t>
Informally, the Lemma might be expressed as follows: peak hashes cover all
data, so the remaining hashes are either trivial (zeros) or might be
calculated from peak hashes and zero hashes.
</t>

<t>
Finally, once peer B has obtained the number of chunks in the content it
can determine the exact file size as follows. Given that all chunks
except the last are fixed size B just needs to know the size of the last
chunk. Knowing the number of chunks B can calculate the node ID of the 
last chunk and download it. As always B verifies the integrity of this 
chunk against the trusted root hash. As there is only one chunk of data 
that leads to a successful verification the size of this chunk must 
be correct. B can then determine the exact file size as
</t> 
<t>
<list style="hanging" hangIndent="4">
<t>
    (number of chunks -1) * fixed chunk size + size of last chunk
</t>
</list>
</t>
</section>

<!-- %% -->
<section title="Procedure" anchor="sec_intprot_autosize_proc">
<t>
A PPSPP implementation that wants to use automatic size detection MUST
operate as follows. When a peer B sends a DATA message for the first time
to a peer A, B MUST first send all the peak hashes for the content, unless A has 
already signalled earlier in the exchange that it knows the peak hashes by 
having acknowledged any chunk. If they are needed, the peak hashes MUST be sent 
as an extra list of uncle hashes for the chunk, before the list of actual 
uncle hashes of the chunk as described in <xref target="sec_intprot_merkle_atomic"/>. 
The receiver A MUST check the peak hashes against the root hash to determine 
the approximate content size. To obtain the definite content size peer A MUST 
download the last chunk of the content from any peer that offers it.
</t>
</section>
</section>

</section>





<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Live Streaming" anchor="sec_live">
<t>
The set of messages defined above can be used for live streaming as well.
In a pull-based model, a live streaming injector can announce the chunks it 
generates via HAVE messages, and peers can retrieve them via REQUEST messages.
Areas that need special attention are content authentication and chunk 
addressing (to achieve an infinite stream of chunks). 
</t>

<section title="Content Authentication" anchor="sec_live_auth">
<t>
For live streaming, PPSPP supports two methods for a peer to authenticate
the content it receives from another peer, called "Sign All" and 
"Unified Merkle Tree". 
</t>
<t>
In the "Sign All" method, the live injector signs each chunk of content
using a private key and peers, upon receiving the chunk, check the signature using 
the corresponding public key obtained from a trusted source. In particular,
in PPSP, the swarm ID of the live stream is that public key. The signature is
sent along with the DATA message containing the relevant chunk using the 
SIGNED_INTEGRITY message. 
</t>
<t>
In the "Unified Merkle Tree" method, PPSPP combines the Merkle hash tree scheme
for static content with signatures to unify the video-on-demand and live
streaming case. The use of Merkle hash trees reduces the number of 
signing and verification operations per second, that is, provide signature 
amortization similar to the approach described in <xref target="SIGMCAST"/>.
</t>
<t>
In both methods the swarm ID consists of a public key encoded as in a DNSSEC
DNSKEY resource record without BASE-64 encoding <xref target="RFC4034"/>. 
In particular, the swarm ID consists of a 1 byte Algorithm field that 
identifies the public key's cryptographic algorithm and determines the format 
of the Public Key field that follows. 
</t>

<section title="Sign All" anchor="sec_live_signall">
<t>
Even with "Sign All", the number of cryptographic operations per second may be
limited. For example, consider a 25 frame/second video transmitted over UDP. 
When each frame is transmitted in its own chunk, only 25 signature verification 
operations per second are required, for the receiving peer, for bitrates up 
to ~12.8 megabit/second over UDP. For higher bitrates multiple UDP packets per 
frame are needed. 
</t>
</section>


<section title="Unified Merkle Tree" anchor="sec_live_merkle">
<t>
In this method, the chunks of content are used as the basis for a Merkle hash 
tree as for static content. However, because chunks are continuously generated, 
this tree is not static, but dynamic. As a result, the tree does not have
a root hash, or more precisely has a transient root hash. A public key
therefore serves as swarm ID of the content. It is used to digitally sign
updates to the tree, allowing peers to expand it based on trusted information
using the following procedure. 
</t>
<t>
The live injector creates a number of chunks N, a fixed power of 2 (N>=2), 
which are added as new leaves to the existing hash tree, expanding the tree as 
required. As a result of this expansion, the tree will have gotten a set of new 
peak hashes (see <xref target="sec_intprot_autosize_peak"/>). The injector now signs 
only the peak hashes in this set that are not in the old set of peak hashes.
For N being a power of 2 there will just be one new peak hash (see below). This 
complementary signed peak is distributed to the peers. Receiving
peers will verify the signature on the signed peak against the swarm ID, 
update their tree and request the new chunks.
</t>
<t>
To illustrate this procedure, consider the injector has generated the tree
shown in <xref target="fig_live_peakr2"/> and it is connected
to several peers that currently have the same tree and all chunks. 
In this tree the root node 3 is also the peak node for this tree. Now the 
injector generates N=2 new chunks. As a result the tree expands as shown in 
<xref target="fig_live_peakr3"/>. The two new pieces 8 and 10 extend the 
tree on the right side, and to accommodate them a new root is created, node 7.
As this tree is wider at the base than the actual number of chunks, there
are currently two empty leaves. The peak nodes in this tree are 3 and 9. 
</t>

<figure align="center" anchor="fig_live_peakr2">
<artwork align="center"><![CDATA[

      3*
     / \ 
    /   \ 
   /     \ 
  1       5  
 / \     / \  
0   2   4   6 

]]></artwork>
<postamble>Current live tree </postamble>
</figure>



<figure align="center" anchor="fig_live_peakr3">
<artwork align="center"><![CDATA[

               7
              / \
            /     \
          /         \
        /             \
      3*               11
     / \              / \
    /   \            /   \
   /     \          /     \
  1       5        9*      13 
 / \     / \      / \      / \
0   2   4   6    8   10   E   E

]]></artwork>
<postamble>Next current live tree</postamble>
</figure>

<t>
The injector now needs to inform its peers of the changed tree, in particular
the addition of the new complementary peak hash 9. To this extent,
it sends an INTEGRITY message with the hash of node 9, a SIGNED_INTEGRITY message
with the signature of the hash of node 9 and a HAVE message for node 9.
The receiving peers now expand their view of the tree. Next, the peers
will request e.g. chunk 8 from the injector by sending a REQUEST message.
The injector responds by sending the requester an INTEGRITY message with the 
hash of node 10, and a DATA message with chunk 8. This allows the peer to 
verify the chunk against peak hash 9 which is signed by the trusted injector.  
</t>
<t>
The injector MAY send HAVE messages for the chunks it creates immediately,
and allow peers to retrieve them. This optimizes the use of the injector's
bandwidth. Peers MUST NOT forward these chunks to others until they have 
received and checked the peak hash signature and the necessary hashes.
</t>
<t>
This procedure generates just 1 new peak hash for every N blocks, so it 
requires just one signature on each iteration, making it N times cheaper
than "Sign All". To see why just 1 new peak hash is generated each iteration
let's return to the definition of a peak hash in a tree, from 
<xref target="sec_intprot_autosize_peak"/>. A peak hash is the hash of a
filled node in the Merkle tree, whose sibling is an incomplete node. Now consider 
the above procedure where N chunks (with N a power of 2) are added to
a tree at each iteration. In the first iteration, the tree consists of just N 
leaves, therefore the only peak is the root of the tree. In the second iteration,
the tree consists of 2N chunks and the only peak is the root of that bigger tree 
(depicted in <xref target="fig_live_peakr2"/> for N=2). In the third iteration, 
we have 3N chunks as leaves and a tree that is 4N wide (to span the 3N 
chunks) and hence has N empty leaves (depicted in <xref target="fig_live_peakr3"/> 
for N=2). This implies that the tree has 2 peaks, notably the peak from
the previous iteration (node 3 in the figure) and the top of the subtree
of the N chunks that were added last (node 9 in the figure). Although this
iteration has two peaks, there is only one new peak as the expanded tree 
overlaps with the tree from the previous iteration. In the fourth iteration, 
we have a complete balanced tree again, and just a single new peak. It is now
easy to see that this process in which previous peaks are either consumed into 
a single new peak, or peak sets overlap with just 1 new addition yields a 
single new peak per N chunks.
</t>   
<t>
From this we can conclude that the injector has to sign less hashes than in 
the "Sign All" method. A receiving peer therefore also has to verify less 
signatures. It does additionally need to check one or more hashes per chunk 
via the Merkle Tree scheme, but this requires less CPU than a signature 
verification for every chunk. This approach is similar to signature 
amortization via Merkle Tree Chaining <xref target="SIGMCAST"/>.
The downside of this amortization of signature costs over several chunks is 
that latency will increase. A receiving peer now has to wait for the signature 
before delivering the chunks to the higher layers responsible for playback 
<xref target="POLLIVE"/>, unless some (optimistic) optimisations 
are made. It MUST check the signature before forwarding the chunks to other 
peers.
</t>
<t>
The number of chunks per signature N MUST be a fixed power of 2 (N>=2).
The procedure does not preclude using variable-sized chunks. Using a variable
number N, however, is not allowed as this breaks the property of generating
just 1 new peak per iteration.
</t>
<t>
Unification of static content checking and live content checking is achieved 
by sending the signed peak hashes on-demand, ahead of the actual data. As 
before, the sender SHOULD use acknowledgments to derive which content range 
the receiver has peak hashes for, and to prepend the data hashes with the 
necessary (signed) peak hashes. Except for the fact that the set of peak 
hashes changes with time, other parts of the protocol work as described 
in <xref target="sec_intprot_merkle"/>. 
</t>
<t>
This method of integrity verification has an added benefit if the system 
includes some peers that saved the complete broadcast. The benefit is that as 
soon as the broadcast ends, the content is available as a video-on-demand 
download using the now stabilized tree and the final root hash as swarm 
identifier. Peers that have saved all chunks can now announce this root hash 
to the tracking infrastructure and instantly seed it.
</t>
</section>
</section>

<section title="Forgetting Chunks" anchor="sec_live_disc">
<t>
As a live broadcast progresses a peer may want to discard the chunks that
it already played out. Ideally, other peers should be aware of this fact such
that they will not try to request these chunks from this peer. This could
happen in scenarios where live streams may be paused by viewers, or viewers
are allowed to start late in a live broadcast (e.g., start watching a broadcast 
at 20:35 whereas it began at 20:30).
</t>   
<t>
PPSPP provides a simple solution for peers to stay up-to-date with the
chunk availability of a discarding peer. A discarding peer in a live stream
MUST enable the Live Discard Window protocol option, specifying how many 
chunks/bytes it caches before the last chunk/byte it advertised as being 
available (see <xref target="sec_protopt_livediscwin"/>). Its peers 
SHOULD apply this number as a sliding window filter over the peer's chunk 
availability as conveyed via its HAVE messages. 
</t>
</section>
    

</section>




<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Protocol Options" anchor="sec_protopt">


<t>
The HANDSHAKE message in PPSPP can contain the following protocol options 
(cf. <xref target="RFC2132"/> (DHCP options)). Unless stated otherwise,
a protocol option consists of an 8-bit code followed by an 8-bit value. 
Larger values are all encoded big-endian. Each protocol option is explained 
in the following subsections. 
</t>

<texttable anchor="tab_id_proto_opt" title="PPSP Peer Protocol Options">
<preamble></preamble>
<ttcol align="left">Code</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c>		<c>Version</c>
<c>1</c>		<c>Minimum Version</c>
<c>2</c>		<c>Swarm Identifier</c>
<c>3</c>		<c>Content Integrity Protection Method</c>
<c>4</c>		<c>Merkle Hash Tree Function</c>
<c>5</c>		<c>Live Signature Algorithm</c>
<c>6</c>		<c>Chunk Addressing Method</c>
<c>7</c>		<c>Live Discard Window</c>
<c>8</c>		<c>Supported Messages</c>
<c>9-254</c>		<c>Unassigned</c>
<c>255</c>		<c>End Option</c>
	
<postamble></postamble>
</texttable>


<!-- %% -->
<section title="End Option" anchor="sec_protopt_end">
<t>
A peer MUST conclude the list of protocol options with the end option. 
Subsequent octets should be considered protocol messages.
The code for the end option is 255, and unlike others it has no value octet, 
so the option's length is 1 octet.
</t>
</section>

<!-- %% -->
<section title="Version" anchor="sec_protopt_version">
<t>
A peer MUST include the maximum version of the PPSPP protocol it supports as
the first protocol option in the list. The code for this option is 0. 
Defined values are listed in <xref target="tab_id_ver_num"/>. 
</t>

<texttable anchor="tab_id_ver_num" title="PPSP Peer Protocol Version Numbers">
<preamble></preamble>
<ttcol align="left">Version</ttcol>
<ttcol align="left">Description</ttcol>
<c>1</c>	   <c>Protocol as described in this document</c>
<c>2-255</c>	   <c>Unassigned</c>
<postamble></postamble>
</texttable>


</section>


<!-- %% -->
<section title="Minimum Version" anchor="sec_protopt_minversion">
<t>
When a peer initiates the handshake it MUST include the minimum version of the 
PPSPP protocol it supports in the list of protocol options, following the
Min/max versioning scheme defined in <xref target="RFC6709"/>, Section 4.1.
The code for this option is 1. Defined values are listed in 
<xref target="tab_id_ver_num"/>.
</t>
</section>


<!-- %% -->
<section title="Swarm Identifier" anchor="sec_protopt_swarmid">
<t>
When a peer initiates the handshake it MUST include a swarm identifier option.
In other cases a peer MAY include a swarm identifier option, as an end-to-end
check. This option has the following structure: 
</t>

<texttable align="left">
<preamble></preamble>
<ttcol align="center">Code</ttcol>
<ttcol align="center">Length</ttcol>
<ttcol align="center">Swarm Identifier</ttcol>

<c>2</c>		<c>n (16 bits)</c>	<c>i1,i2,...</c>
<postamble></postamble>
</texttable>

<t>
Each PPSPP peer knows the IDs of the swarms it joins so this information can 
be immediately verified upon receipt. The length field is 2 octets to allow
for large public keys as identifiers in live streaming.
</t>
</section>



<!-- %% -->
<section title="Content Integrity Protection Method" anchor="sec_protopt_intprot">

<t>
A peer MUST include the content integrity method used by a swarm, unless 
it uses the default, in which case it MAY include the method. 
The code for this option is 3. Defined values are listed 
in <xref target="tab_id_cont_int_prot"/>. 
</t>

<texttable anchor="tab_id_cont_int_prot" title="PPSP Peer Content Integrity Protection Methods">
<preamble></preamble>
<ttcol align="left">Method</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c>	   <c>No integrity protection</c>
<c>1</c>	   <c>Merkle Hash Trees</c>
<c>2</c>	   <c>Sign All</c>
<c>3</c>	   <c>Unified Merkle Tree</c>
<c>4-255</c>	   <c>Unassigned</c>
<postamble></postamble>
</texttable>
<t>
The "Merkle Hash Trees" method is for static content, see 
<xref target="sec_intprot_merkle"/>.  "Sign All", and "Unified Merkle 
Tree" are for live content, see <xref target="sec_live_auth"/>.
</t>
<t>
The veracity of this information will come out when the receiver
successfully verifies the first chunk from any peer.
</t>
</section>



<!-- %% -->
<section title="Merkle Tree Hash Function" anchor="sec_protopt_mhashfunc">
<t>
When the content integrity protection method is Merkle Hash Trees this option 
defining which hash function is used for the tree MUST also be defined.
The code for this option is 4. Defined values are listed in 
<xref target="tab_id_merkle_func"/> (see <xref target="FIPS180-3"/> for the 
function semantics).
</t>

<texttable anchor="tab_id_merkle_func" title="PPSP Peer Protocol Merkle Hash Functions">
<preamble></preamble>
<ttcol align="left">Function</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c>	   <c>SHA1</c>
<c>1</c>	   <c>SHA-224</c>
<c>2</c>	   <c>SHA-256</c>
<c>3</c>	   <c>SHA-384</c>
<c>4</c>	   <c>SHA-512</c>
<c>5-255</c>	   <c>Unassigned</c>
<postamble></postamble>
</texttable>
<t>
The veracity of this information will come out when the receiver
successfully verifies the first chunk from any peer.
</t>
</section>



<!-- %% -->
<section title="Live Signature Algorithm" anchor="sec_protopt_livesigalg">
<t>
When the content integrity protection method is "Sign All" or 
"Unified Merkle Tree" this option MUST also be defined. The code for this 
option is 5. The 8-bit value of this option is one of the Domain Name System 
Security (DNSSEC) Algorithm Numbers <xref target="IANADNSSECALGNUM"/>. If 
necessary, the key size that impacts signature length can be derived from the 
swarm identifier which is the signing public key in live streaming. 
</t>
<t>
The veracity of this information will come out when the receiver
successfully verifies the first chunk from any peer.
</t>
</section>



<!-- %% -->
<section title="Chunk Addressing Method" anchor="sec_protopt_chunkaddr">
<t>
A peer MUST include the chunk addressing method it uses, unless it uses
the default, in which case it MAY include the method. The code for this option 
is 6. Defined values are listed in <xref target="tab_id_chunk_addr"/>.    
</t>

<texttable anchor="tab_id_chunk_addr" title="PPSP Peer Chunk Addressing Methods">
<preamble></preamble>
<ttcol align="left">Method</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c>	   <c>32-bit bins</c>
<c>1</c>	   <c>64-bit byte ranges</c>
<c>2</c>	   <c>32-bit chunk ranges</c>
<c>3</c>	   <c>64-bit bins</c>
<c>4</c>	   <c>64-bit chunk ranges</c>
<c>5-255</c>	   <c>Unassigned</c>
<postamble></postamble>
</texttable>

<t>
The veracity of this information will come out when the receiver
parses the first message containing a chunk specification from any peer.
</t>
</section>



<!-- %% -->
<section title="Live Discard Window" anchor="sec_protopt_livediscwin">
<t>
A peer in a live swarm MUST include the discard window it uses. The unit
of the discard window depends on the chunk addressing method used. For
bins and chunk ranges it is a number of chunks, for byte ranges it is a number
of bytes. Its data type is the same as for a bin, or one value in a range 
specification. In other words, its value is a 32-bit or 64-bit integer in 
big endian format. If this option is used, the list of protocol options MUST 
therefore contain the Chunk Addressing Method option and the method option 
MUST appear before the window option in the list. This option has the following 
structure:
</t>
<texttable align="left">
<preamble></preamble>
<ttcol align="center">Code</ttcol>
<ttcol align="center">Window</ttcol>
<c>7</c>	<c>w (32 or 64-bits)</c>
<postamble></postamble>
</texttable>

<t>
A peer that does not, under normal circumstances, discard chunks MUST set
this option to the special value 0xFFFFFFFF (32-bit) or 0xFFFFFFFFFFFFFFFF 
(64-bit). For example, peers that record a complete broadcast to offer it
directly as a static asset after the broadcast ends use these values (see  
<xref target="sec_live_merkle"/>).
</t>
<t>
The veracity of this information does not impact a receiving peer more than
when a sender peer just does not respond to REQUEST messages.
</t>
</section>



<!-- %% -->
<section title="Supported Messages" anchor="sec_protopt_supmsgs">
<t>
Peers may support just a subset of the PPSPP messages. For example, peers 
running over TCP may not accept ACK messages, or peers used with a centralized 
tracking infrastructure may not accept PEX messages. For these reasons, peers 
who support only a proper subset of the PPSPP messages MUST signal which subset 
they support by means of this protocol option. The value of this option is a
length octet and a 256-bit bitmap where each bit represents a message type. 
The bitmap may be truncated to the last non-zero byte.
</t>

<texttable align="left">
<preamble></preamble>
<ttcol align="center">Code</ttcol>
<ttcol align="center">Length</ttcol>
<ttcol align="center">Message Bitmap</ttcol>

<c>8</c>		<c>n (8-bits)</c>	<c>m1,m2,...</c>
<postamble></postamble>
</texttable>

</section>

</section>



<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="UDP Encapsulation" anchor="sec_encap_udp">

<t>
Currently, PPSPP-over-UDP is the preferred deployment option. Effectively, UDP
allows the use of IP with minimal overhead and it also allows userspace
implementations. LEDBAT is used for congestion control 
<xref target="RFC6817"/>. Using LEDBAT enables PPSPP
to serve the content after playback (seeding) without disrupting the user who
may have moved to different tasks that use its network connection. LEDBAT may be
replaced with a different algorithm when the work of the IETF working group on
RTP Media Congestion Avoidance Techniques (RMCAT) <xref target="RMCATCHART"/>
matures.
</t>
    
<!-- %% -->
<section title="Chunk Size" anchor="sec_encap_udp_chunksize">
<t>
In general, an UDP datagram containing PPSPP messages SHOULD fit inside 
a single IP packet, so its maximum size depends on the MTU of the network. The 
default is to use fixed-sized chunks of 1 kilobyte such that a UDP datagram 
with a DATA message can be transmitted as a single IP packet over an Ethernet 
network with 1500-byte frames. PPSPP implementations can use larger chunk sizes. 
For example, on CPU-limited hardware 8 kilobyte chunks MAY be used, transported 
as a single UDP datagram fragmented over multiple IP packets (with the increased 
chance of that UDP datagram getting lost). The chunk addressing schemes can all 
work with different chunk sizes, see <xref target="sec_chunkaddr"/>.
</t>
<t>
The chunk size used for a particular swarm MUST be part of the swarm's metadata
(which then consists of the swarm ID and the chunk size), unless it is the 1 KB
default.
</t>
</section>


<!-- %% -->
<section title="Datagrams and Messages" anchor="sec_encap_udp_dgram">
<t>
When using UDP, the abstract datagram described above corresponds directly
to a UDP datagram. Most messages within a datagram have a fixed length, which
generally depends on the type of the message. The first byte of a message 
denotes its type. The currently defined types are:
</t>

<texttable anchor="tab_id_msg_type" title="PPSP Peer Protocol Message Types">
<preamble></preamble>
<ttcol align="left">Msg Type</ttcol>
<ttcol align="left">Description</ttcol>

<c>0</c>	<c>HANDSHAKE</c>
<c>1</c>	<c>DATA</c>
<c>2</c>	<c>ACK</c>
<c>3</c>	<c>HAVE</c>
<c>4</c>	<c>INTEGRITY</c>
<c>5</c>	<c>PEX_RESv4</c>
<c>6</c>	<c>PEX_REQ</c>
<c>7</c>	<c>SIGNED_INTEGRITY</c>
<c>8</c>	<c>REQUEST</c>
<c>9</c>	<c>CANCEL</c>
<c>10</c>	<c>CHOKE</c>
<c>11</c>	<c>UNCHOKE</c>
<c>12</c>	<c>PEX_RESv6</c>
<c>13</c>	<c>PEX_REScert</c>
<c>14-254</c>	<c>Unassigned</c>
<c>255</c>	<c>Reserved</c>
<postamble></postamble>
</texttable>

<t>
Furthermore, integers are serialized in the network (big-endian) byte order. 
So consider the example of a HAVE message (<xref target="sec_msgs_HAVE"/>)
using bin chunk addressing. It has message type of 0x02 and a payload of a bin 
number, a four-byte integer (say, 1); hence, its on the wire representation for 
UDP can be written in hex as: "0200000001". 
</t>

<t>
All messages are idempotent or recognizable as duplicates. In particular, a 
peer MAY resend DATA, ACK, HAVE, INTEGRITY, PEX_*, SIGNED_INTEGRITY, REQUEST, 
CANCEL, CHOKE and UNCHOKE messages without problems when loss is suspected. 
When a peer resends a HANDSHAKE message it can be recognized as duplicate by 
the receiver and be dealt with. 
</t>

</section>

<!-- %% -->
<section title="Channels" anchor="sec_encap_udp_channels">  
<t>
As it is increasingly complex for peers to enable UDP communication
between each other due to NATs and firewalls, PPSPP-over-UDP uses
a multiplexing scheme, called "channels", to allow multiple swarms to use the 
same UDP port. Channels loosely correspond to TCP connections and each channel
belongs to a single swarm. To support channels, each datagram starts with 
four bytes corresponding to the receiving channel number. 
</t> 
</section>

<!-- %% -->
<section title="HANDSHAKE" anchor="sec_encap_udp_HANDSHAKE">
<t>
A channel is established with a handshake. To start a handshake, the initiating 
peer needs to know:
</t>
<t>
<list style="numbers">
<t>the IP address of a peer</t>
<t>peer's UDP port and</t>
<t>the swarm ID of the content (see <xref target="sec_intprot_merkle"/>
and <xref target="sec_live"/>).</t>
<t>the chunk size used, unless the 1 KB default</t>
</list>
</t>
<t>
To do the handshake the initiating peer sends a datagram that MUST start 
with an all 0-zeros channel number, followed by a HANDSHAKE message, whose 
payload is a locally unused channel number and a list of protocol options
(see <xref target="sec_protopt"/> for which options are required and 
recommended.)

</t>
<t>
On the wire the datagram will look something like this:
<list style="hanging" hangIndent="4">
<t>
   (CHANNEL) 00000000
   HANDSHAKE 00000011 v=01 si=123...1234 ca=0 end  
</t>
</list>
</t>

<t>
(to unknown channel, handshake from channel 0x11 speaking protocol version 0x01, 
initiating a transfer of a file with a root hash 123...1234 using bins for
chunk addressing)
</t>

<t>
The receiving peer MAY respond in which case the returned datagram MUST consist
of the channel number from the sender's HANDSHAKE message, a HANDSHAKE message, 
whose payload is a locally unused channel number and a list of protocol options,
followed by any other messages it wants to send. 
</t>
<t>
Peer's response datagram on the wire:
<list style="hanging" hangIndent="4">
<t>
   (CHANNEL) 00000011
   HANDSHAKE 00000022 v=01 protocol options end
</t>
</list>
</t>
<t>
(peer to the initiator: use channel number 0x22 for this transfer and 
proto version 0x01.)
</t>

<t>
At this point, the initiator knows that the peer really responds; for that
purpose channel IDs MUST be random enough to prevent easy guessing. So, the
third datagram of a handshake MAY already contain some heavy payload. To
minimize the number of initialization roundtrips, the first two datagrams
MAY also contain some minor payload, e.g. a couple of HAVE messages roughly
indicating the current progress of a peer or a REQUEST (see 
<xref target="sec_msgs_REQUEST"/>). When receiving the third datagram, 
both peers have the proof they really talk to each other; the three-way handshake 
is complete.
</t>
<t>
A peer MAY explicit close a channel by sending a HANDSHAKE message that MUST
contain an all 0-zeros channel number and a list of protocol options. The
list MUST be either empty or contain the maximum version number the sender 
supports, following the Min/max versioning scheme defined in 
<xref target="RFC6709"/>, Section 4.1. 
</t>
<t>
On the wire:
<list style="hanging" hangIndent="4">
<t>
   (CHANNEL) 00000022 
   HANDSHAKE 00000000 end
</t>
</list>
</t>

</section>


<!-- %% -->
<section title="HAVE" anchor="sec_encap_udp_HAVE">
<t>
A HAVE message (type 0x03) consists of a chunk specification that states 
that the sending peer has those chunks and successfully checked their integrity.
A bin consists of a single integer, and a chunk or byte range of two integers, 
of the width specified by the Chunk Addressing protocol options, encoded big 
endian.
</t>
<t>
A HAVE message for bin 3 on the wire:
<list style="hanging" hangIndent="4">
<t>
   HAVE 00000003
</t>
</list>
</t>

<t>
(received and checked first four kilobytes of a file/stream)
</t>
</section>


<!-- %% -->
<section title="DATA" anchor="sec_encap_udp_DATA">
<t>
A DATA message (type 0x01) consists of a chunk specification, a timestamp and 
the actual chunk. In case a datagram contains one DATA message, a sender 
MUST always put the DATA message in the tail of the datagram. A datagram MAY 
contain multiple DATA messages unless one of the chunks is the last chunk
and smaller than the chunk size. As the LEDBAT congestion control is used, a 
sender MUST include a timestamp, in particular, a 64-bit integer representing 
the current system time with microsecond accuracy. The timestamp MUST be 
included between chunk specification and the actual chunk.
</t>
<t>A DATA message for bin 0, with timestamp 12345678, and some data on the wire:
<list style="hanging" hangIndent="4">
<t>
   DATA 00000000 12345678 48656c6c6f20776f726c6421
</t>
</list>
</t>
<t>   
(This message accommodates an entire file: "Hello world!")
</t>
</section>


<!-- %% -->
<section title="ACK" anchor="sec_encap_udp_ACK">
<t>
An ACK message (type 0x02) acknowledges data that was received from
its addressee; to comply with the LEDBAT delay-based congestion control
an ACK message consists of a chunk specification and a timestamp representing
an one-way delay sample. The one-way delay sample is a 64-bit integer with 
microsecond accuracy, and is computed from the timestamp received from the 
previous DATA message containing the chunk being acknowledged following the
LEDBAT specification. 
</t>
<t>
An ACK message for bin 2 with one-way delay 12345678 on the wire:
<list style="hanging" hangIndent="4">
<t> 
   ACK 00000002 12345678
</t>
</list>
</t>

</section>


<!-- %% -->
<section title="INTEGRITY" anchor="sec_encap_udp_INTEGRITY">
<t>
An INTEGRITY message (type 0x04) consists of a chunk specification and
the cryptographic hash for the specified chunk or node. The type and format
of the hash depends on the protocol options.
</t>
<t>
An INTEGRITY message for bin 0 with a SHA1 hash on the wire:
<list style="hanging" hangIndent="4"> 
<t>
   INTEGRITY 00000000 1234123412341234123412341234123412341234
</t>
</list>
</t>

</section>


<!-- %% -->
<section title="SIGNED_INTEGRITY" anchor="sec_encap_udp_SIGNED_INTEGRITY">
<t>
A SIGNED_INTEGRITY message (type 0x07) consists of a chunk specification and
a digital signature encoded as in DNSSEC without the BASE-64 encoding 
<xref target="RFC4034"/>. The signature algorithm is defined by the Live 
Signature Algorithm protocol option, see <xref target="sec_protopt_livesigalg"/>.
</t>
</section>

<!-- %% -->
<section title="REQUEST" anchor="sec_encap_udp_REQUEST">
<t>
A REQUEST message (type 0x08) consists of a chunk specification for the
chunks the requester wants to download.
</t>
</section>

<!-- %% -->
<section title="CANCEL" anchor="sec_encap_udp_CANCEL">
<t>
A CANCEL message (type 0x09) consists of a chunk specification for the
chunks the requester no longer is interested in.
</t>
</section>


<!-- %% -->
<section title="CHOKE and UNCHOKE" anchor="sec_encap_udp_CHOKE">
<t>
Both CHOKE and UNCHOKE messages (types 0x0a and 0x0b, respectively) carry no
payload.
</t>
</section>


<!-- %% -->
<section title="PEX_REQ, PEX_RESv4, PEX_RESv6 and PEX_REScert" anchor="sec_encap_udp_PEX">
<t>
A PEX_REQ (0x06) message has no payload. A PEX_RES (0x05) message consists
of an IPv4 address in big endian format followed by a UDP port number
in big endian format. A PEX_RESv6 (0x0c) message contains a 128-bit IPv6 
address instead of an IPv4 one. If sender of the PEX_REQ message does not have
a private or link-local address, then the PEX_RES* messages MUST NOT contain 
such addresses <xref target="RFC1918"/><xref target="RFC4291"/>.
</t>
<t>
A PEX_REScert (0x0d) message consists of a 16-bit integer in big endian
specifying the size of the membership certificate that follows, see 
<xref target="sec_sec_pex_ampl"/>. This membership certificate states that peer 
P at time T is a member of swarm S and is a X.509v3 certificate 
<xref target="RFC5280"/> that is encoded using the ASN.1 distinguished encoding 
rules (DER) <xref target="CCITT.X208.1988"/>. The certificate MUST contain a 
"Subject Alternative Name" extension, marked as critical, of type 
uniformResourceIdentifier. 
</t>
<t>
The URL the name extension contains MUST follow the generic syntax for URLs 
<xref target="RFC3986"/>, where its scheme component is "ppsp", the host in 
the authority component is the DNS name or IP address of peer P, the port in 
the authority component is the port of peer P, and the path contains the swarm 
identifier for swarm S, in hexadecimal form. In particular, the preferred form 
of the swarm identifier is xxyyzz..., where the 'x's, 'y's and 'z's are 2 
hexadecimal digits of the 8-bit pieces of the identifier. The validity time of 
the certificate is set with notBefore UTCTime set to T and notAfter UTCTime set 
to T plus some expiry time defined by the issuer. An example URL:
<list style="hanging" hangIndent="4">
<t>
   ppsp://192.168.0.1:6778/e5a12c7ad2d8fab33c699d1e198d66f79fa610c3
</t>
</list>
</t>

</section>

<!-- %% -->
<section title="KEEPALIVE" anchor="sec_encap_udp_KEEPALIVE">
<t>
Keepalives do not have a message type on UDP. They are just simple 
datagrams consisting of a 4-byte channel number only.
</t>
<t>
On the wire:
<list style="hanging" hangIndent="4">
<t>
   (CHANNEL) 00000022
</t>
</list>
</t>

<t>
A guideline for declaring a peer dead consist of a 3 minute delay since
that last packet has been received from that peer, and at least 3 datagrams 
were sent to that peer during the same period.
</t>

</section>



<!-- %% -->
<section title="Flow and Congestion Control" anchor="sec_encap_udp_control">
<t>
Explicit flow control is not necessary in PPSPP-over-UDP. In the case of
video-on-demand the receiver will request data explicitly from peers and is 
therefore in control of how much data is coming towards it. In the case of
live streaming, where a push-model may be used, the amount of data incoming
is limited to the bitrate, which the receiver must be able to process otherwise
it cannot play the stream. Should, for any reason, the receiver get saturated 
with data that situation is perfectly detected by the congestion control.
PPSPP-over-UDP can support different congestion control algorithms. At present,
it uses the LEDBAT congestion control algorithm 
<xref target="RFC6817"/>. 
</t>
</section>

</section>




<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Extensibility" anchor="sec_ext">


<!-- %% -->
<section title="Chunk Picking Algorithms" anchor="sec_ext_cpa">
<t>
Chunk (or piece) picking entirely depends on the receiving peer. The sender 
peer is made aware of preferred chunks by the means of REQUEST messages. In some
(live) scenarios it may be beneficial to allow the sender to ignore those hints 
and send unrequested data.
</t>
<t>
The chunk picking algorithm is external to the PPSPP protocol and will generally
be a pluggable policy that uses the mechanisms provided by PPSPP.
The algorithm will handle the choices made by the user consuming the content, 
such as seeking, switching audio tracks or subtitles. Example policies
for P2P streaming can be found in <xref target="BITOS"/>, 
and <xref target="EPLIVEPERF"/>.
</t>
</section>

<!-- %% -->
<section title="Reciprocity Algorithms" anchor="sec_ext_ra">
<t>
The role of reciprocity algorithms in peer-to-peer systems is to promote
client contribution and prevent freeriding. A peer is said to be freeriding
if it only downloads content but never uploads to others. Examples of 
reciprocity algorithms are tit-for-tat as used in BitTorrent <xref target="TIT4TAT"/> 
and Give-to-Get <xref target="GIVE2GET"/>. In PPSPP, reciprocity enforcement
is the sole responsibility of the sender peer.
</t>
</section>

</section>


<!-- %% -->
<section title="Acknowledgements" anchor="sec_ack">
<t>
Arno Bakker, Riccardo Petrocco and Victor Grishchenko are partially supported 
by the P2P-Next project (http://www.p2p-next.org/), a research project 
supported by the European Community under its 7th Framework Programme
(grant agreement no. 216217).  The views and conclusions contained 
herein are those of the authors and should not be interpreted as 
necessarily representing the official policies or endorsements, 
either expressed or implied, of the P2P-Next project or the European 
Commission.
</t>
<t>
The PPSPP protocol was designed by Victor Grishchenko at Technische Universiteit 
Delft. The authors would like to thank the following people for their 
contributions to this draft: the chairs and members of the IETF PPSP working 
group, and Mihai Capota, Raul Jimenez, Flutra Osmani, Johan Pouwelse, 
and Raynor Vliegendhart.
</t>

</section>


<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="IANA Considerations" anchor="sec_iana" >


<t>The new registries defined below are requested for the extensibility of the 
protocol. The "Unassigned" ranges designated are governed by the policy 
'RFC Required' as described in <xref target="RFC5226"/>.
</t>
<t>
<list style="symbols">
<t>PPSP Peer Protocol Message Type Registry, see <xref target="tab_id_msg_type"/>.</t>
<t>PPSP Peer Protocol Option Registry, see <xref target="tab_id_proto_opt"/>.</t>
<t>PPSP Peer Protocol Version Number Registry, see <xref target="tab_id_ver_num"/>.</t>
<t>PPSP Peer Protocol Content Integrity Protection Method Registry, see <xref target="tab_id_cont_int_prot"/>.</t>   
<t>PPSP Peer Protocol Merkle Hash Tree Function Registry, see <xref target="tab_id_merkle_func"/>.</t>
<t>PPSP Peer Protocol Chunk Addressing Method Registry, see <xref target="tab_id_chunk_addr"/>.</t>
</list>
</t>


</section>


<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Security Considerations" anchor="sec_sec">
<t>
As any other network protocol, the PPSPP faces a common set of security
challenges. An implementation must consider the possibility of buffer
overruns, DoS attacks and manipulation (i.e. reflection attacks). Any
guarantee of privacy seems unlikely, as the user is exposing its IP address
to the peers. A probable exception is the case of the user being hidden
behind a public NAT or proxy. This section discusses the protocol's security
considerations in detail.
</t>

<!-- %% -->
<section title="Security of the Handshake Procedure" anchor="sec_sec_handshake">
<t>
Borrowing from the analysis in <xref target="RFC5971"/>, the PPSP peer protocol 
may be attacked with 3 types of denial-of-service attacks:
</t>
<t>
<list style="numbers">
<t>DOS amplification attack: attackers try to use a PPSPP peer to 
generate more traffic to a victim.</t>

<t>DOS flood attack: attackers try to deny service to other peers by 
allocating lots of state at a PPSPP peer.</t>

<t>Disrupt service to an individual peer: attackers send bogus e.g. REQUEST
and HAVE messages appearing to come from victim peer A to the peers B1..Bn 
serving that peer. This causes A to receive chunks it did not request or to not
receive the chunks it requested.
</t>
</list>
</t>
<t>
The basic scheme to protect against these attacks is the use of a secure 
handshake procedure. In the UDP encapsulation the handshake procedure
is secured by the use of randomly chosen channel IDs as follows. 
The channel IDs must be generated following the requirements in 
<xref target="RFC4960"/> (Sec. 5.1.3).
</t>
<t>
When UDP is used, all datagrams carrying PPSPP messages are prefixed 
with a 4-byte channel ID. These channel IDs are random numbers, 
established during the handshake phase as follows. Peer A initiates an 
exchange with peer B by sending a datagram containing a HANDSHAKE 
message prefixed with the channel ID consisting of all 0s. Peer A's 
HANDSHAKE contains a randomly chosen channel ID, chanA:
</t>
<t>
   A->B:	chan0 + HANDSHAKE(chanA) + ...
</t>
<t>
When peer B receives this datagram, it creates some state for peer A, 
that at least contains the channel ID chanA. Next, peer B sends a 
response to A, consisting of a datagram containing a HANDSHAKE message 
prefixed with the chanA channel ID. Peer B's HANDSHAKE contains a 
randomly chosen channel ID, chanB.
</t>
<t>
   B->A:	chanA + HANDSHAKE(chanB) + ...
</t>
<t>
Peer A now knows that peer B really responds, as it echoed chanA. So the 
next datagram that A sends may already contain heavy payload, i.e.,
a chunk. This next datagram to B will be prefixed with the chanB channel 
ID. When B receives this datagram, both peers have the proof
they are really talking to each other, the three-way handshake is 
complete. In other words, the randomly chosen channel IDs act as tags
(cf. <xref target="RFC4960"/> (Sec. 5.1)).
</t>
<t>
   A->B:	chanB + HAVE + DATA + ...
</t>


<section title="Protection against attack 1" anchor="sec_sec_handshake_at1">
<t>
In short, PPSPP does a so-called return routability check before heavy payload 
is sent. This means that attack 1 is fended off: PPSPP does not send back much 
more data than it received, unless it knows it is talking to a live peer. 
Attackers sending a spoofed HANDSHAKE to B pretending to be A now need to 
intercept the message from B to A to get B to send heavy payload, and ensure 
that that heavy payload goes to the victim, something assumed too hard to be a 
practical attack. 
</t>

<t>
Note the rule is that no heavy payload may be sent until the third 
datagram. This has implications for PPSPP implementations that use chunk 
addressing schemes that are verbose. If a PPSPP implementation uses 
large bitmaps to convey chunk availability these may not be sent by peer 
B in the second datagram.
</t>
</section>


<section title="Protection against attack 2" anchor="sec_sec_handshake_at2">
<t>
On receiving the first datagram peer B will record some 
state about peer A. At present this state consists of the chanA channel 
ID, and the results of processing the other messages in the first 
datagram. In particular, if A included some HAVE messages, B may add a 
chunk availability map to A's state. In addition, B may request
some chunks from A in the second datagram, and B will maintain state 
about these outgoing requests.
</t>
<t>
So presently, PPSPP is somewhat vulnerable to attack 2. An attacker 
could send many datagrams with HANDSHAKEs and HAVEs and thus allocate 
state at the PPSPP peer. Therefore peer A MUST respond immediately to the 
second datagram, if it is still interested in peer B.
</t>
<t>
The reason for using this slightly vulnerable three-way handshake instead of 
the safer handshake procedure of SCTP <xref target="RFC4960"/> (Sec. 5.1) is 
quicker response time for the user.  In the SCTP procedure, peer A and B 
cannot request chunks until datagrams 3 and 4 respectively, as opposed to 2 
and 1 in the proposed procedure. This means that the user has to wait shorter 
in PPSPP between starting the video stream and seeing the first images.
</t>
</section>

<section title="Protection against attack 3" anchor="sec_sec_handshake_at3">
<t>
In general, channel IDs serve to authenticate a peer. Hence, to attack, 
a malicious peer T would need to be able to eavesdrop on conversations between 
victim A and a benign peer B to obtain the channel ID B assigned to A, chanB. 
Furthermore,  attacker T would need to be able to spoof e.g. REQUEST and HAVE 
messages from A to cause B to send heavy DATA messages to A, or prevent B from 
sending them, respectively. 
</t>
<t>
The capability to eavesdrop is not common, so the protection afforded by
channel IDs will be sufficient in most cases. If not, point-to-point encryption
of traffic should be used, see below.
</t>
</section>

</section>


<!-- %% -->
<section title="Secure Peer Address Exchange" anchor="sec_sec_pex">
<t>
As described in <xref target="sec_msgs_PEX"/>, a peer A can send a 
Peer-Exchange message  PEX_RES to a peer B, which contains the IP address and 
port of  other peers that are supposedly also in the current swarm. The strength 
of  this mechanism is that it allows decentralized tracking: after an initial 
bootstrap no central tracker is needed anymore. The vulnerability of this 
mechanism (and DHTs) is that malicious peers can use it for an Amplification 
attack.
</t>
<t>
In particular, a malicious peer T could send a PEX_RES to well-behaved 
peer A containing a list of address B1,B2,...,BN and on receipt,
peer A could send a HANDSHAKE to all these peers. So in the worst
case, a single datagram results in N datagrams. The actual damage
depends on A's behaviour. E.g. when A already has sufficient connections
it may not connect to the offered ones at all, but if it is a fresh peer 
it may connect to all directly.
</t>
<t>
In addition, PEX can be used in Eclipse attacks <xref target="ECLIPSE" />
where malicious peers try to isolate a particular peer such that it only 
interacts with malicious peers. Let us distinguish two specific attacks:
</t>
<t>
<list>
<t>E1. Malicious peers try to eclipse the single injector in live streaming.</t>
<t>E2. Malicious peers try to eclipse a specific consumer peer.</t>
</list>
</t>
<t>
Attack E1 has the most impact on the system as it would disrupt all peers.
</t>

<section title="Protection against the Amplification Attack" anchor="sec_sec_pex_ampl">
<t>
If peer addresses are relatively stable, strong protection against the attack 
can be provided by using public key cryptography and certification. In 
particular, a PEX_RES message will carry swarm-membership certificates rather than 
IP address and port. A membership certificate for peer B states that peer B 
at address (ipB,portB) is part of swarm S at time T and is cryptographically 
signed. The receiver A can check the cert for a valid signature, the right 
swarm and liveliness and only then consider contacting B. These swarm-membership 
certificates correspond to signed node descriptors in secure decentralized peer 
sampling services <xref target="SPS"/>.
</t>
<t>
Several designs are possible for the security environment for these 
membership certificates. That is, there are different designs possible for who 
signs the membership certificates and how public keys are distributed.
As an example, we describe a design where the PPSP tracker acts as certification
authority.
</t>
</section>

<section title="Example: Tracker as Certification Authority" anchor="sec_sec_pex_trackerca">
<t>
A peer A wanting to join swarm S sends a certificate request message to a 
tracker X for that swarm. Upon receipt, the tracker creates a membership 
certificate from the request with swarm ID S, a timestamp T and the external 
IP and port it received the message from, signed with the tracker's private 
key. This certificate is returned to A. 
</t>
<t>
Peer A then includes this certificate when it sends a PEX_RES to peer B.
Receiver B verifies it against the tracker public key. This tracker public key 
should be part of the swarm's metadata, which B received from a trusted 
source. Subsequently, peer B can send the member certificate of A to other 
peers in PEX_RES messages.
</t>
<t>
Peer A can send the certification request when it first contacts the tracker,
or at a later time. Furthermore, the responses the tracker sends could contain
membership certificates instead of plain addresses, such that they can be 
gossiped securely as well.
</t>
<t>
We assume the tracker is protected against attacks and does a return routability 
check. The latter ensures that malicious peers cannot obtain a certificate
for a random host, just for hosts where they can eavesdrop on incoming traffic. 
</t>
<t>
The load generated on the tracker depends on churn and the lifetime of a
certificate. Certificates can be fairly long lived, given that the main goal 
of the membership certificates is to prevent that malicious peer T can cause 
good peer A to contact *random* hosts. The freshness of the timestamp just adds 
extra protection in addition to achieving that goal. It protects against 
malicious hosts causing a good peer A to contact hosts that previously 
participated in the swarm. 
</t>
<t>
The membership certificate mechanism itself can be used for a kind of 
amplification attack against good peers. Malicious peer T can cause peer A to 
spend some CPU to verify the signatures on the membership certificates that T 
sends. To counter this, A SHOULD check a few of the certificates sent and discard
the rest if they are defective.
</t>
<t>
The same membership certificates described above can be registered in a 
Distributed Hash Table that has been secured against the well-known 
DHT specific attacks <xref target="SECDHTS"/>.
</t>
<t>
Note that this scheme does not work for peers behind a symmetric 
Network Address Translator, but neither does normal tracker registration.     
</t>
</section>


<section title="Protection Against Eclipse Attacks" anchor="sec_sec_pex_eclipse">
<t>
Before we can discuss Eclipse attacks we first need to establish the
security properties of the central tracker. A tracker is vulnerable to 
Amplification attacks too. A malicious peer T could register a victim B 
with the tracker, and many peers joining the swarm will contact B.
Trackers can also be used in Eclipse attacks. If many malicious peers 
register themselves at the tracker, the percentage of bad peers in the 
returned address list may become high. Leaving the protection of the 
tracker to the PPSP tracker protocol specification, we assume for the following 
discussion that it returns a true random sample of the actual swarm 
membership  (achieved via Sybil attack protection). This means that if 
50% of the peers is bad, you'll still get 50% good addresses from the 
tracker.
</t>
<t>
Attack E1 on PEX can be fended off by letting live injectors disable 
PEX. Or at least, let live injectors ensure that part of their connections 
are to peers whose addresses came from the trusted tracker.
</t>
<t>
The same measures defend against attack E2 on PEX. They can also be 
employed dynamically. When the current set of peers B that peer A is
connected to doesn't provide good quality of service, A can contact the 
tracker to find new candidates.
</t>

</section>
</section>



<!-- %% -->
<section title="Support for Closed Swarms (PPSP.SEC.REQ-1)" anchor="sec_sec_cs">
<t>
The Closed Swarms <xref target="CLOSED"/> and Enhanced Closed Swarms 
<xref target="ECS"/> mechanisms provide swarm-level access control. The basic 
idea is that a peer cannot download from another peer unless it shows a 
Proof-of-Access. Enhanced Closed Swarms improve on the original Closed Swarms 
by adding on-the-wire encryption against man-in-the-middle attacks and more 
flexible access control rules.
</t>
<t>
The exact mapping of ECS to PPSPP is defined in 
<xref target="I-D.gabrijelcic-ppsp-ecs"/>.
</t>
</section>

<!-- %% -->
<section title="Confidentiality of Streamed Content (PPSP.SEC.REQ-2+3)" anchor="sec_sec_conf">
<t>
No extra mechanism is needed to support confidentiality in PPSPP. A
content publisher wishing confidentiality should just distribute content 
in cyphertext / DRM-ed format. In that case it is assumed a higher layer 
handles key management out-of-band. Alternatively, pure point-to-point 
encryption of content and traffic can be provided by the proposed Closed Swarms 
access control mechanism, or by DTLS <xref target="RFC6347" /> or IPsec 
<xref target="RFC4301"/>. 
</t>
</section>


<!-- %% -->
<section title="Limit Potential Damage and Resource Exhaustion by Bad or Broken Peers (PPSP.SEC.REQ-4+6)" anchor="sec_sec_limit">
<t>
In this section an analysis is given of the potential damage
a malicious peer can do with each message in the protocol, and how it
is prevented by the protocol (implementation).
</t>


<section title="HANDSHAKE" anchor="sec_sec_limit_HANDSHAKE">
<t>
<list style="symbols">
<t>Secured against DoS amplification attacks as described in 
   <xref target="sec_sec_handshake"/>.
</t>
<t>Threat HS.1: An Eclipse attack where peers T1..TN fill all connection
   slots of A by initiating the connection to A.
<vspace blankLines="1"/>
   Solution: Peer A must not let other peers fill all its available 
   connection slots, i.e., A must initiate connections itself too, to prevent 
   isolation. 
</t>
</list>
</t>
</section>


<section title="HAVE" anchor="sec_sec_limit_HAVE">
<t>
<list style="symbols">
<t>Threat HAVE.1: Malicious peer T can claim to have content which it
   hasn't. Subsequently T won't respond to requests.
<vspace blankLines="1"/>
   Solution: peer A will consider T to be a slow peer and not ask it
   again.
</t>
<t>
   Threat HAVE.2: Malicious peer T can claim not to have content. Hence it
   won't contribute.
<vspace blankLines="1"/>
   Solution: Peer and chunk selection algorithms external to the protocol will
   implement fairness and provide sharing incentives. 
</t>
</list>
</t>
</section>

<section title="DATA" anchor="sec_sec_limit_DATA">
<t>
<list style="symbols">
<t>Threat DATA.1: peer T sending bogus chunks.
<vspace blankLines="1"/>
   Solution: The content integrity protection schemes defend against
   this.
</t>
<t>Threat DATA.2: peer T sends peer A unrequested chunks.
<vspace blankLines="1"/>
   To protect against this threat we need network-level DoS
   prevention.
</t>
</list>
</t>
</section>

<section title="ACK" anchor="sec_sec_limit_ACK">
<t>
<list style="symbols">
<t>Threat ACK.1: peer T acknowledges wrong chunks.
<vspace blankLines="1"/>
   Solution: peer A will detect inconsistencies with the data it sent to T.
</t>
<t>Threat ACK.2: peer T modifies timestamp in ACK to peer A used for
   time-based congestion control.
<vspace blankLines="1"/>
   Solution: In theory, by decreasing the timestamp peer T could fake there is 
   no congestion when in fact there is, causing A to send more data than it 
   should. <xref target="RFC6817"/> does not list this as a 
   security consideration. Possibly this attack can be detected by the large 
   resulting asymmetry between round-trip time and measured one-way delay.
</t>
</list>
</t>
</section>

<section title="INTEGRITY and SIGNED_INTEGRITY" anchor="sec_sec_limit_INTEGRITY">
<t>
<list style="symbols">
<t>Threat INTEGRITY.1: An amplification attack where peer T sends bogus INTEGRITY 
   or SIGNED_INTEGRITY messages, causing peer A to checks hashes or signatures, 
   thus spending CPU unnecessarily.
<vspace blankLines="1"/>
   Solution: If the hashes/signatures don't check out A will stop asking T 
   because of the atomic datagram principle and the content integrity 
   protection. Subsequent unsolicited traffic from T will be ignored.
</t>
</list>
</t>
</section>


<section title="REQUEST" anchor="sec_sec_limit_REQUEST">
<t>
<list style="symbols">
<t>Threat REQUEST.1: peer T could request lots from A, leaving A without
   resources for others.
<vspace blankLines="1"/>
   Solution: A limit is imposed on the upload capacity a single peer
   can consume, for example, by using an upload bandwidth scheduler that takes 
   into account the need of multiple peers. A natural upper limit of this 
   upload quotum is the bitrate of the content, taking into account that this 
   may be variable.
</t>
</list>
</t>
</section>


<section title="CANCEL" anchor="sec_sec_limit_CANCEL">
<t>
<list style="symbols">
<t>Threat CANCEL.1: peer T sends CANCEL messages for content it never requested
to peer A.
<vspace blankLines="1"/>
   Solution: peer A will detect the inconsistency of the messages and ignore 
   them. Note that CANCEL messages may be received unexpectedly when a 
   transport is used where REQUEST messages may be lost or reordered with
   respect to the subsequent CANCELs.
</t>
</list>
</t>
</section>

<section title="CHOKE" anchor="sec_sec_limit_CHOKE">
<t>
<list style="symbols">
<t>Threat CHOKE.1: peer T sends REQUEST messages after peer A sent B a CHOKE
message.
<vspace blankLines="1"/>
   Solution: peer A will just discard the unwanted REQUESTs and resend the
   CHOKE, assuming it got lost. 
</t>
</list>
</t>
</section>


<section title="UNCHOKE" anchor="sec_sec_limit_UNCHOKE">
<t>
<list style="symbols">
<t>Threat UNCHOKE.1: peer T sends an UNCHOKE message to peer A without having
    sent a CHOKE message before.
<vspace blankLines="1"/>
   Solution: peer A can easily detect this violation of protocol state,
   and ignore it. Note this can also happen due to loss of a CHOKE message 
   sent by a benign peer. 
</t>
<t>Threat UNCHOKE.2: peer T sends an UNCHOKE message to peer A, but subsequently 
does not respond to its REQUESTs.
<vspace blankLines="1"/>
   Solution: peer A will consider T to be a slow peer and not ask it
   again. 
</t>
</list>
</t>
</section>


<section title="PEX_RES" anchor="sec_sec_limit_PEX_RES">
<t>
<list style="symbols">
<t> Secured against amplification and Eclipse attacks as described in 
   <xref target="sec_sec_pex"/>.
</t>
</list>
</t>
</section>



<section title="Unsolicited Messages in General" anchor="sec_sec_limit_genmsg">
<t>
<list style="symbols">
<t>
   Threat: peer T could send a spoofed PEX_REQ or REQUEST from peer B to
   peer A, causing A to send a PEX_RES/DATA to B.
<vspace blankLines="1"/>
   Solution: the message from peer T won't be accepted unless T does a
   handshake first, in which case the reply goes to T, not victim B.
</t>
</list>
</t>
</section>
</section>

<section title="Exclude Bad or Broken Peers (PPSP.SEC.REQ-5)" anchor="sec_sec_exclude">
<t>
A receiving peer can detect malicious or faulty senders as just described, 
which it can then subsequently ignore. However, 
excluding such a bad peer from the system completely is complex. Random 
monitoring by trusted peers that would blacklist bad peers as described
in <xref target="DETMAL"/> is one option. This mechanism does require extra
capacity to run such trusted peers, which must be indistinguishable
from regular peers, and requires a solution for the timely distribution of this
blacklist to peers in a scalable manner.
</t> 

</section>
</section>




</middle>

<!--  *****BACK MATTER ***** -->

<back>
    <!-- References split into informative and normative -->

    <!-- There are 2 ways to insert reference entries from the citation libraries:
     1. define an ENTITY at the top, and use "ampersand character"RFC2629; here (as shown)
     2. simply use a PI "less than character"?rfc include="reference.RFC.2119.xml"?> here
        (for I-Ds: include="reference.I-D.narten-iana-considerations-rfc2434bis.xml")

     Both are cited textually in the same manner: by using xref elements.
     If you use the PI option, xml2rfc will, by default, try to find included files in the same
     directory as the including file. You can also define the XML_LIBRARY environment variable
     with a value containing a set of directories to search.  These can be either in the local
     filing system or remote ones accessed by http (http://domain/dir/... ).-->

<references title="Normative References">

   &RFC2119;

   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.1918.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3986.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4034.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4291.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5226.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5280.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6817.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml2/_reference.CCITT.X208.1988.xml"?>

   <reference anchor="FIPS180-3">
      <front>
      <title>Federal Information Processing Standards: Secure Hash Standard (SHS)</title>
      <author>
        <organization>Information Technology Laboratory, 
        National Institute of Standards and Technology</organization>
        <address>
	  <postal>
	     <street></street>
	     <city>Gaithersburg, MD 20899-8900</city>
	     <country>USA</country>
          </postal>
        </address>
      </author>
      <date year="2008" month="Oct"/>
      </front>
      <seriesInfo name="Publication" value="180-3"/>
    </reference>

   <reference anchor="IANADNSSECALGNUM" target="http://www.iana.org/assignments/dns-sec-alg-numbers">
     <front>
       <title>Domain Name System Security (DNSSEC) Algorithm Numbers</title>
       <author>
         <organization>IANA</organization>
       </author>
     </front>
   </reference>
   
   
</references>

<references title="Informative References">

   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2132.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4301.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4960.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5389.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5971.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6709.xml"?>
   <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6347.xml"?>

   <?rfc include="http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-ppsp-reqs-05.xml"?>

   <reference anchor="MERKLE">
      <front>
      <title>Secrecy, Authentication, and Public Key Systems</title>

      <author initials="R." surname="Merkle">
      <organization>Dept. of Electrical Engineering, Stanford University, CA, USA</organization>
      </author>
      <date year="1979" />
      </front>
      <seriesInfo name="Ph.D. thesis" value="Dept. of Electrical Engineering, Stanford University, CA, USA, pp 40-45"/>
   </reference>


   <reference anchor="ABMRKL" target="http://bittorrent.org/beps/bep_0030.html">
      <front>
      <title>Merkle hash torrent extension</title>

      <author initials="A." surname="Bakker">
      <organization></organization>
      </author>

      <date year="2009" month="Mar"/>
      </front>
      <seriesInfo name="BitTorrent Enhancement Proposal" value="30"/>
   </reference>

   <reference anchor="JIM11">
      <front>
      <title>Sub-Second Lookups on a Large-Scale Kademlia-Based Overlay</title>

      <author initials="R." surname="Jimenez">
      <organization></organization>
      </author>
      <author initials="F." surname="Osmani">
      <organization></organization>
      </author>
      <author initials="B." surname="Knutsson">
      <organization></organization>
      </author>
      <date year="2011" month="Aug"/>
      </front>
      <seriesInfo name="IEEE International Conference on Peer-to-Peer Computing" value="(P2P'11), Kyoto, Japan"/>
   </reference>

   <reference anchor="SWIFTIMPL" target="https://github.com/triblerteam/libswift/">
      <front>
      <title>Swift reference implementation</title>

      <author initials="V." surname="Grishchenko">
      <organization></organization>
      </author>
      <author initials="J." surname="Paananen">
      <organization></organization>
      </author>
      <author initials="A." surname="Pronchenkov">
      <organization></organization>
      </author>
      <author initials="A." surname="Bakker">
      <organization></organization>
      </author>
      <author initials="R." surname="Petrocco">
      <organization></organization>
      </author>

      <date year="2012" />
      </front>
   </reference>

   <reference anchor="RMCATCHART" target="http://datatracker.ietf.org/wg/rmcat/charter/"> 
      <front>
      <title>RTP Media Congestion Avoidance Techniques (rmcat) Description of Working Group</title>

      <author initials="L." surname="Eggert">
      <organization></organization>
      </author>

      <author initials="" surname="others">
      <organization></organization>
      </author>

      <date year="2012" />
      </front>
   </reference>


   <reference anchor="HAC01">
      <front>
      <title>Handbook of Applied Cryptography</title>

      <author initials="A.J." surname="Menezes">
      <organization></organization>
      </author>

      <author initials="P.C." surname="van Oorschot">
      <organization></organization>
      </author>

      <author initials="S.A." surname="Vanstone">
      <organization></organization>
      </author>

      <date year="1996" month="Oct"/>
      </front>
      
      <seriesInfo name="CRC Press," value="(Fifth Printing, August 2001)"/>
      
   </reference>



   <reference anchor="BITTORRENT" target="http://bittorrent.org/beps/bep_0003.html">
      <front>
      <title>The BitTorrent Protocol Specification</title>

      <author initials="B." surname="Cohen">
      <organization></organization>
      </author>

      <date year="2008" month="Feb"/>
      </front>
      <seriesInfo name="BitTorrent Enhancement Proposal" value="3"/>
   </reference>


   <reference anchor="SNP" target="http://www.brynosaurus.com/pub/net/p2pnat/">
      <front>
      <title>Peer-to-Peer Communication Across Network Address Translators</title>

      <author initials="B." surname="Ford">
      <organization></organization>
      </author>

      <author initials="P." surname="Srisuresh">
      <organization></organization>
      </author>

      <author initials="D." surname="Kegel">
      <organization></organization>
      </author>

      <date year="2005" month="Feb"/>
      </front>
   </reference>

   <reference anchor="BINMAP">
      <front>
      <title>Binmaps: hybridizing bitmaps and binary trees</title>

      <author initials="V." surname="Grishchenko">
      <organization></organization>
      </author>
      <author initials="J." surname="Pouwelse">
      <organization></organization>
      </author>

      <date year="2009" month="Apr"/>
      </front>
      <seriesInfo name="Technical Report" value="PDS-2011-005, Parallel and 
      Distributed Systems Group, Fac. of Electrical Engineering, 
      Mathematics, and Computer Science, Delft University of Technology, 
      The Netherlands"/>
   </reference>

   <reference anchor="TIT4TAT">
      <front>
      <title>Incentives Build Robustness in BitTorrent</title>

      <author initials="B." surname="Cohen">
      <organization></organization>
      </author>

      <date year="2003" month="Jun" />
      </front>
      <seriesInfo name="1st Workshop on Economics of Peer-to-Peer Systems," value="Berkeley, CA, USA"/>
   </reference>

   <reference anchor="CLOSED" target="http://doi.acm.org/10.1145/1877891.1877898">
      <front>
      <title>Access Control to BitTorrent Swarms Using Closed Swarms</title>

      <author initials="N." surname="Borch">
      <organization></organization>
      </author>
      <author initials="K." surname="Mitchell">
      <organization></organization>
      </author>
      <author initials="I." surname="Arntzen">
      <organization></organization>
      </author>
      <author initials="D." surname="Gabrijelcic">
      <organization></organization>
      </author>

      <date year="2010" month="Oct"/>
      </front>
      <seriesInfo name="ACM workshop on Advanced Video Streaming Techniques
      for Peer-to-Peer Networks and Social Networking" value="(AVSTP2P '10), Florence, Italy"/>
   </reference>

   <reference anchor="ECS">
      <front>
      <title>Access Control in BitTorrent P2P Networks Using the Enhanced 
      Closed Swarms Protocol</title>

      <author initials="V." surname="Jovanovikj">
      <organization></organization>
      </author>
      <author initials="D." surname="Gabrijelcic">
      <organization></organization>
      </author>
      <author initials="T." surname="Klobucar">
      <organization></organization>
      </author>

      <date year="2011" month="Aug"/>
      </front>
      <seriesInfo name="International Conference on Emerging Security 
      Information, Systems and Technologies" value="(SECURWARE 2011), pp. 97-102, Nice, France"/>
   </reference>

   <reference anchor="ECLIPSE">
      <front>
      <title>Security Considerations for Peer-to-Peer Distributed Hash Tables</title>

      <author initials="E." surname="Sit">
      <organization></organization>
      </author>
      <author initials="R." surname="Morris">
      <organization></organization>
      </author>

      <date year="2002" />
      </front>
      <seriesInfo name="IPTPS '01: Revised Papers
      from the First International Workshop on Peer-to-Peer Systems" 
      value="pp. 261-269, Springer-Verlag"/>
   </reference>

   <reference anchor="SECDHTS">
      <front>
      <title>A Survey of DHT Security Techniques</title>

      <author initials="G." surname="Urdaneta">
      <organization></organization>
      </author>
      <author initials="G." surname="Pierre">
      <organization></organization>
      </author>
      <author initials="M." surname="van Steen">
      <organization></organization>
      </author>

      <date year="2011" month="Jun"/>
      </front>
      <seriesInfo name="ACM Computing Surveys" value="vol. 43(2)"/>
   </reference>


   <reference anchor="SPS">
      <front>
      <title>Secure Peer Sampling</title>

      <author initials="G.P." surname="Jesi">
      <organization></organization>
      </author>
      <author initials="A." surname="Montresor">
      <organization></organization>
      </author>
      <author initials="M." surname="van Steen">
      <organization></organization>
      </author>

      <date year="2010" month="Aug"/>
      </front>
      <seriesInfo name="Computer Networks" value="vol. 54(12), pp. 2086-2098, Elsevier"/>
   </reference>


   <reference anchor="SIGMCAST">
      <front>
      <title>Digital Signatures for Flows and Multicasts</title>

      <author initials="C.K." surname="Wong">
      <organization></organization>
      </author>
      <author initials="S.S." surname="Lam">
      <organization></organization>
      </author>

      <date year="1999"/>
      </front>
      <seriesInfo name="IEEE/ACM Transactions on Networking" value="7(4), pp. 502-513"/>
   </reference>

   <reference anchor="POLLIVE">
      <front>
      <title>Pollution in P2P Live Video Streaming</title>

      <author initials="P." surname="Dhungel">
      <organization></organization>
      </author>
      <author initials="Xiaojun" surname="Hei">
      <organization></organization>
      </author>
      <author initials="K." surname="Ross">
      <organization></organization>
      </author>
      <author initials="N." surname="Saxena">
      <organization></organization>
      </author>

      <date year="2009" month="Jul"/>
      </front>
      <seriesInfo name="International Journal of Computer Networks & Communications (IJCNC)" value="Vol.1, No.2"/>
   </reference>

   <reference anchor="DETMAL">
      <front>
      <title>Detecting Malicious Peers in Overlay Multicast Streaming</title>

      <author initials="S." surname="Shetty">
      <organization></organization>
      </author>
      <author initials="P." surname="Galdames">
      <organization></organization>
      </author>
      <author initials="W." surname="Tavanapong">
      <organization></organization>
      </author>
      <author initials="Ying" surname="Cai">
      <organization></organization>
      </author>

      <date year="2006" month="Nov"/>
      </front>
      <seriesInfo name="IEEE Conference on Local Computer Networks" value="(LCN'06). Tampa, FL, USA"/>
   </reference>

   <reference anchor="GIVE2GET">
      <front>
      <title>Give-to-Get: Free-riding Resilient Video-on-demand in P2P Systems</title>

      <author initials="J.J.D." surname="Mol">
      <organization></organization>
      </author>
      <author initials="J." surname="Pouwelse">
      <organization></organization>
      </author>
      <author initials="M." surname="Meulpolder">
      <organization></organization>
      </author>
      <author initials="D.H.J." surname="Epema">
      <organization></organization>
      </author>
      <author initials="H.J." surname="Sips">
      <organization></organization>
      </author>

      <date year="2008" month="Jan"/>
      </front>
      <seriesInfo name="Proceedings Multimedia Computing and Networking conference
      (Proceedings of SPIE Vol. 6818)" value="San Jose, California, USA"/>
   </reference>
   
   <reference anchor="EPLIVEPERF">
      <front>
      <title>Epidemic Live Streaming: Optimal Performance Trade-offs</title>

      <author initials="T." surname="Bonald">
      <organization></organization>
      </author>
      <author initials="L." surname="Massoulié">
      <organization></organization>
      </author>
      <author initials="F." surname="Mathieu">
      <organization></organization>
      </author>
      <author initials="D." surname="Perino">
      <organization></organization>
      </author>
      <author initials="A." surname="Twigg">
      <organization></organization>
      </author>

      <date year="2008" month="Jun"/>
      </front>
      <seriesInfo name="Proceedings of the 2008 ACM SIGMETRICS International 
         Conference on Measurement and Modeling of Computer Systems" 
         value="Annapolis, MD, USA"/>
   </reference>

   <reference anchor="BITOS">
      <front>
      <title>BiToS: Enhancing BitTorrent for Supporting Streaming Applications</title>

      <author initials="A." surname="Vlavianos">
      <organization></organization>
      </author>
      <author initials="M." surname="Iliofotou">
      <organization></organization>
      </author>
      <author initials="F." surname="Mathieu">
      <organization></organization>
      </author>
      <author initials="M." surname="Faloutsos">
      <organization></organization>
      </author>

      <date year="2006" month="Apr"/>
      </front>
      <seriesInfo name="IEEE INFOCOM Global Internet Symposium" 
         value="Barcelona, Spain"/>
   </reference>
   
   <reference anchor='I-D.gabrijelcic-ppsp-ecs'>
      <front>
      <title>Enhanced Closed Swarm protocol</title>

      <author initials='D.' surname='Gabrijelcic' fullname='Dusan Gabrijelcic'>
      <organization>Jozef Stefan Institute, Slovenia</organization>
      </author>

      <date month='November' day='29' year='2012' />

      </front>

      <seriesInfo name='Internet-Draft' value='draft-ppsp-gabrijelcic-ecs' />
      <format type='TXT'
        target='http://datatracker.ietf.org/doc/draft-ppsp-gabrijelcic-ecs/' />
   </reference>
   
</references>


<section title="Revision History" anchor="sec_revhist">
<t>
<list style="hanging" hangIndent="4">

<t hangText="-00        2011-12-19 Initial version.">  
</t>

<t hangText="-01        2012-01-30 Minor text revision:">  
<list style="symbols">
<t>
Changed heading to "A. Bakker"
</t><t>				
Changed title to *Peer* Protocol, and abbreviation PPSPP.
</t><t>		
Replaced swift with PPSPP.
</t><t>
Removed Sec. 6.4. "HTTP (as PPSP)".
</t><t>
Renamed Sec. 8.4. to "Chunk Picking Algorithms".
</t><t>
Resolved Ticket #3: Removed sentence about random set of peers.
</t><t>
Resolved Ticket #6: Added clarification to "Chunk Picking 
Algorithms" section.
</t><t>
Resolved Ticket #11: Added Sec. 3.12 on Storage Independence
</t><t>
Resolved Ticket #14: Added clarification to "Automatic Size 
Detection" section.
</t><t>
Resolved Ticket #15: Operation section now states it shows  
example behaviour for a specific set of policies and schemes.
</t><t>	
Resolved Ticket #30: Explained why multiple REQUESTs in one datagram.
</t><t>	
Resolved Ticket #31: Renamed PEX_ADD message to PEX_RES.
</t><t>	
Resolved Ticket #32: Renamed Sec 3.8. to "Keep Alive Signaling",
and updated explanation.
</t><t>
Resolved Ticket #33: Explained NAT hole punching via only PPSPP
messages. 
</t><t>	
Resolved Ticket #34: Added section about limited overhead of the 
Merkle hash tree scheme. 
</t>
</list>
</t>	

<t hangText="-02        2012-04-17 Major revision">
<list style="symbols">
<t>  
Allow different chunk addressing and content integrity protection schemes (ticket #13):
</t><t>
Added chunk ID, chunk specification, chunk addressing scheme, etc. to terminology. 
</t><t>
Created new Sections 4 and 5 discussing chunk addressing and content integrity protection schemes, respectively and moved relevant sections on bin numbering and Merkle hash trees there.
</t><t>
Renamed Section 4 to "Merkle Hash Trees and The Automatic Detection of Content Size".
</t><t>
Reformulated automatic size detection in terms of nodes, not bins.
</t><t>
Extended HANDSHAKE message to carry protocol options and created Section 8 on Protocol options. VERSION and MSGTYPE_RCVD messages replaced with protocol options. 
</t><t>
Renamed HASH message to INTEGRITY.
</t><t>
Renamed HINT to REQUEST.
</t><t> 
Added description of chunk addressing via (start,end) ranges.
</t><t> 
Resolved Ticket #26: Extended "Security Considerations" with section on the 
handshake procedure.
</t><t>
Resolved Ticket #17: Defined recently as "in last 60 seconds" in PEX.
</t><t>
Resolved Ticket #20: Extended "Security Considerations" with design to make 
Peer Address Exchange more secure.
</t>
<t>
Resolved Ticket #38+39 / PPSP.SEC.REQ-2+3: Extended "Security Considerations"
with a section on confidentiality of content. 
</t>
<t>
Resolved Ticket #40+42 / PPSP.SEC.REQ-4+6: Extended "Security Considerations"
with a per-message analysis of threats and how PPSPP is protected from them.
</t>
<t>
Progressed Ticket #41 / PPSP.SEC.REQ-5: Extended "Security Considerations"
with a section on possible ways of excluding bad or broken peers from the 
system.
</t>
<t>
Moved Rationale to Appendix.
</t>
<t>
Resolved Ticket #43: Updated Live Streaming section to include "Sign All" 
content authentication, and reference to <xref target="SIGMCAST"/> following 
discussion with Fabio Picconi.
</t>
<t>
Resolved Ticket #12: Added a CANCEL message to cancel REQUESTs for the same
data that were sent to multiple peers at the same time in time-critical 
situations.
</t>
</list>
</t>

<t hangText="-03        2012-10-22 Major revision">
<list style="symbols">
<t> 
Updated Abstract and Introduction, removing download case.
</t>
<t>
Resolved Ticket #4: Added explicit CHOKE/UNCHOKE messages.
</t>
<t> 
Removed directory lists unused in streaming.
</t>
<t>Resolved Ticket #22, #23, #28: Failure behaviour, error codes and dealing
with peer crashes.
</t>
<t>
Resolved Ticket #13: Chunk ranges are the default chunk addressing scheme
that all peers MUST support.
</t>
<t>
Added a section on compatibility between chunk addressing schemes.
</t>
<t>
Expanded the explanation of Unified Merkle Trees as a method for content
integrity protection for live streams.
</t>
<t>
Added a section on forgetting chunks in live streaming. 
</t>
<t> 
Added "End" option to protocol options and corrected bugs in UDP encapsulation,
following Karl Knutsson's comments. 
</t>
<t> 
Added SHA-2 support for Merkle Hash functions.
</t>
<t>
Added content integrity protection methods for live streaming to 
the relevant protocol option.
</t>
<t>
Added a Live Signature Algorithm protocol option.
</t>
<t>
Resolved Ticket #24+27: The choice for UDP + LEDBAT as transport has now been 
reflected in the draft. TCP and RTP encapsulations have been removed.
</t>
<t>
Superfluous parts of Section 10 on extensibility have been removed.
</t>
<t>
Removed appendix with Rationale.
</t>
<t>
Resolved Ticket #21+25: PPSPP currently uses LEDBAT and the DATA and ACK
messages now contain the time fields it requires. Should other congestion
control algorithms be supported in the future, a protocol option will be
added. 
</t>
</list>
</t>


<t hangText="-04        2012-11-07 Minor revision">
<list style="symbols">
<t> 
Corrected typos.
</t>
<t> 
Added empty protocol option list when HANDSHAKE is used for explicitly
closing a channel in the UDP encapsulation. 
</t>
<t>
Corrected definition of a range chunk specification to be a single (start,end) 
pair. To send multiple disjunct ranges multiple messages should be used.
</t>
<t>
Clarified that in a range chunk specification the end is inclusive. I.e., 
[start,end] not [start,end)
</t>
<t>
Added PEX_REScert message to carry a membership certificate. Renamed PEX_RES 
to PEX_RESv4.
</t>
<t>
Added a guideline about private and link-local addresses in PEX_RES messages.
</t>
<t>
Defined the format of the public key that is used as swarm ID in live streaming.
</t>
<t>
Clarified that a HANDSHAKE message must be the first message in a datagram.
</t>
<t>
Clarified sending INTEGRITY messages ahead in a separate datagram if not
all necessary hashes that still need to be sent and the chunk fit into a 
single datagram. Defined an order for the INTEGRITY messages.
</t>
<t>
Clarified rare case of sending multiple DATA messages in one datagram.
</t>
<t>
Clarified UDP datagrams carrying PPSPP should adhere to the network's MTU to 
avoid IP fragmentation.
</t>
<t>
Defined value for version protocol option.    
</t>
<t>
Added small clarifications and corrected typos.    
</t>
<t>
Extended versioning scheme to Min/max versioning scheme defined in 
<xref target="RFC6709"/>, Section 4.1, following Riccardo Bernardini's
suggestion.    
</t>
<t>
Processed comments on unclear phrasing from Riccardo Bernardini.
</t>
<t>
Added a guideline on when to declare a peer dead.    
</t>
<t>
Made sure all essential references are listed as Normative references
following RFC3967.     
</t>

</list>
</t>

<t hangText="-05        2012-11-07 Minor revision">
<list style="symbols">
<t> 
Corrected category to Standards Track.
</t>
<t> 
Clarified that swarm identifier is a required protocol option in an initiating
HANDSHAKE in the UDP encapsulation. 
</t>

<t>
Added IANA considerations and tablised name spaces for registry definition.    
</t>

</list>
</t>


<!-- End of per-revision list -->
</list>
</t>


</section>




</back>
</rfc>
PAFTECH AB 2003-2026
2026-04-24 08:15:52