<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
which is available here: http://xml2rfc.ietf.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
There has to be one entity for each item to be referenced.
An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC2119 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2629.xml">
<!ENTITY RFC3552 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3552.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs),
please see http://xml2rfc.ietf.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
(Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space
(using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="std" docName="draft-ietf-ppsp-peer-protocol-10" ipr="trust200902">
<!-- category values: std, bcp, info, exp, and historic
ipr values: full3667, noModification3667, noDerivatives3667
you can add the attributes updates="NNNN" and obsoletes="NNNN"
they will automatically be output with "(if approved)" -->
<!-- ***** FRONT MATTER ***** -->
<front>
<!-- The abbreviated title is used in the page header - it is only necessary if the
full title is longer than 39 characters -->
<title abbrev="PPSP Peer Protocol">Peer-to-Peer Streaming Peer Protocol (PPSPP)</title>
<!-- add 'role="editor"' below for the editors if appropriate -->
<!-- Another author who claims to be an editor -->
<author fullname="Arno Bakker" initials="A." surname="Bakker">
<organization>Vrije Universiteit Amsterdam</organization>
<address>
<postal>
<street>De Boelelaan 1081</street>
<!-- Reorder these if your country does things differently -->
<code>1081HV</code>
<city>Amsterdam</city>
<region></region>
<country>The Netherlands</country>
</postal>
<phone></phone>
<email>arno@cs.vu.nl</email>
<!-- uri and facsimile elements may also be added -->
</address>
</author>
<author fullname="Riccardo Petrocco" initials="R." surname="Petrocco">
<organization>Technische Universiteit Delft</organization>
<address>
<postal>
<street>Mekelweg 4</street>
<!-- Reorder these if your country does things differently -->
<code>2628CD</code>
<city>Delft</city>
<region></region>
<country>The Netherlands</country>
</postal>
<phone></phone>
<email>r.petrocco@gmail.com</email>
<!-- uri and facsimile elements may also be added -->
</address>
</author>
<author fullname="Victor Grishchenko" initials="V." surname="Grishchenko">
<organization>Technische Universiteit Delft</organization>
<address>
<postal>
<street>Mekelweg 4</street>
<!-- Reorder these if your country does things differently -->
<code>2628CD</code>
<city>Delft</city>
<region></region>
<country>The Netherlands</country>
</postal>
<phone></phone>
<email>victor.grishchenko@gmail.com</email>
<!-- uri and facsimile elements may also be added -->
</address>
</author>
<date month="June" year="2014" />
<!-- If the month and year are both specified and are the current ones, xml2rfc will fill
in the current day for you. If only the current year is specified, xml2rfc will fill
in the current day and month for you. If the year is not the current one, it is
necessary to specify at least a month (xml2rfc assumes day="1" if not specified for the
purpose of calculating the expiry date). With drafts it is normally sufficient to
specify just the year. -->
<!-- Meta-data Declarations -->
<area>Transport</area>
<workgroup>PPSP</workgroup>
<!-- WG name at the upperleft corner of the doc,
IETF is fine for individual submissions.
If this element is not present, the default is "Network Working Group",
which is used by the RFC Editor as a nod to the history of the IETF. -->
<keyword></keyword>
<!-- Keywords will be incorporated into HTML output
files in a meta tag but they have no effect on text or nroff
output. If you submit your draft to the RFC Editor, the
keywords will be used for the search engine. -->
<abstract><t>
The Peer-to-Peer Streaming Peer Protocol (PPSPP) is a protocol for
disseminating the same content to a group of interested parties in a streaming
fashion. PPSPP supports streaming of both pre-recorded (on-demand) and live
audio/video content. It is based on the peer-to-peer paradigm, where clients
consuming the content are put on equal footing with the servers initially
providing the content, to create a system where everyone can potentially
provide upload bandwidth. It has been designed to provide short time-till-playback
for the end user, and to prevent disruption of the streams by malicious peers.
PPSPP has also been designed to be flexible and extensible. It can use different
mechanisms to optimize peer uploading, prevent freeriding, and work with
different peer discovery schemes (centralized trackers or Distributed Hash
Tables). It supports multiple methods for content integrity protection and
chunk addressing. Designed as a generic protocol that can run on top of various
transport protocols, it currently runs on top of UDP using LEDBAT for
congestion control. </t>
</abstract>
</front>
<middle>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Introduction" anchor="sec_intro">
<!-- %% -->
<section title="Purpose" anchor="sec_intro_purp">
<t>
This document describes the Peer-to-Peer Streaming Peer Protocol (PPSPP),
designed for disseminating the same content to a group of interested parties in
a streaming fashion. PPSPP supports streaming of both pre-recorded (on-demand)
and live audio/video content. It is based on the peer-to-peer paradigm where
clients consuming the content are put on equal footing with the servers
initially providing the content, to create a system where everyone can
potentially provide upload bandwidth.
</t>
<t>
PPSPP has been designed to provide short time-till-playback for the end user,
and to prevent disruption of the streams by malicious peers. Central in this
design is a simple method of identifying content based on self-certification.
In particular, content in PPSPP is identified by a single cryptographic hash
that is the root hash in a Merkle hash tree calculated recursively from the
content <xref target="MERKLE"/><xref target="ABMRKL"/>. This self-certifying
hash tree allows every peer to directly detect when a malicious peer tries to
distribute fake content. The tree can be used for both static and live content.
Moreover, it ensures only a small amount of information is needed to start a
download and to verify incoming chunks of content, thus ensuring short start-up
times.
</t>
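<t>
As an informal illustration of the self-certifying identifier described above,
the following sketch computes a root hash over a list of chunks. It is not the
normative tree construction: the use of SHA-1, the padding of the leaf layer to
a power of two with all-zero hashes, and the names merkle_root, HASH_LEN, and
EMPTY_HASH are assumptions made for this example only.
</t>
<figure><artwork><![CDATA[
```python
import hashlib

HASH_LEN = 20                 # SHA-1 digest size in bytes
EMPTY_HASH = bytes(HASH_LEN)  # assumed padding value for missing leaves


def merkle_root(chunks):
    """Return the root hash of a binary hash tree over the chunks."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    # Leaves are the hashes of the individual chunks.
    level = [hashlib.sha1(c).digest() for c in chunks]
    # Pad the leaf layer to a power of two (assumption of this sketch).
    width = 1
    while width < len(level):
        width *= 2
    level += [EMPTY_HASH] * (width - len(level))
    # Each parent is the hash of the concatenation of its two children.
    while len(level) > 1:
        level = [hashlib.sha1(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```
]]></artwork></figure>
<t>
Because any change to a chunk changes the root, a peer that knows only the root
hash can detect modified content once it has the hashes on the path to the root.
</t>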
<t>
PPSPP has also been designed to be extensible for different transports and
use cases. Hence, PPSPP is a generic protocol which can run directly on top of
UDP, TCP, or other protocols. As such, PPSPP defines a common set of messages
that make up the protocol, which can have different representations on the wire
depending on the lower-level protocol used. When the lower-level transport
allows, PPSPP can also use different congestion control algorithms.
</t>
<t>
At present, PPSPP is set to run on top of UDP using LEDBAT for congestion
control <xref target="RFC6817"/>. Using LEDBAT enables PPSPP
to serve the content after playback (seeding) without disrupting the user who
may have moved to different tasks that use its network connection.
</t>
<t>
PPSPP is also flexible and extensible in the mechanisms it uses to promote
client contribution and prevent freeriding, that is, how to deal with peers that
only download content but never upload to others. It also allows different
schemes for chunk addressing and content integrity protection, if the defaults are
not fit for a particular use case. In addition, it can work with different peer
discovery schemes, such as centralized trackers or fast Distributed Hash Tables
<xref target="JIM11"/>. Finally, in this default setup, PPSPP maintains only a
small amount of state per peer. A reference implementation of PPSPP over UDP is
available <xref target="SWIFTIMPL"/>.
</t>
</section>
<!-- %% -->
<section title="Requirements Language" anchor="sec_intro_req">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119">RFC 2119</xref>.</t>
</section>
<!-- %% -->
<section title="Terminology" anchor="sec_intro_term">
<t>
<list style="hanging" hangIndent="4">
<t hangText="message"><vspace blankLines="0"/>
The basic unit of PPSPP communication. A message will have different
representations on the wire depending on the transport protocol used. Messages
are typically multiplexed into a datagram for transmission.
</t>
<t hangText="datagram"><vspace blankLines="0"/>
A sequence of messages that is offered as a unit to the underlying transport
protocol (UDP, etc.). The datagram is PPSPP's Protocol Data Unit (PDU).
</t>
<t hangText="content"><vspace blankLines="0"/>
Either a live transmission, a pre-recorded multimedia asset, or a file.
</t>
<t hangText="chunk"><vspace blankLines="0"/>
The basic unit into which the content is divided, e.g., a block of N
kilobytes.
</t>
<t hangText="chunk ID"><vspace blankLines="0"/>
Unique identifier for a chunk of content (e.g. an integer). Its type
depends on the chunk addressing scheme used.
</t>
<t hangText="chunk specification"><vspace blankLines="0"/>
An expression that denotes one or more chunk IDs.
</t>
<t hangText="chunk addressing scheme"><vspace blankLines="0"/>
Scheme for identifying chunks and expressing the chunk availability map
of a peer in a compact fashion.
</t>
<t hangText="chunk availability map"><vspace blankLines="0"/>
The set of chunks a peer has successfully downloaded and checked the integrity
of.
</t>
<t hangText="bin"><vspace blankLines="0"/>
A number denoting a specific binary interval of the content (i.e., one
or more consecutive chunks) in the bin numbers chunk addressing scheme
(see <xref target="sec_chunkaddr"/>).
</t>
<t hangText="content integrity protection scheme"><vspace blankLines="0"/>
Scheme for protecting the integrity of the content while it is being distributed
via the peer-to-peer network, i.e., methods that allow receiving peers to detect
whether a requested chunk has been maliciously modified by the sending peer.
</t>
<t hangText="hash"><vspace blankLines="0"/>
The result of applying a cryptographic hash function, more specifically
a modification detection code (MDC) <xref target="HAC01" />, such as SHA-1
<xref target="FIPS180-4" />, to a piece of data.
</t>
<t hangText="Merkle hash tree"><vspace blankLines="0"/>
A tree of hashes whose base is formed by the hashes of the chunks of content,
and whose higher nodes are calculated by recursively computing the hash of
the concatenation of the two child hashes
(see <xref target="sec_intprot_merkle"/>).
</t>
<t hangText="root hash"><vspace blankLines="0"/>
The root in a Merkle hash tree calculated recursively from the content
(see <xref target="sec_intprot_merkle"/>).
</t>
<t hangText="swarm"><vspace blankLines="0"/>
A group of peers participating in the distribution of the same content.
</t>
<t hangText="swarm ID"><vspace blankLines="0"/>
Unique identifier for a swarm of peers, in PPSPP a sequence of bytes. When
Merkle hash trees are used for content integrity protection, the identifier
is the so-called root hash of the content (video-on-demand). For live
streaming, the swarm ID is a public key.
</t>
<t hangText="tracker"><vspace blankLines="0"/>
An entity that records the addresses of peers participating in a swarm,
usually for a set of swarms, and makes this membership information
available to other peers on request.
</t>
<t hangText="choking"><vspace blankLines="0"/>
When a peer A is choking peer B it means that A is currently not
willing to accept requests for content from B.
</t>
<t hangText="seeding"><vspace blankLines="0"/>
Peer A is said to be seeding when A has downloaded a static content asset
completely and is now offering it for others to download.
</t>
<t hangText="leeching"><vspace blankLines="0"/>
Peer A is said to be leeching when A has not completely downloaded
a static content asset yet or is not offering to upload it to others.
</t>
<t hangText="channel"><vspace blankLines="0"/>
A logical connection between two peers. The channel concept allows peers
to use the same transport address for communicating with different peers.
</t>
<t hangText="channel ID"><vspace blankLines="0"/>
Unique, randomly chosen identifier for a channel, local to each peer. So the
two peers logically connected by a channel each have a different channel ID
for the channel.
</t>
</list>
</t>
<t>
In this document the prefixes kilo, mega, etc. denote base 1024.
</t>
</section>
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Overall Operation" anchor="sec_over">
<t>
The basic unit of communication in PPSPP is the message. Multiple messages are
multiplexed into a single datagram for transmission. A datagram (and hence the
messages it contains) will have different representations on the wire depending
on the transport protocol used (see <xref target="sec_encap_udp"/>).
</t>
<t>
The overall operation of PPSPP is illustrated in the following examples. The
examples assume that UDP is used for transport, the Merkle Hash Tree scheme
is used for content integrity protection, and that a specific policy is used
for selecting which chunks to download.
</t>
<!-- %% -->
<section title="Example: Joining a Swarm" anchor="sec_over_join">
<t>
Consider a user who wants to watch a video. To play the video, the user clicks
on the play button of an HTML5 &lt;video&gt; element shown in his PPSPP-enabled
browser. Imagine this element has a PPSPP URL (to be defined elsewhere) identifying
the video as its source. The browser passes this URL to its PPSP protocol handler.
Let's call this protocol handler peer A. Peer A parses the URL to retrieve the
transport address of a PPSP tracker and swarm ID of the content. The tracker
address may be optional in the presence of a decentralized tracking
mechanism.
</t>
<t>
Peer A now registers with the tracker following the PPSP tracker
protocol <xref target="I-D.ietf-ppsp-base-tracker-protocol"/>
and receives the IP address and
port of peers already in the swarm, say B, C, and D. Peer A now sends a
datagram containing a HANDSHAKE message to B, C, and D. This message conveys
protocol options, in particular, peer A includes the ID of the swarm as the
destination peers can listen for multiple swarms on the same transport address.
</t>
<t>
Peers B and C respond with datagrams containing a HANDSHAKE message and one or
more HAVE messages. A HAVE message conveys (part of) the chunk availability of
a peer and thus contains a chunk specification that denotes what chunks of the
content peers B and C, respectively, have. Peer D sends a datagram with a HANDSHAKE and
HAVE messages, but also with a CHOKE message. The latter indicates that D
is not willing to upload chunks to A at present.
</t>
</section>
<!-- %% -->
<section title="Example: Exchanging Chunks" anchor="sec_over_exch">
<t>
In response to B and C, A sends new datagrams to B and C containing REQUEST
messages. A REQUEST message indicates the chunks that a peer wants to
download, and thus contains a chunk specification. The REQUEST messages to B and C
refer to disjoint sets of chunks. B and C respond with datagrams containing
HAVE, DATA and, in this example, INTEGRITY messages. In the Merkle hash tree
content protection scheme (see <xref target="sec_intprot_merkle"/>), the
INTEGRITY messages contain all cryptographic hashes that peer A needs to verify
the integrity of the content chunk sent in the DATA message. Using these hashes
peer A verifies that the chunks received from B and C are correct. It also
updates the chunk availability of B and C using the information in the
received HAVE messages. In addition, it passes the chunks of video to the
user's browser for rendering.
</t>
<t>
After processing, A sends a datagram containing HAVE messages for the chunks
it just received to all its peers. In the datagram to B and C it includes an
ACK message acknowledging the receipt of the chunks, and adds REQUEST messages
for new chunks. ACK messages are not used when a reliable transport protocol
is used. When, for example, C finds that A obtained a chunk (from B) that C did not
yet have, C's next datagram includes a REQUEST for that chunk.
</t>
<t>
Peer D also sends HAVE messages to A when it downloads chunks from other peers.
When D is willing to accept REQUESTs from A, D sends a datagram with
an UNCHOKE message to inform A. If B or C decides to choke A, it sends a CHOKE
message and A should then re-request from other peers. B and C may continue to
send HAVE, REQUEST, or periodic KEEPALIVE messages such that A keeps sending
them HAVE messages.
</t>
<t>
Once peer A has received all content (video-on-demand use case) it stops
sending messages to all other peers that have all content (a.k.a. seeders).
Peer A can also contact the tracker or another source again to obtain more
peer addresses.
</t>
</section>
<!-- %% -->
<section title="Example: Leaving a Swarm" anchor="sec_over_leave">
<t>
To leave a swarm in a graceful way, peer A sends a specific HANDSHAKE message
to all its peers (see <xref target="sec_encap_udp_HANDSHAKE"/>)
and deregisters from the tracker following the (PPSP)
tracker protocol. Peers receiving the datagram should remove A
from their current peer list. If A crashes ungracefully, peers should
remove A from their peer list when they detect it no longer sends messages
(see <xref target="sec_encap_udp_detect_dead"/>).
</t>
</section>
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Messages" anchor="sec_msgs">
<t>
In general, no error codes or responses are used in the protocol; absence
of any response indicates an error. Invalid messages are discarded, and
further communication with the peer SHOULD be stopped. The rationale is that
it is sufficient to classify peers as either good (i.e., responding with chunks)
or bad and only use the good ones. This behavior allows a peer to deal with
slow, crashed and (silent) malicious peers.
</t>
<t>
Multiple messages are multiplexed into a single datagram for transmission.
Messages in a single datagram MUST be processed in the strict order in which
they appear in the datagram.
</t>
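<t>
The ordering rule can be illustrated with a small dispatch loop. This is only a
sketch: the decoded form of a datagram as a list of (type, payload) pairs, the
handlers mapping, and the choice to treat an unknown message type as invalid
are assumptions of the example, not part of the protocol definition.
</t>
<figure><artwork><![CDATA[
```python
def process_datagram(messages, handlers):
    """Apply handlers to messages strictly in datagram order.

    `messages` is a hypothetical decoded datagram: a list of
    (message_type, payload) pairs. Processing stops at the first
    invalid message; the caller is then expected to stop
    communicating with the sending peer.
    """
    processed = []
    for msg_type, payload in messages:
        handler = handlers.get(msg_type)
        if handler is None:
            # Unknown type: treated as invalid in this sketch.
            return processed, False
        handler(payload)
        processed.append(msg_type)
    return processed, True
```
]]></artwork></figure>
<t>
Processing in order matters, for example, when INTEGRITY messages precede the
DATA message they protect: the hashes are available before the chunk is checked.
</t>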
<t>
For the sake of simplicity, one swarm of peers deals with one content
asset (e.g. file) only. Retrieval of a collection of files can be done
either by using multiple swarms or by using an external storage mapping from
the linear byte space of a single swarm to different files, transparent to the
protocol.
</t>
<!-- %% -->
<section title="HANDSHAKE" anchor="sec_msgs_HANDSHAKE">
<t>
The initiating peer and the addressed peer MUST send a HANDSHAKE message as the
first message in the first datagrams they exchange. The payload of the HANDSHAKE
message is a channel ID (see <xref target="sec_msgs_channels"/>) and a sequence
of protocol options. Example options are the content
integrity protection scheme used and an option to specify the swarm identifier.
The complete set of protocol options is specified in <xref target="sec_protopt"/>.
</t>
<t>
After the handshakes are exchanged, the initiator knows that the peer really
responds. Hence, the second datagram the initiator sends MAY already contain
some heavy payload, e.g. DATA messages. To minimize the number of initialization
round-trips, the first two datagrams exchanged MAY also contain some minor
payload, e.g. HAVE messages to indicate the current progress of a peer or a
REQUEST (see <xref target="sec_msgs_REQUEST"/>), but MUST NOT include any
DATA message.
</t>
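<t>
The two constraints on the first datagram a peer sends can be checked as in the
following sketch. The representation of a datagram as a list of message type
names and the function name valid_first_datagram are assumptions made for
illustration.
</t>
<figure><artwork><![CDATA[
```python
def valid_first_datagram(message_types):
    """Check the rules for the first datagram sent on a channel.

    `message_types` is a hypothetical list of message type names in
    the order they appear in the datagram.
    """
    # HANDSHAKE must be the first message.
    if not message_types or message_types[0] != "HANDSHAKE":
        return False
    # Minor payload such as HAVE or REQUEST is allowed, DATA is not.
    return "DATA" not in message_types
```
]]></artwork></figure>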
</section>
<!-- %% -->
<section title="HAVE" anchor="sec_msgs_HAVE">
<t>
The HAVE message is used to convey which chunks a peer has available for
download. The set of chunks it has available may be expressed using different
chunk addressing and availability map compression schemes, described in
<xref target="sec_chunkaddr"/>. HAVE messages can be used both for sending
a complete overview of a peer's chunk availability as well as for updates to
that set.
</t>
<t>
In particular, whenever a receiving peer P has successfully checked the integrity
of a chunk, or interval of chunks, it SHOULD send a HAVE message to all peers
Q1..Qn it wants to interact with in the near future. A policy in peer P
determines when the HAVE is sent. P may send it directly, or peer P may wait
until either it has other data to send to Qi, or until it has received
and checked multiple chunks. The policy will depend on how urgent it is to
distribute this information to the other peers. This urgency is generally
determined in turn by the chunk picking policy (see <xref target="sec_ext_cpa"/>).
In general, the HAVE messages can be piggybacked onto other messages.
Peers that do not receive HAVE messages are effectively prevented from
downloading the newly available chunks, hence the HAVE message can be used as a
method of choking.
</t>
<t>
The HAVE message MUST contain the chunk specification of the received and
verified chunks. A receiving peer MUST NOT send a HAVE message to peers for
which the handshake procedure is still incomplete, see <xref target="sec_sec_handshake"/>.
A peer SHOULD NOT send a HAVE message to peers that have the complete content
already (e.g. in video-on-demand scenarios).
</t>
</section>
<!-- %% -->
<section title="DATA" anchor="sec_msgs_DATA">
<t>
The DATA message is used to transfer chunks of content. The DATA message MUST
contain the chunk ID of the chunk and the chunk itself. A peer MAY send the
DATA messages for multiple chunks in the same datagram. The DATA message MAY
contain additional information if needed by the specific congestion control
mechanism used. At present PPSPP uses LEDBAT <xref target="RFC6817"/>
for congestion control, which requires the current system time to be sent along
with the DATA message, so the current system time MUST be included.
</t>
</section>
<!-- %% -->
<section title="ACK" anchor="sec_msgs_ACK">
<t>
ACK messages MUST be sent to acknowledge received chunks if PPSPP is run over
an unreliable transport protocol. ACK messages MAY be sent if a reliable
transport protocol is used. In the former case, a receiving peer that has successfully
checked the integrity of a chunk, or interval of chunks C MUST send an ACK
message containing a chunk specification for C. As LEDBAT
is used, an ACK message MUST contain the one-way delay, computed from the peer's
current system time received in the DATA message. A peer MAY delay sending
ACK messages as defined in the LEDBAT specification.
</t>
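<t>
As a rough illustration, the one-way delay reported in an ACK can be derived
from the timestamp carried in the DATA message as sketched below. The
microsecond unit, the function name, and the parameter names are assumptions
of this example. The clocks of the two peers are not synchronized, so the
absolute value is not meaningful by itself; a delay-based controller such as
LEDBAT works with changes in this value over time.
</t>
<figure><artwork><![CDATA[
```python
import time


def one_way_delay_us(data_timestamp_us, now_us=None):
    """Return local receive time minus the sender's DATA timestamp.

    Both values are in microseconds (an assumption of this sketch).
    The result includes the unknown clock offset between the peers
    and may therefore even be negative.
    """
    if now_us is None:
        now_us = time.time_ns() // 1000
    return now_us - data_timestamp_us
```
]]></artwork></figure>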
</section>
<!-- %% -->
<section title="INTEGRITY" anchor="sec_msgs_INTEGRITY">
<t>
The INTEGRITY message carries information required by the receiver to verify
the integrity of a chunk. Its payload depends on the content integrity protection
scheme used. When the Merkle Hash Tree scheme is used, an INTEGRITY message
MUST contain a cryptographic hash of a subtree of the Merkle hash tree and the
chunk specification that identifies the subtree.
</t>
<t>
As a typical example, when a peer wants to send a chunk and Merkle hash trees
are used, it creates a datagram that consists of several INTEGRITY messages
containing the hashes the receiver needs to verify the chunk and the actual
chunk itself encoded in a DATA message. Which hashes are necessary and the
exact rules for encoding them into datagrams are specified in
<xref target="sec_intprot_merkle_atomic"/>, and
<xref target="sec_intprot_merkle_msg"/>, respectively.
</t>
</section>
<!-- %% -->
<section title="SIGNED_INTEGRITY" anchor="sec_msgs_SIGNED_INTEGRITY">
<t>
The SIGNED_INTEGRITY message carries digitally signed information required by
the receiver to verify the integrity of a chunk in live streaming. It logically
contains a chunk specification, a timestamp and a digital signature. Its exact
payload depends on the live content integrity protection scheme used, see
<xref target="sec_live_auth"/>.
</t>
</section>
<!-- %% -->
<section title="REQUEST" anchor="sec_msgs_REQUEST">
<t>
While bulk download protocols normally do explicit requests for certain ranges
of data (i.e., use a pull model, for example, BitTorrent
<xref target="BITTORRENT"/>), live streaming protocols quite often use a
request-less push model to save round trips. PPSPP supports both models of
operation.
</t>
<t>
The REQUEST message is used to request one or more chunks from another peer.
A REQUEST message MUST contain the specification of the chunks the requester
wants to download. A peer receiving a REQUEST message MAY send out
the requested chunks (by means of DATA messages). When peer Q receives multiple
REQUESTs from the same peer P, peer Q SHOULD process the REQUESTs in the order
received. Multiple REQUEST messages MAY be sent in one datagram, for example,
when a peer wants to request several rare chunks at once.
</t>
<t>
When live streaming via a push model, a peer receiving REQUESTs MAY also send
some other chunks in case it runs out of requests or for some other reason.
In that case the only purpose of REQUEST messages is to provide hints and
coordinate peers to avoid unnecessary data retransmission.
</t>
</section>
<!-- %% -->
<section title="CANCEL" anchor="sec_msgs_CANCEL">
<t>
When downloading on demand or live streaming content, a peer can request urgent
data from multiple peers to increase the probability of it being delivered on time.
In particular, when the specific chunk picking algorithm (see <xref target="sec_ext_cpa"/>),
detects that a request for urgent data might not be served on time, a request
for the same data can be sent to a different peer.
When a peer P decides to request urgent data from a peer Q, peer P SHOULD send a
CANCEL message to all the peers from which the data was previously requested.
The CANCEL message contains the specification of the chunks P no longer wants
to request. In addition, when peer Q receives a HAVE message for the urgent data
from peer P, peer Q MUST also cancel the previous REQUEST(s) from P. In other
words, the HAVE message acts as an implicit CANCEL.
</t>
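<t>
The bookkeeping this implies at the peer serving requests can be sketched as
follows. The class name PendingRequests, its method names, and the use of
integers as chunk IDs are assumptions for illustration; a real implementation
would operate on chunk specifications.
</t>
<figure><artwork><![CDATA[
```python
class PendingRequests:
    """Chunks requested from this peer, grouped by requesting peer."""

    def __init__(self):
        self._pending = {}

    def on_request(self, peer, chunk_ids):
        # Remember which chunks the peer has asked for.
        self._pending.setdefault(peer, set()).update(chunk_ids)

    def on_cancel(self, peer, chunk_ids):
        # Drop the chunks the peer no longer wants from us.
        self._pending.get(peer, set()).difference_update(chunk_ids)

    # A HAVE from the requester acts as an implicit CANCEL.
    on_have = on_cancel

    def outstanding(self, peer):
        return sorted(self._pending.get(peer, set()))
```
]]></artwork></figure>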
</section>
<!-- %% -->
<section title="CHOKE and UNCHOKE" anchor="sec_msgs_CHOKE">
<t>
Peer A can send a CHOKE message to peer B to signal it will no longer be
responding to REQUEST messages from B, for example, because A's upload capacity
is exhausted. Peer A MAY send a subsequent UNCHOKE message to signal that it
will respond to new REQUESTs from B again (A SHOULD discard old requests).
When peer B receives a CHOKE message from A it MUST NOT send new REQUEST
messages and it cannot expect answers to any outstanding ones, as the transfer
of chunks is choked. The CHOKE and UNCHOKE messages are informational as
responding to REQUESTs is OPTIONAL, see <xref target="sec_msgs_REQUEST"/>.
</t>
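<t>
From the point of view of peer B, the effect of CHOKE and UNCHOKE can be
modelled as a small piece of per-peer state, as in the sketch below. The class
and method names are hypothetical, and returning the dropped requests so that
they can be re-requested from other peers is a design choice of this example.
</t>
<figure><artwork><![CDATA[
```python
class RemotePeerState:
    """What peer B tracks about a remote peer A (illustrative only)."""

    def __init__(self):
        self.choked = False
        self.outstanding = set()

    def try_request(self, chunk_ids):
        # New REQUESTs must not be sent while choked.
        if self.choked:
            return False
        self.outstanding.update(chunk_ids)
        return True

    def on_choke(self):
        # No answers can be expected for outstanding requests, so
        # hand them back to the caller for re-requesting elsewhere.
        self.choked = True
        dropped = self.outstanding
        self.outstanding = set()
        return dropped

    def on_unchoke(self):
        self.choked = False
```
]]></artwork></figure>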
</section>
<!-- %% -->
<section title="Peer Address Exchange" anchor="sec_msgs_PEX">
<!-- %%% -->
<section title="PEX_REQ and PEX_RES Messages" anchor="sec_msgs_PEX_msgs">
<t>
Peer address exchange messages (or PEX messages for short) are common in
many peer-to-peer protocols. They allow peers to exchange the transport addresses
of the peers they are currently interacting with, thereby reducing the need
to contact a central tracker (or DHT) to discover new peers. The strength
of this mechanism is therefore that it enables decentralized peer discovery:
after an initial bootstrap no central tracker is needed anymore. Its weakness
is that it enables a number of attacks, so it should not be used outside a
benign environment unless extra security measures are in place.
</t>
<t>
PPSPP supports peer-address exchange in benign and potentially hostile
environments, as an OPTIONAL feature (not mandatory to implement). The general
mechanism works as follows. To obtain some peer addresses a peer A MAY send a
PEX_REQ message to peer B. Peer B MAY respond with one or more PEX_RES messages.
PPSPP supports three types of PEX_RES reply messages, each containing the
address of a single peer Ci. The address in the PEX_RES message MUST be of
a peer B has exchanged messages with in the last 60 seconds to guarantee
liveliness. Upon receipt, peer A may contact any or none of the returned peers
Ci. Alternatively, peers MAY ignore PEX_REQ and PEX_RES messages if uninterested
in obtaining new peers or because of security considerations (rate limiting)
or any other reason. The PEX messages can be used to construct a dedicated
tracker peer.
</t>
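<t>
The 60-second rule can be illustrated by the way peer B might select addresses
for its PEX_RES replies. This is a sketch under assumptions: the last_seen
mapping from transport address to the time of the last received message, the
function name, and the exclusion of the requesting peer's own address are not
taken from the protocol definition.
</t>
<figure><artwork><![CDATA[
```python
PEX_MAX_AGE = 60.0  # seconds since the last message exchange


def pex_candidates(last_seen, requester, now):
    """Return addresses that may be placed in PEX_RES messages.

    `last_seen` maps a transport address, here an (ip, port) tuple,
    to the time in seconds at which a message was last exchanged
    with that peer.
    """
    return [addr for addr, seen in sorted(last_seen.items())
            if addr != requester and now - seen <= PEX_MAX_AGE]
```
]]></artwork></figure>
<t>
Each returned address would then be sent in its own PEX_RES message, since a
PEX_RES carries the address of a single peer.
</t>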
<t>
As indicated, there are three types of PEX_RES messages: PEX_RESv4 containing
a single IPv4 address and port, PEX_RESv6 containing a single IPv6 address and
port, and a PEX_REScert message. The PEX_RESv4 and PEX_RESv6 MUST only be used
in a benign environment, as they provide no guarantees that the host addressed
actually participates in a PPSPP swarm.
</t>
<t>
To use PEX in PPSPP in a potentially hostile environment, such as the Internet,
three conditions must be met:
<list style="numbers">
<t>
Peer transport addresses must be relatively stable.
</t>
<t>
PEX_REScert messages must be used instead of PEX_RESv4 and PEX_RESv6.
</t>
<t>
A peer must not obtain all its peer addresses through PEX.
</t>
</list>
</t>
<t>The full security analysis for PEX messages can be found in
<xref target="sec_sec_pex"/>. A PEX_REScert message carries a swarm-membership
certificate rather than an IP address and port. A membership certificate for
peer C states that peer C at address (ipC,portC) is part of swarm S at time T
and is cryptographically signed by an issuer. The receiver A can check the
certificate for a valid signature by a trusted issuer, the right swarm and
liveliness and only then consider contacting C. These swarm-membership
certificates correspond to signed node descriptors in secure decentralized peer
sampling services <xref target="SPS"/>.
</t>
<t>
Several designs are possible for the security environment for these
membership certificates. That is, there are different designs possible for who
signs the membership certificates and how public keys are distributed.
<xref target="sec_sec_pex_trackerca"/> describes an example where a central
tracker acts as the Certification Authority.
</t>
<t>
In a potentially hostile environment, peers must also ensure that they
do not end up interacting only with malicious peers when using the peer-address
exchange feature. To this end, peers MUST ensure that part of their
connections are to peers whose addresses came from a trusted and secured
tracker (see <xref target="sec_sec_pex_eclipse"/>).
</t>
<t>
Once a PPSPP implementation has obtained a list of peers (either via PEX, from
a central tracker or via a DHT), it has to determine which peers to actually
contact. In this process, a PPSPP implementation can benefit from information
by network or content providers to help improve network usage and boost
PPSPP performance. How a P2P system like PPSPP can perform these optimizations
using the ALTO protocol is described in detail in
<xref target="I-D.ietf-alto-protocol"/>, Section 7.
</t>
</section>
</section>
<!-- %% -->
<section title="Channels" anchor="sec_msgs_channels">
<t>
It is increasingly complex for peers to enable communication between each
other due to NATs and firewalls. Therefore, PPSPP uses a multiplexing scheme,
called channels, to allow multiple swarms to use the same transport address.
Channels loosely correspond to TCP connections and each channel belongs to a
single swarm, as illustrated in <xref target="fig_udp_chan"/>. As with TCP
connections, a channel is identified by a unique identifier local to the peer
at each end of the connection (cf. TCP port), which MUST be randomly chosen.
In other words, the two peers connected by a channel use different IDs to denote
the same channel. The IDs are different and random for security reasons, see
<xref target="sec_sec_handshake"/>.
</t>
<t>
In the PPSP-over-UDP encapsulation (<xref target="sec_encap_udp_channels"/>),
when a channel C has been established between peer A and peer B, the datagrams
containing messages from A to B are prefixed with the 4-byte channel ID
allocated by peer B, and vice versa for datagrams from B to A. The channel IDs
used are exchanged as part of the handshake procedure, see
<xref target="sec_encap_udp_HANDSHAKE"/>. In that procedure, the channel ID
with value 0 is used for the datagram that initiates the handshake.
PPSPP can be used in combination with STUN <xref target="RFC5389"/>.
</t>
<figure align="center" anchor="fig_udp_chan">
<artwork align="center"><![CDATA[
_________ _________ _________
| | | | | |
| Swarm | | Swarm | | Swarm |
| Mgr | | A | | B |
|_______| |_______| |_______|
| | / \
| | / \
____|____ ____|____ ______/__ _\_______
| | | | | | | |
| Chan | | Chan | | Chan | | Chan |
| 0 | | 481 | | 836 | | 372 |
|_______| |_______| |_______| |_______|
| | | |
| | | |
____|____________|____________|____________|____
| |
| UDP |
| port 6778 |
|______________________________________________|
]]></artwork>
<postamble>Network stack of a PPSPP peer that is reachable on UDP port 6778 and
is connected via channel 481 to one peer in swarm A and two peers in swarm B
via channels 836 and 372, respectively. Channel ID 0 is special and is used
for handshaking.</postamble>
</figure>
</section>
<!-- %% -->
<section title="Keep Alive Signalling" anchor="sec_msgs_keepalive">
<t>
A peer SHOULD send a "keep alive" message periodically to each peer it wants to
interact with in the future, but has no other messages to send them at present.
Periodically sending "keep alive" messages prevents other peers from closing the
connection after a predefined time interval of 3 minutes, as described in
<xref target="sec_encap_udp_detect_dead"/>.
PPSPP does not define an explicit message type for "keep alive" messages. In the
PPSP-over-UDP encapsulation they are implemented as simple datagrams consisting
of a 4-byte channel ID only, see <xref target="sec_encap_udp_channels"/>
and <xref target="sec_encap_udp_HANDSHAKE"/>.
</t>
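<t>
As an illustration of the channel prefix and the keep-alive datagram described
above, the following Python sketch builds and splits datagrams. It is not a
definitive implementation: the helper names are hypothetical, network byte
order for the channel ID is an assumption, and excluding 0 from the randomly
chosen IDs is an assumption derived from channel ID 0 being special.
</t>
<figure><artwork><![CDATA[
```python
import secrets
import struct

# Assumption: channel IDs are sent as 4 bytes in network byte order.
CHANNEL_ID = struct.Struct(">I")


def new_channel_id() -> int:
    """Pick a random 32-bit channel ID, avoiding the special value 0."""
    while True:
        cid = secrets.randbits(32)
        if cid != 0:
            return cid


def keep_alive(remote_channel_id: int) -> bytes:
    """A keep-alive datagram is just the receiver's channel ID."""
    return CHANNEL_ID.pack(remote_channel_id)


def split_datagram(datagram: bytes):
    """Return (channel_id, payload); an empty payload is a keep-alive."""
    (cid,) = CHANNEL_ID.unpack_from(datagram)
    return cid, datagram[CHANNEL_ID.size:]
```
]]></artwork></figure>
<t>
For example, keep_alive(481) yields the four bytes 00 00 01 E1, and
split_datagram() on that datagram returns channel 481 with an empty payload.
</t>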
</section>
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Chunk Addressing Schemes" anchor="sec_chunkaddr">
<t>
PPSPP can use different methods of chunk addressing, that is, support different
ways of identifying chunks and different ways of expressing the chunk
availability map of a peer in a compact fashion.
</t>
<t>
All peers in a swarm MUST use the same chunk addressing method.
</t>
<!-- %% -->
<section title="Start-End Ranges" anchor="sec_chunkaddr_range">
<t>
A chunk specification consists of a single (start specification,end
specification) pair that identifies a range of chunks (end inclusive).
The start and end specifications can use one of multiple addressing schemes.
Two schemes are currently defined, chunk ranges and byte ranges.
</t>
<!-- %%% -->
<section title="Chunk Ranges" anchor="sec_chunkaddr_range_chunk">
<t>
The start and end specifications are both chunk identifiers. Chunk identifiers
are 32-bit or 64-bit unsigned integers. A PPSPP peer
MUST support this scheme.
</t>
<figure><artwork>
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ Start chunk (32 or 64) ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ End chunk (32 or 64) ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</section>
<!-- %%% -->
<section title="Byte Ranges" anchor="sec_chunkaddr_range_byte">
<t>
The start and end specifications are 64-bit byte offsets in the content.
The support for this scheme is OPTIONAL.
</t>
<figure><artwork>
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Start byte offset (64) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| End byte offset (64) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
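<t>
The two start-end layouts shown above could be serialized along the following
lines. This is a minimal sketch: the function names are hypothetical, and
network byte order (big-endian) is an assumption that this section does not
state.
</t>
<figure><artwork><![CDATA[
```python
import struct

# Assumption: fields are encoded in network byte order (big-endian).


def encode_chunk_range(start: int, end: int, bits: int = 32) -> bytes:
    """Encode an inclusive (start, end) chunk range, 32- or 64-bit."""
    if bits not in (32, 64):
        raise ValueError("chunk identifiers are 32 or 64 bits wide")
    if start > end:
        raise ValueError("start must not exceed end (end inclusive)")
    fmt = ">II" if bits == 32 else ">QQ"
    return struct.pack(fmt, start, end)


def encode_byte_range(start: int, end: int) -> bytes:
    """Encode an inclusive byte range as two 64-bit offsets."""
    if start > end:
        raise ValueError("start must not exceed end (end inclusive)")
    return struct.pack(">QQ", start, end)
```
]]></artwork></figure>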
</section>
</section>
<!-- %% -->
<section title="Bin Numbers" anchor="sec_chunkaddr_bin">
<t>
PPSPP introduces a novel method of addressing chunks of content called "bin
numbers" (or "bins" for short). Bin numbers allow the addressing of a binary
interval of data using a single integer. This reduces the amount of state that
needs to be recorded per peer and the space needed to denote intervals on the
wire, making the protocol light-weight. In general, this numbering system allows
PPSPP to work with simpler data structures, e.g. to use arrays instead of
binary trees, thus reducing complexity. The support for this scheme is OPTIONAL.
</t>
<t>
In bin addressing, the smallest binary interval is a single chunk (e.g. a block
of bytes, which may be of variable size) and the largest interval is a complete
range of 2**63 chunks. In a novel addition to the classical scheme,
these intervals are numbered in a way that lays them out neatly into a vector;
this is called bin numbering and works as follows. Consider a chunk interval of width
W. To derive the bin numbers of the complete interval and the subintervals, a
minimal balanced binary tree is built that is at least W chunks wide at the
base. The leaves, from left to right, correspond to the chunks 0..W-1 in the
interval and have bin number I*2, where I is the index of the chunk (counting
beyond W-1 to balance the tree). The bin number of a higher-level node P in the
tree is calculated as follows:
</t>
<t>
<list style="hanging" hangIndent="4">
<t>
binP = (binL + binR) / 2
</t>
</list>
</t>
<t>
where binL is the bin of node P's left-hand child and binR is the bin of node
P's right-hand child. Given that each node in the tree represents a
subinterval of the original interval, each such subinterval now is
addressable by a bin number, a single integer. The bin number tree of an
interval of width W=8 looks like this:
</t>
<figure align="center" anchor="fig_chunkaddr_bin">
<artwork align="center"><![CDATA[
7
/ \
/ \
/ \
/ \
3 11
/ \ / \
/ \ / \
/ \ / \
1 5 9 13
/ \ / \ / \ / \
0 2 4 6 8 10 12 14
C0 C1 C2 C3 C4 C5 C6 C7
]]></artwork>
<postamble>The bin number tree of an interval of width W=8</postamble>
</figure>
<t>
So bin 7 represents the complete interval, bin 3 represents the interval of
chunks 0..3, bin 1 represents the interval of chunks 0 and 1, and
bin 2 represents chunk C1. The special numbers 0xFFFFFFFF (32-bit) or
0xFFFFFFFFFFFFFFFF (64-bit) stand for an empty interval, and 0x7FFF...FFF
stands for "everything".
</t>
<t>
When bin numbering is used, the ID of a chunk is its corresponding (leaf) bin
number in the tree and the chunk specification in HAVE and ACK messages is equal
to a single bin number (32-bit or 64-bit), as follows.
</t>
<figure><artwork>
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ Bin number (32 or 64) ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
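<t>
The bin arithmetic described in this section can be sketched in a few lines of
Python. The function names are hypothetical, and the sketch relies on an
observation that follows from the numbering rule: the height of a node above
the leaves equals the number of trailing 1 bits in its bin number. The special
values for "empty" and "everything" are not handled here.
</t>
<figure><artwork><![CDATA[
```python
def layer(bin_number: int) -> int:
    """Height above the leaves = number of trailing 1 bits."""
    height = 0
    while bin_number & 1:
        bin_number >>= 1
        height += 1
    return height


def chunk_to_bin(index: int) -> int:
    """Leaf bin number of the chunk with the given index (I*2)."""
    return 2 * index


def sibling(bin_number: int) -> int:
    """The other child of this node's parent."""
    return bin_number ^ (1 << (layer(bin_number) + 1))


def parent(bin_number: int) -> int:
    """binP = (binL + binR) / 2, whichever child we start from."""
    return (bin_number + sibling(bin_number)) // 2


def bin_to_chunks(bin_number: int):
    """Inclusive (first, last) chunk indexes covered by a bin."""
    width = 1 << layer(bin_number)
    first = (bin_number - (width - 1)) // 2
    return first, first + width - 1
```
]]></artwork></figure>
<t>
With these definitions, bin_to_chunks(7) gives (0, 7), bin_to_chunks(3) gives
(0, 3) and bin_to_chunks(2) gives (1, 1), matching the W=8 tree shown above.
</t>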
</section>
<!-- %% -->
<section title="In Messages" anchor="sec_chunkaddr_msgs">
<!-- %%% -->
<section title="In HAVE Messages" anchor="sec_chunkaddr_have">
<t>
When a receiving peer has successfully checked the integrity of a chunk or
interval of chunks, it MUST send a HAVE message to all peers it wants to
interact with. Limiting HAVE messages to those peers allows the HAVE message
to be used as a method of
choking. The HAVE message MUST contain the chunk specification of the biggest
complete interval of all chunks the receiver has received and checked so far
that fully includes the interval of chunks just received. So the chunk
specification MUST denote at least the interval received, but the receiver is
supposed to aggregate and acknowledge bigger intervals, when possible.
</t>
<t>
As a result, every single chunk is acknowledged a logarithmic number of times.
That provides some necessary redundancy of acknowledgments and sufficiently
compensates for unreliable transport protocols.
</t>
<t>
Implementation note:
<list style="hanging" hangIndent="4">
<t>
To record which chunks a peer has in the state that an implementation keeps
for each peer, an implementation MAY use the efficient "binmap" data structure,
which is a hybrid of a bitmap and a binary tree, discussed in detail in
<xref target="BINMAP"/>.
</t>
</list>
</t>
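<t>
As a hedged illustration of the aggregation rule for HAVE messages, the
following Python sketch finds the biggest complete interval that covers a
newly verified chunk, expressed as a node ID in the bin-numbered tree. It
uses a plain set of chunk indexes rather than the binmap structure, and all
names are hypothetical.
</t>
<figure><artwork><![CDATA[
```python
def layer(node: int) -> int:
    """Height above the leaves = number of trailing 1 bits."""
    height = 0
    while node & 1:
        node >>= 1
        height += 1
    return height


def chunks_of(node: int):
    """Inclusive (first, last) chunk indexes covered by a node."""
    width = 1 << layer(node)
    first = (node - (width - 1)) // 2
    return first, first + width - 1


def biggest_complete_bin(have: set, chunk: int) -> int:
    """Largest node covering `chunk` whose chunks are all in `have`.

    Assumes `chunk` itself is already in the (finite) set `have`.
    """
    node = 2 * chunk
    while True:
        sib = node ^ (1 << (layer(node) + 1))
        first, last = chunks_of(sib)
        if any(c not in have for c in range(first, last + 1)):
            return node
        node = (node + sib) // 2  # sibling complete: climb to parent
```
]]></artwork></figure>
<t>
For a peer holding chunks 0 to 4, receiving chunk 3 last would be announced as
node 3 (chunks 0..3), whereas chunk 4 can only be announced as node 8.
</t>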
</section>
<!-- %%% -->
<section title="In ACK Messages" anchor="sec_chunkaddr_ack">
<t>
PPSPP peers MUST use ACK messages to acknowledge received chunks
if an unreliable transport protocol is used. When a receiving
peer has successfully checked the integrity of a chunk or interval of chunks
C, it MUST send an ACK message containing the chunk specification of its biggest,
complete interval covering C to the sending peer (see HAVE).
</t>
</section>
</section>
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Content Integrity Protection" anchor="sec_intprot">
<t>
PPSPP can use different methods for protecting the integrity of the content
while it is being distributed via the peer-to-peer network. More specifically,
PPSPP can use different methods for receiving peers to detect whether a
requested chunk has been maliciously modified by the sending peer. In benign
environments, content integrity protection can be disabled.
</t>
<t>
For static content, PPSPP currently defines one method for protecting
integrity, called the Merkle Hash Tree scheme. If PPSPP operates over the
Internet, this scheme MUST be used. If PPSPP operates in a benign environment
this scheme MAY be used. So the scheme is mandatory-to-implement, to satisfy
the requirement of strong security for an IETF protocol <xref target="RFC3365"/>.
An extended version of the scheme is used to efficiently protect dynamically
generated content (live streams), as explained below and in
<xref target="sec_live_auth"/>.
</t>
<t>
The Merkle Hash Tree scheme can work with different chunk addressing schemes.
All it requires is the ability to address a range of chunks. In the following
description abstract node IDs are used to identify nodes in the tree. On the
wire these are translated to the corresponding range of chunks in the chosen
chunk addressing scheme.
</t>
<!-- %% -->
<section title="Merkle Hash Tree Scheme" anchor="sec_intprot_merkle">
<t>
PPSPP uses a method of naming content based on self-certification. In particular,
content in PPSPP is identified by a single cryptographic hash that is the
root hash in a Merkle hash tree calculated recursively from the content
<xref target="ABMRKL"/>. This self-certifying hash tree allows every peer to
directly detect when a malicious peer tries to distribute fake content. It also
ensures that only a small amount of information is needed to start a download
(the root hash and some peer addresses). For live streaming a dynamic tree
and a public key are used, see below.
</t>
<t>
The Merkle hash tree of a content asset that is divided into N chunks
is constructed as follows. Note the construction does not assume chunks
of content to be fixed size. Given a cryptographic hash function, more
specifically a modification detection code (MDC)
<xref target="HAC01"/>, such as SHA1, the hashes of all the
chunks of the content are calculated. Next, a binary tree of sufficient height
is created. Sufficient height means that the lowest level in the tree has
enough nodes to hold all chunk hashes in the set, as with bin numbering.
The figure below shows the tree for a content asset consisting of 7 chunks.
As before with the content addressing scheme, each leaf of the tree corresponds
to a chunk and in this case is assigned the hash of that chunk, starting at
the left-most leaf. As the base of the tree may be wider than the number of
chunks, any remaining leaves in the tree are assigned an empty hash value of
all zeros. Finally, the hash values of the higher levels in the tree are
calculated, by concatenating the hash values of the two children (again left
to right) and computing the hash of that aggregate. If the two children are
empty hashes, the parent is an empty all zeros hash as well
(to save computation). This process ends in a hash value for the root node,
which is called the "root hash". Note the root hash only depends on the content
and any modification of the content will result in a different root hash.
</t>
<figure align="center" anchor="fig_intprot_merkle">
<artwork align="center"><![CDATA[
7 = root hash
/ \
/ \
/ \
/ \
3* 11
/ \ / \
/ \ / \
/ \ / \
1 5 9 13* = uncle hash
/ \ / \ / \ / \
0 2 4 6 8 10* 12 14
C0 C1 C2 C3 C4 C5 C6 E
=chunk index ^^ = empty hash
]]></artwork>
<postamble>The Merkle hash tree of an interval of width W=8</postamble>
</figure>
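<t>
The construction above can be sketched as follows. This is an illustrative
Python fragment, not a normative algorithm: it assumes SHA1 as the hash
function (so the empty hash is 20 zero bytes) and the function name is
hypothetical.
</t>
<figure><artwork><![CDATA[
```python
import hashlib

HASH_LEN = 20            # SHA1 digest size in bytes
EMPTY = bytes(HASH_LEN)  # the all-zeros "empty hash"


def merkle_root(chunks) -> bytes:
    """Root hash of the padded Merkle tree over a list of chunks."""
    width = 1
    while width < len(chunks):
        width *= 2  # base of the tree: next power of two
    level = [hashlib.sha1(c).digest() for c in chunks]
    level += [EMPTY] * (width - len(level))
    while len(level) > 1:
        above = []
        for i in range(0, len(level), 2):
            left, right = level[i], level[i + 1]
            if left == EMPTY and right == EMPTY:
                above.append(EMPTY)  # two empty children: empty parent
            else:
                above.append(hashlib.sha1(left + right).digest())
        level = above
    return level[0]
```
]]></artwork></figure>
<t>
Chunks may have different sizes, because only their hashes enter the tree.
Changing any chunk changes the value returned by merkle_root().
</t>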
</section>
<!-- %% -->
<section title="Content Integrity Verification" anchor="sec_intprot_merkle_verify">
<t>
Assuming a peer receives the root hash of the content it wants to download
from a trusted source, it can check the integrity of any chunk of that
content it receives as follows. It first calculates the hash of the chunk
it received, for example chunk C4 in the previous figure. Along with this
chunk it MUST receive the hashes required to check the integrity of that
chunk. In principle, these are the hash of the chunk's sibling (C5) and
that of its "uncles". A chunk's uncles are the sibling Y of its parent X,
and the uncle of that Y, recursively until the root is reached. For chunk C4
its uncles are nodes 13 and 3, marked with * in the figure. Using this information
the peer recalculates the root hash of the tree, and compares it to the
root hash it received from the trusted source. If they match the chunk of
content has been positively verified to be the requested part of the content.
Otherwise, the sending peer either sent the wrong content or the wrong
sibling or uncle hashes. For simplicity, the set of sibling and uncle
hashes is collectively referred to as the "uncle hashes".
</t>
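<t>
The verification walk from a chunk to the root can be sketched as follows,
using the node IDs of the figure. This Python fragment is illustrative only:
it assumes SHA1, represents the received uncle hashes as a dictionary from
node ID to hash, and all names are hypothetical.
</t>
<figure><artwork><![CDATA[
```python
import hashlib

EMPTY = bytes(20)  # all-zeros empty hash, assuming SHA1


def parent_hash(left: bytes, right: bytes) -> bytes:
    if left == EMPTY and right == EMPTY:
        return EMPTY
    return hashlib.sha1(left + right).digest()


def layer(node: int) -> int:
    height = 0
    while node & 1:
        node >>= 1
        height += 1
    return height


def sibling(node: int) -> int:
    return node ^ (1 << (layer(node) + 1))


def verify_chunk(chunk, leaf, uncles, root, root_hash) -> bool:
    """Recompute the root hash from a chunk and its uncle hashes.

    uncles maps node IDs (the sibling and the uncles) to hashes.
    """
    node, node_hash = leaf, hashlib.sha1(chunk).digest()
    while node != root:
        sib = sibling(node)
        if sib not in uncles:
            return False  # a required hash is missing
        if node < sib:    # children are hashed left to right
            node_hash = parent_hash(node_hash, uncles[sib])
        else:
            node_hash = parent_hash(uncles[sib], node_hash)
        node = (node + sib) // 2
    return node_hash == root_hash
```
]]></artwork></figure>
<t>
For chunk C4 (leaf 8) the dictionary would hold the hashes of nodes 10, 13
and 3, and the walk visits nodes 9 and 11 before comparing at root node 7.
</t>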
<t>
In the case of live streaming the tree of chunks grows dynamically and the root
hash is undefined or, more precisely, transient, as long as new data is
generated by the live source. <xref target="sec_live_merkle"/> defines
a method for content integrity verification for live streams that works
with such a dynamic tree. Although the tree is dynamic, content verification
works the same for both live and predefined content, resulting in a unified
method for both types of streaming.
</t>
</section>
<!-- %% -->
<section title="The Atomic Datagram Principle" anchor="sec_intprot_merkle_atomic">
<t>
As explained above, a datagram consists of a sequence of messages. Ideally,
every datagram sent must be independent of other datagrams, so each
datagram SHOULD be processed separately and a loss of one datagram must not
disrupt the flow of datagrams between two peers. Thus, as a datagram carries
zero or more messages, neither messages nor message interdependencies SHOULD
span over multiple datagrams.
</t>
<t>
This principle implies that, as any chunk is verified using its uncle
hashes, the necessary hashes SHOULD be put into the same datagram as the
chunk's data. If this is not possible because of a limitation on datagram
size, the necessary hashes MUST be sent first in one or more datagrams.
As a general rule, if some additional data is still missing to process a
message within a datagram, the message SHOULD be dropped.
</t>
<t>
The hashes necessary to verify a chunk are in principle its sibling's hash
and all its uncle hashes, but the set of hashes to send can be optimized.
Before sending a packet of data to the receiver, the sender inspects the
receiver's previous acknowledgments (HAVE or ACK) to derive which hashes the
receiver already has for sure. Suppose the receiver has acknowledged chunks C0
and C1 (the first two chunks of the file); then it must already have uncle hashes 5,
11 and so on. That is because those hashes are necessary to check C0 and C1
against the root hash. Hashes 3, 7 and so on must then also be known, as they
are calculated in the process of checking the uncle hash chain. Hence, to send
chunk C6 (node 12), the sender needs to include just the hashes for nodes 14 and 9,
which let the data be checked against hash 11, which is already known to the receiver.
</t>
<t>
The sender MAY optimistically skip hashes which were sent out in previous,
still unacknowledged datagrams. This is a trade-off between
redundant hash transmission and the possibility of collateral data loss: if
some necessary hashes are lost in the network, some delivered data
cannot be verified and thus has to be dropped. In either case, the receiver
builds the Merkle tree on-demand, incrementally, starting from the root
hash, and uses it for data validation.
</t>
<t>
In short, the sender MUST put into the datagram the missing hashes necessary
for the receiver to verify the chunk. The receiver MUST remember all the
hashes it needs to verify missing chunks that it still wants to download.
Note that the latter implies that a hardware-limited receiver MAY forget
some hashes if it does not plan to announce possession of these chunks to others
(i.e., does not plan to send HAVE messages).
</t>
</section>
<!-- %% -->
<section title="INTEGRITY Messages" anchor="sec_intprot_merkle_msg">
<t>
Concretely, a peer that wants to send a chunk of content creates a datagram
that MUST consist of a list of INTEGRITY messages followed by a DATA message.
If the INTEGRITY messages and DATA message cannot be put into a single datagram
because of a limitation on datagram size, the INTEGRITY messages MUST be sent
first in one or more datagrams. The list of INTEGRITY messages sent MUST contain
an INTEGRITY message for each hash the receiver misses for integrity checking.
An INTEGRITY message for a hash MUST contain the chunk specification
corresponding to the node ID of the hash and the hash data itself. The chunk
specification corresponding to a node ID is defined as the range of
chunks formed by the leaves of the subtree rooted at the node. For example,
node 3 in <xref target="fig_intprot_merkle"/> denotes the chunks at leaves 0, 2, 4 and 6 (C0 to C3),
so the chunk specification should denote that interval. The list of INTEGRITY
messages MUST be sorted in order of the tree height of the nodes, descending.
The DATA message MUST contain the chunk specification of the chunk and chunk
itself. A peer MAY send the required messages for multiple chunks in the same
datagram, depending on the encapsulation.
</t>
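<t>
The ordering rule can be illustrated with a small Python sketch that arranges
the messages for one chunk. The tuples stand in for real message encodings,
chunk specifications are shown as inclusive chunk-index ranges, and all names
are hypothetical.
</t>
<figure><artwork><![CDATA[
```python
def layer(node: int) -> int:
    """Tree height of a node = number of trailing 1 bits in its ID."""
    height = 0
    while node & 1:
        node >>= 1
        height += 1
    return height


def node_to_chunks(node: int):
    """Inclusive chunk range formed by the leaves below a node."""
    width = 1 << layer(node)
    first = (node - (width - 1)) // 2
    return first, first + width - 1


def build_messages(missing: dict, chunk_index: int, chunk: bytes):
    """INTEGRITY messages (highest node first), then the DATA message.

    missing maps node IDs to the hashes the receiver still lacks.
    """
    order = sorted(missing, key=layer, reverse=True)
    msgs = [("INTEGRITY", node_to_chunks(n), missing[n]) for n in order]
    msgs.append(("DATA", (chunk_index, chunk_index), chunk))
    return msgs
```
]]></artwork></figure>
<t>
For the first chunk of the example tree, where the hashes of nodes 2, 5 and 11
are missing, the INTEGRITY messages are emitted for nodes 11, 5 and 2, in that
order, followed by the DATA message.
</t>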
</section>
<!-- %% -->
<section title="Discussion and Overhead" anchor="sec_intprot_merkle_overhead">
<t>
The current method for protecting content integrity in BitTorrent
<xref target="BITTORRENT"/> is not suited for streaming. It involves providing
clients with the hashes of the content's chunks before the download commences
by means of metadata files (called .torrent files in BitTorrent). However,
when chunks are small, as in the current UDP encapsulation of PPSPP, this implies
having to download a large number of hashes before content download can begin.
This, in turn, increases time-till-playback for end users, making
this method unsuited for streaming.
</t>
<t>
The overhead of using Merkle hash trees is limited. The size of the hash tree
expressed as the total number of nodes depends on the number of chunks the
content is divided into (and hence the size of chunks) following this formula:
<list style="hanging" hangIndent="4">
<t>
nnodes = math.pow(2,math.ceil(math.log(nchunks,2))+1) - 1
</t>
</list>
In principle, the hash values of all these nodes will have to be sent to a peer
once for it to verify all chunks. Hence the maximum on-the-wire overhead is
hashsize * nnodes. However, the actual number of hashes transmitted can be
optimized as described in <xref target="sec_intprot_merkle_atomic"/>.
To see that a peer can verify all chunks without receiving all hashes, consider
the example tree in <xref target="sec_intprot_merkle"/>.
</t>
<t>
In the case of a simple progressive download of chunks 0, 2, 4, 6, etc., the sending
peer will send the following hashes:
</t>
<texttable anchor="tab_merkle_overhead" title="Overhead for the example tree">
<preamble></preamble>
<ttcol align="center">Chunk</ttcol>
<ttcol align="left">Node IDs of hashes sent</ttcol>
<c>0</c> <c>2,5,11</c>
<c>2</c> <c>- (receiver already knows all)</c>
<c>4</c> <c>6</c>
<c>6</c> <c>-</c>
<c>8</c> <c>10,13 (hash 3 can be calculated from 0,2,5)</c>
<c>10</c> <c>-</c>
<c>12</c> <c>14</c>
<c>14</c> <c>-</c>
<c>Total</c> <c># hashes 7</c>
<postamble></postamble>
</texttable>
<t>
So the number of hashes sent in total (7) is less than the total number
of hashes in the tree (15), as a peer does not need to send hashes that are
calculated and verified as part of earlier chunks.
</t>
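<t>
The table above can be reproduced with a short simulation. The following
Python sketch is illustrative only: it tracks which node hashes the receiver
already knows and, for each leaf in turn, lists the sibling and uncle hashes
that still have to be sent. All names are hypothetical.
</t>
<figure><artwork><![CDATA[
```python
def layer(node: int) -> int:
    height = 0
    while node & 1:
        node >>= 1
        height += 1
    return height


def sibling(node: int) -> int:
    return node ^ (1 << (layer(node) + 1))


def hashes_to_send(leaf: int, known: set) -> list:
    """Node IDs whose hashes must accompany the chunk at `leaf`.

    `known` must contain the root; it is updated with every hash the
    receiver learns (received or calculated) while verifying.
    """
    sent, learned, node = [], [], leaf
    while node not in known:
        sib = sibling(node)
        if sib not in known:
            sent.append(sib)
        learned += [node, sib]
        node = (node + sib) // 2
    known.update(learned)
    return sent


def progressive_download(root: int = 7) -> dict:
    """Hashes sent per leaf when fetching leaves 0, 2, 4, ... in order."""
    known = {root}
    return {leaf: hashes_to_send(leaf, known)
            for leaf in range(0, 2 * root + 1, 2)}
```
]]></artwork></figure>
<t>
Calling progressive_download() for the example tree yields the node lists
2,5,11 for leaf 0, 6 for leaf 4, 10,13 for leaf 8 and 14 for leaf 12, and
empty lists for the other leaves, 7 hashes in total.
</t>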
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Automatic Detection of Content Size" anchor="sec_intprot_autosize">
<t>
In PPSPP, the root hash of a static content asset, such as a video file,
along with some peer addresses is sufficient to start a download.
In addition, PPSPP can reliably and automatically derive the size
of such content from information received from the network when fixed
sized chunks are used. As a result, it is not necessary to include
the size of the content asset as the metadata of the content,
in addition to the root hash. Implementations of PPSPP MAY use
this automatic detection feature. Note that this is the only feature of
PPSPP that requires fixed-size chunks.
</t>
<!-- %% -->
<section title="Peak Hashes" anchor="sec_intprot_autosize_peak">
<t>
The ability for a newcomer peer to detect the size of the content
depends heavily on the concept of peak hashes. Peak hashes,
in general, enable two cornerstone features of PPSPP: reliable file
size detection and download/live streaming unification (see
<xref target="sec_live"/>).
The concept of peak hashes depends on the concepts of filled and
incomplete nodes. Recall that when constructing the binary trees
for content verification and addressing the base of the tree may
have more leaves than the number of chunks in the content. In the
Merkle hash tree these leaves were assigned empty all-zero
hashes to be able to calculate the higher level hashes. A filled
node is now defined as a node that corresponds to an interval
of leaves that consists only of hashes of content chunks, not
empty hashes. Conversely, an incomplete (not filled) node
corresponds to an interval that also contains empty hashes,
typically an interval that extends past the end of the file.
In the following figure, nodes 7, 11, 13 and 14 are incomplete;
the rest are filled.
</t>
<t>
Formally, a peak hash is the hash of a filled node in the Merkle tree,
whose sibling is an incomplete node. Practically, suppose a file is 7162 bytes
long and a chunk is 1 kilobyte. That file fits into 7 chunks, the tail chunk
being 1018 bytes long. The Merkle tree for that file is shown in
<xref target="fig_autosize_peak"/>. Following the definition the peak hashes of
this file are in nodes 3, 9 and 12, denoted with a *. E denotes an empty hash.
</t>
<figure align="center" anchor="fig_autosize_peak">
<artwork align="center"><![CDATA[
7
/ \
/ \
/ \
/ \
3* 11
/ \ / \
/ \ / \
/ \ / \
1 5 9* 13
/ \ / \ / \ / \
0 2 4 6 8 10 12* 14
C0 C1 C2 C3 C4 C5 C6 E
= 1018 bytes
]]></artwork>
<postamble>Peak hashes in a Merkle hash tree.</postamble>
</figure>
<t>
Peak hashes can be explained by the binary representation of the
number of chunks the file occupies. The binary representation for
7 is 111. Every "1" in the binary representation of the file's length
in chunks corresponds to a peak hash. For this particular file there
are indeed three peaks, nodes 3, 9, 12. The number of peak
hashes for a file is therefore also at most logarithmic with its
size.
</t>
<t>
A peer knowing which nodes contain the peak hashes for the file
can therefore calculate the number of chunks it consists of, and
thus get an estimate of the file size (given all chunks but the last
are fixed size). Which nodes are the peaks can be securely communicated
from one (untrusted) peer A to another B by letting A send the
peak hashes and their node IDs to B. It can be shown that
the root hash that B obtained from a trusted source is sufficient
to verify that these are indeed the right peak hashes, as follows.
</t>
<t>
Lemma: Peak hashes can be checked against the root hash.
</t>
<t>
Proof: (a) Any peak hash is always the left sibling. Otherwise, if
it were the right sibling, its left neighbor/sibling would also have to be
a filled node, because of the way chunks are laid
out in the leaves, which contradicts the definition of a peak hash.
(b) For the rightmost
peak hash, its right sibling is zero. (c) For any peak hash,
its right sibling can be calculated using peak hashes to the
left and zeros for empty nodes. (d) Once the right sibling of
the leftmost peak hash is calculated, its parent can be
calculated. (e) Once that parent is calculated, we can
trivially get to the root hash by concatenating the hash with
zeros and hashing it repeatedly.
</t>
<t>
Informally, the Lemma can be expressed as follows: peak hashes cover all
data, so the remaining hashes are either trivial (zeros) or can be
calculated from peak hashes and zero hashes.
</t>
<t>
Finally, once peer B has obtained the number of chunks in the content it
can determine the exact file size as follows. Given that all chunks
except the last are fixed size, B just needs to know the size of the last
chunk. Knowing the number of chunks, B can calculate the node ID of the
last chunk and download it. As always, B verifies the integrity of this
chunk against the trusted root hash. As there is only one chunk of data
that leads to a successful verification, the size of this chunk must
be correct. B can then determine the exact file size as
</t>
<t>
<list style="hanging" hangIndent="4">
<t>
(number of chunks - 1) * fixed chunk size + size of last chunk
</t>
</list>
</t>
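<t>
The relation between the chunk count, the peak nodes and the file size can be
sketched as follows. The Python fragment below is illustrative and its
function names are hypothetical; it derives the peak node IDs from the binary
representation of the chunk count and applies the file size formula above.
</t>
<figure><artwork><![CDATA[
```python
def layer(node: int) -> int:
    """Tree height of a node = number of trailing 1 bits in its ID."""
    height = 0
    while node & 1:
        node >>= 1
        height += 1
    return height


def peak_nodes(nchunks: int) -> list:
    """Node IDs of the peaks, left to right, one per 1 bit."""
    peaks, offset = [], 0
    for k in reversed(range(nchunks.bit_length())):
        if nchunks & (1 << k):
            # filled subtree of 2**k chunks starting at chunk `offset`
            peaks.append(2 * offset + (1 << k) - 1)
            offset += 1 << k
    return peaks


def chunks_from_peaks(peaks) -> int:
    """Each peak at height k covers 2**k chunks."""
    return sum(1 << layer(node) for node in peaks)


def file_size(nchunks: int, chunk_size: int, last_size: int) -> int:
    return (nchunks - 1) * chunk_size + last_size
```
]]></artwork></figure>
<t>
For the example file, peak_nodes(7) gives nodes 3, 9 and 12, these peaks sum
back to 7 chunks, and file_size(7, 1024, 1018) gives 7162 bytes.
</t>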
</section>
<!-- %% -->
<section title="Procedure" anchor="sec_intprot_autosize_proc">
<t>
A PPSPP implementation that wants to use automatic size detection MUST
operate as follows. When a peer A sends a DATA message for the first time
to a peer B, A MUST first send all the peak hashes for the content, unless B has
already signalled earlier in the exchange that it knows the peak hashes by
having acknowledged any chunk. If they are needed, the peak hashes MUST be sent
as an extra list of uncle hashes for the chunk, before the list of actual
uncle hashes of the chunk as described in <xref target="sec_intprot_merkle_atomic"/>.
The receiver B MUST check the peak hashes against the root hash to determine
the approximate content size. To obtain the definite content size peer B MUST
download the last chunk of the content from any peer that offers it.
</t>
<t>
As an example, consider a 7162-byte file, which fits in 7 chunks of
1 kilobyte, distributed by a peer A. <xref target="fig_autosize_peak"/> shows
the relevant Merkle hash tree. A peer B which only knows the root hash of the file,
after successfully connecting to A, requests the first chunk of data, C0 in
<xref target="fig_autosize_peak"/>. Peer A replies to B by including in the datagram
the following messages in this specific order. First the three peak hashes of
this particular file, the hashes of nodes 3, 9 and 12. Second, the uncle hashes
of C0, followed by the DATA message containing the actual content of C0.
Upon receiving the peak hashes, peer B checks them against the root hash, determining
that the file is 7 chunks long. To establish the exact size of the file, peer B
needs to request and retrieve the last chunk containing data, C6 in <xref target="fig_autosize_peak"/>.
Once the last chunk has been retrieved and verified, peer B concludes that it is
1018 bytes long, hence determining that the file is exactly 7162 bytes long.
</t>
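<t>
The check of received peak hashes against the trusted root hash, following the
Lemma, might be sketched as below. This Python fragment is an illustration
under stated assumptions: SHA1 is the hash function, peak hashes arrive as a
dictionary from node ID to hash, and the function names are hypothetical.
</t>
<figure><artwork><![CDATA[
```python
import hashlib

EMPTY = bytes(20)  # all-zeros empty hash, assuming SHA1


def parent_hash(left: bytes, right: bytes) -> bytes:
    if left == EMPTY and right == EMPTY:
        return EMPTY
    return hashlib.sha1(left + right).digest()


def layer(node: int) -> int:
    height = 0
    while node & 1:
        node >>= 1
        height += 1
    return height


def check_peaks(peaks: dict, root_hash: bytes):
    """Return the chunk count if the peaks match the root, else None."""
    nchunks = sum(1 << layer(node) for node in peaks)
    width = 1
    while width < nchunks:
        width *= 2  # base width of the padded tree

    def hash_of(node: int) -> bytes:
        if node in peaks:
            return peaks[node]
        height = layer(node)
        first_chunk = (node - ((1 << height) - 1)) // 2
        if first_chunk >= nchunks:
            return EMPTY  # subtree lies past the end of the content
        if height == 0:
            raise ValueError("peaks do not cover the content")
        half = 1 << (height - 1)
        return parent_hash(hash_of(node - half), hash_of(node + half))

    try:
        ok = hash_of(width - 1) == root_hash
    except ValueError:
        return None
    return nchunks if ok else None
```
]]></artwork></figure>
<t>
In the example, peer B would call check_peaks() with the hashes of nodes 3, 9
and 12 and obtain 7 as the number of chunks, after which it requests C6 to
learn the exact size.
</t>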
</section>
</section>
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Live Streaming" anchor="sec_live">
<t>
The set of messages defined above can be used for live streaming as well.
In a pull-based model, a live streaming injector can announce the chunks it
generates via HAVE messages, and peers can retrieve them via REQUEST messages.
Areas that need special attention are content authentication and chunk
addressing (to achieve an infinite stream of chunks).
</t>
<!-- %% -->
<section title="Content Authentication" anchor="sec_live_auth">
<t>
For live streaming, PPSPP supports two methods for a peer to authenticate
the content it receives from another peer, called "Sign All" and
"Unified Merkle Tree".
</t>
<t>
In the "Sign All" method, the live injector signs each chunk of content
using a private key and peers, upon receiving the chunk, check the signature
using the corresponding public key obtained from a trusted source. Support
for this method is OPTIONAL.
</t>
<t>
In the "Unified Merkle Tree" method, PPSPP combines the Merkle Hash Tree scheme
for static content with signatures to unify the video-on-demand and live
streaming scenarios. The use of Merkle hash trees reduces the number of
signing and verification operations, hence providing a similar signature
amortization to the approach described in <xref target="SIGMCAST"/>.
If PPSPP operates over the Internet, the "Unified Merkle Tree" method MUST be
used. If the protocol operates in a benign environment the method MAY be used.
So this method is mandatory-to-implement.
</t>
<t>
In both methods the swarm ID consists of a public key encoded as in a DNSSEC
DNSKEY resource record without BASE-64 encoding <xref target="RFC4034"/>.
In particular, the swarm ID consists of a 1 byte Algorithm field that
identifies the public key's cryptographic algorithm and determines the format
of the Public Key field that follows. The value of this Algorithm field is one
of the Domain Name System Security (DNSSEC) Algorithm Numbers
<xref target="IANADNSSECALGNUM"/>. The RSASHA1 <xref target="RFC4034"/>,
ECDSAP256SHA256 and ECDSAP384SHA384 <xref target="RFC6605"/> algorithms are
MANDATORY to implement.
</t>
<figure><artwork>
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Algo Number(8)| ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ DNSSEC Public Key (variable) ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
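<t>
Splitting such a swarm ID into its two fields is straightforward, as the
following Python sketch shows. The function name is hypothetical; the numbers
in the table are the DNSSEC Algorithm Numbers registered for the three
mandatory algorithms.
</t>
<figure><artwork><![CDATA[
```python
# DNSSEC algorithm numbers of the mandatory-to-implement algorithms.
MANDATORY_ALGORITHMS = {
    5: "RSASHA1",
    13: "ECDSAP256SHA256",
    14: "ECDSAP384SHA384",
}


def parse_live_swarm_id(swarm_id: bytes):
    """Split a live swarm ID into (algorithm number, public key)."""
    if len(swarm_id) < 2:
        raise ValueError("swarm ID needs an algorithm byte and a key")
    return swarm_id[0], swarm_id[1:]
```
]]></artwork></figure>
<t>
The format of the returned public key bytes depends on the algorithm number,
so a caller would look up the number before interpreting the key.
</t>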
<!-- %%% -->
<section title="Sign All" anchor="sec_live_signall">
<t>
In the "Sign All" method, the live injector signs each chunk of content
using a private key and peers, upon receiving the chunk, check the signature
using the corresponding public key obtained from a trusted source. In particular,
in PPSPP, the swarm ID of the live stream is that public key.
</t>
<t>
A peer that wants to send a chunk of content creates a datagram that MUST
contain a SIGNED_INTEGRITY message with the chunk's signature, followed by a
DATA message with the actual chunk. If the SIGNED_INTEGRITY message and DATA
message cannot be contained in a single datagram, because of a limitation on
datagram size, the SIGNED_INTEGRITY message MUST be sent first in a separate
datagram. The SIGNED_INTEGRITY message consists of the chunk specification, the
timestamp, and the digital signature.
</t>
<t>
The digital signature algorithm used is determined by the Live Signature
Algorithm protocol option, see <xref target="sec_protopt_livesigalg"/>.
The signature is computed over a concatenation of the on-the-wire
representation of the chunk specification, a 64-bit NTP timestamp
<xref target="RFC5905"/>, and the chunk, in that order.
The timestamp is the time at which the signature was made at the injector, in UTC.
</t>
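<t>
As an illustration of the concatenation order, the following Python sketch
builds the octet string over which a "Sign All" signature would be computed.
It assumes 32-bit chunk ranges as the chunk specification; the function names
are hypothetical, and the signing step itself is omitted because it depends
on the negotiated Live Signature Algorithm.
</t>
<figure><artwork><![CDATA[
```python
import struct

NTP_UNIX_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def ntp_timestamp(unix_seconds: float) -> bytes:
    """Encode a UTC time as a 64-bit NTP timestamp: 32-bit seconds since
    1900 followed by a 32-bit binary fraction, big-endian."""
    whole = int(unix_seconds)
    fraction = int((unix_seconds - whole) * (1 << 32))
    return struct.pack(">II", (whole + NTP_UNIX_OFFSET) & 0xFFFFFFFF,
                       fraction)


def sign_all_input(chunk_spec: bytes, timestamp: bytes,
                   chunk: bytes) -> bytes:
    """Concatenate chunk specification, timestamp and chunk, in order."""
    if len(timestamp) != 8:
        raise ValueError("NTP timestamp must be 64 bits")
    return chunk_spec + timestamp + chunk


# Hypothetical example: chunk 4 addressed as the 32-bit chunk range (4, 4).
chunk_spec = struct.pack(">II", 4, 4)
to_sign = sign_all_input(chunk_spec, ntp_timestamp(0.5), b"\x00" * 1024)
```
]]></artwork></figure>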
</section>
<!-- %%% -->
<section title="Unified Merkle Tree" anchor="sec_live_merkle">
<t>
In this method, the chunks of content are used as the basis for a Merkle hash
tree as for static content. However, because chunks are continuously generated,
this tree is not static, but dynamic. As a result, the tree does not have
a root hash, or more precisely has a transient root hash. A public key
therefore serves as swarm ID of the content. It is used to digitally sign
updates to the tree, allowing peers to expand it based on trusted information
using the following process.
</t>
<!-- %%%% -->
<section title="Signed Munro Hashes" anchor="sec_live_merkle_sigmunro">
<t>
The live injector generates a number of chunks, denoted NCHUNKS_PER_SIG,
which is a fixed power of 2 (NCHUNKS_PER_SIG>=2), and adds them as
new leaves to the existing hash tree. As a result of this expansion the hash
tree contains a new subtree that is NCHUNKS_PER_SIG chunks wide at
the base. The root of this new subtree is referred to as the munro of that
subtree, and its hash as the munro hash of the subtree, illustrated in
<xref target="fig_live_peakr2"/>. In this figure, node 5 is the new munro,
labeled with a $ sign.
</t>
<figure align="center" anchor="fig_live_peakr2">
<artwork align="center"><![CDATA[
         3
        / \
       /   \
      /     \
     1       5$
    / \     / \
   0   2   4   6
]]></artwork>
<postamble>Expanded live tree. With NCHUNKS_PER_SIG=2, node 5 is the munro for
the new subtree spanning 4 and 6. Node 1 is the munro for the subtree spanning
chunks 0 and 2, created in the previous iteration.
</postamble>
</figure>
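<t>
The position of a munro can be computed from the node numbering used in the
figure, where the k-th chunk (counting from 0) is the leaf numbered 2*k. The
following Python sketch is an illustration under that assumption; the function
name munro_bin is hypothetical.
</t>
<figure><artwork><![CDATA[
```python
def munro_bin(chunk_index: int, nchunks_per_sig: int) -> int:
    """Return the node number of the munro covering the given chunk.

    chunk_index counts chunks from 0, so chunk k is the leaf numbered
    2*k in the figure.
    """
    n = nchunks_per_sig
    if n < 2 or n & (n - 1):
        raise ValueError("NCHUNKS_PER_SIG must be a power of 2, >= 2")
    subtree = chunk_index // n   # index of the signed subtree
    # A subtree of n leaves spans 2*n node numbers; its root sits in
    # the middle of that span.
    return subtree * 2 * n + n - 1


# With NCHUNKS_PER_SIG=2, leaves 4 and 6 are chunks 2 and 3.
munro_for_leaf_4 = munro_bin(2, 2)
munro_for_leaf_6 = munro_bin(3, 2)
```
]]></artwork></figure>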
<t>
Informally, the process proceeds as follows. The injector signs only the
munro hash of the new subtree using its private key. Next, the injector announces
the existence of the new subtree to its peers using HAVE messages. When a peer,
in response to the HAVE messages, requests a chunk from the new subtree,
the injector first sends the signed munro hash corresponding to the
requested chunk. Afterwards, similar to static content, the injector sends the
uncle hashes necessary to verify that chunk, as in <xref target="sec_intprot_merkle"/>.
In particular, the injector sends the uncle hashes necessary to verify the requested
chunk against the munro hash. This differs from static content, where the
verification takes place against the root hash. Finally, the injector sends the
actual chunk.
</t>
<t>
The receiving peer verifies the signature on the signed munro using the
swarm ID (a public key), and updates its hash tree. As the peer now knows
the munro hash is trusted, it can verify all chunks in the subtree against
this munro hash, using the accompanying uncle hashes as in
<xref target="sec_intprot_merkle"/>.
</t>
<t>
To illustrate this procedure, let's consider the next iteration in the process.
The injector has generated the current tree shown in <xref target="fig_live_peakr2"/>
and it is connected to several peers that currently have the same tree and all
possess chunks 0, 2, 4 and 6. When the injector generates two new
chunks, NCHUNKS_PER_SIG=2, the hash tree expands as shown in <xref target="fig_live_peakr3"/>.
The two new chunks, 8 and 10, extend the tree on the right side, and to
accommodate them a new root is created, node 7. As this tree is wider at the
base than the actual number of chunks, there are currently two empty leaves.
The munro node for the new subtree is 9, labeled with a $ sign.
</t>
<figure align="center" anchor="fig_live_peakr3">
<artwork align="center"><![CDATA[
                 7
                / \
              /     \
            /         \
          /             \
         3               11
        / \             / \
       /   \           /   \
      /     \         /     \
     1       5       9$      13
    / \     / \     / \     / \
   0   2   4   6   8   10  E   E
]]></artwork>
<postamble>Expanded live tree. With NCHUNKS_PER_SIG=2, node 9 is the munro
of the newly added subtree spanning chunks 8 and 10.</postamble>
</figure>
<t>
The injector now needs to inform its peers of the updated tree, communicating
the addition of the new munro hash 9. Hence, it sends a HAVE message
with a chunk specification for nodes 8+10 to its peers. As a response, a peer P
requests the newly created chunk, e.g. chunk 8, from the injector by sending a
REQUEST message. In reply, the injector sends the signed munro hash of node 9
as an INTEGRITY message with the hash of node 9, and a SIGNED_INTEGRITY message
with the signature of the hash of node 9. These messages are followed by an
INTEGRITY message with the hash of node 10, and a DATA message with chunk 8.
</t>
<t>
Upon receipt, peer P verifies the signature of the munro and expands its view
of the tree. Next, the peer computes the hash of chunk 8 and combines it with
the received hash of node 10, computing the expected hash of node 9. It can then
verify the content of chunk 8 by comparing the computed hash of node 9 with the
munro hash of the same node it just received. If the two match, P has
successfully verified the integrity of chunk 8.
</t>
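<t>
The verification step performed by peer P can be sketched in Python as
follows. This is an illustration only: it assumes SHA1 as the Merkle hash
function, that a leaf hash is the hash of the chunk, and that a node hash is
the hash of the concatenation of its two child hashes. The function names and
the sample chunk contents are hypothetical.
</t>
<figure><artwork><![CDATA[
```python
import hashlib


def leaf_hash(chunk: bytes) -> bytes:
    return hashlib.sha1(chunk).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha1(left + right).digest()


def verify_chunk(chunk: bytes, index_in_subtree: int,
                 uncle_hashes: list, munro_hash: bytes) -> bool:
    """Recompute the munro hash from a chunk and its uncle hashes.

    uncle_hashes is ordered from the leaf level upwards;
    index_in_subtree is the chunk's position under the munro (0-based).
    """
    current = leaf_hash(chunk)
    index = index_in_subtree
    for uncle in uncle_hashes:
        if index % 2 == 0:      # current node is a left child
            current = node_hash(current, uncle)
        else:                   # current node is a right child
            current = node_hash(uncle, current)
        index //= 2
    return current == munro_hash


# Nodes 8 and 10 under munro 9, with placeholder chunk contents.
chunk_8, chunk_10 = b"contents of chunk 8", b"contents of chunk 10"
munro_9 = node_hash(leaf_hash(chunk_8), leaf_hash(chunk_10))
ok = verify_chunk(chunk_8, 0, [leaf_hash(chunk_10)], munro_9)
```
]]></artwork></figure>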
<t>
This procedure requires just one signing operation for every NCHUNKS_PER_SIG
chunks created, and one verification operation for every NCHUNKS_PER_SIG
chunks received, making it much cheaper than "Sign All". A receiving peer does
additionally need to check one or more hashes per chunk via the Merkle Tree
scheme, but this has lower hardware requirements than a signature verification
for every chunk. This approach is similar to signature amortization via Merkle
Tree Chaining <xref target="SIGMCAST"/>. The downside of this scheme is increased
latency. A peer cannot download the new chunks until the injector has computed the
signature and announced the subtree. A peer MUST check the signature before forwarding
the chunks to other peers <xref target="POLLIVE"/>.
</t>
<t>
The number of chunks per signature NCHUNKS_PER_SIG MUST be a fixed power of 2
for simplicity. NCHUNKS_PER_SIG MUST be larger than 1 for performance reasons.
There are two related factors to consider when choosing a value for NCHUNKS_PER_SIG.
First, the allowed CPU load on clients due to signature verifications, given
the expected bitrate of the stream. To achieve a low CPU load in a high bitrate
stream, NCHUNKS_PER_SIG should be high. Second, the effect on latency,
which increases when NCHUNKS_PER_SIG gets higher, as just discussed.
Note how the procedure does not preclude the use of variable-sized chunks.
</t>
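<t>
The two factors can be estimated with simple arithmetic, sketched below in
Python. The sketch assumes fixed-size chunks and a constant bitrate, and it
counts only the time the injector needs to accumulate NCHUNKS_PER_SIG chunks
as added latency. The function name and the example figures are illustrative,
not recommendations.
</t>
<figure><artwork><![CDATA[
```python
def signature_tradeoff(bitrate_bps: float, chunk_size_bytes: int,
                       nchunks_per_sig: int):
    """Return (signature verifications per second, seconds needed to
    accumulate one signed subtree) for a constant-bitrate stream."""
    chunks_per_second = bitrate_bps / (8 * chunk_size_bytes)
    verifications_per_second = chunks_per_second / nchunks_per_sig
    accumulation_seconds = nchunks_per_sig / chunks_per_second
    return verifications_per_second, accumulation_seconds


# Example: 1,048,576 bit/s stream, 1024-byte chunks, 32 chunks per
# signature, i.e. 128 chunks per second.
verifs, delay = signature_tradeoff(1_048_576, 1024, 32)
```
]]></artwork></figure>
<t>
In this example a peer performs 4 signature verifications per second, and the
injector needs 0.25 seconds to collect the chunks for one signature. Doubling
NCHUNKS_PER_SIG halves the former and doubles the latter.
</t>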
<t>
This method of integrity verification provides an additional benefit. If the system
includes some peers that saved the complete broadcast, as soon as the broadcast ends,
the content is available as a video-on-demand download using the now stabilized tree
and the final root hash as swarm identifier. Peers that saved all the chunks can now
announce the root hash to the tracking infrastructure and instantly seed the content.
</t>
</section>
<!-- %%%% -->
<section title="Munro Signature Calculation" anchor="sec_live_merkle_sigcalc">
<t>
The digital signature algorithm used is determined by the Live Signature
Algorithm protocol option, see <xref target="sec_protopt_livesigalg"/>.
The signature is computed over a concatenation of the on-the-wire
representation of the chunk specification of the munro, a 64-bit NTP timestamp
<xref target="RFC5905"/>, and the munro hash, in that order. The timestamp is
the time at which the signature was made at the injector, in UTC.
</t>
</section>
<!-- %%%% -->
<section title="Procedure" anchor="sec_live_merkle_proc">
<t>
Formally, the injector MUST NOT send a HAVE message for chunks in the new
subtree until it has computed the signed munro hash for that subtree.
</t>
<t>
When peer B requests a chunk C from peer A (either the injector or another peer),
and peer A decides to reply, it must do so as follows. First, peer A MUST send an
INTEGRITY message with the chunk specification for the munro of chunk C and the
munro's hash, followed by a SIGNED_INTEGRITY message with the chunk
specification for the munro, timestamp and its signature, in a single datagram,
unless B indicated earlier in the exchange that it already possesses a chunk with
the same corresponding munro (by means of HAVE or ACK messages). Following
these two messages (if any), peer A MUST send the missing uncle
hashes needed for verifying the chunk against its munro hash, and the chunk itself,
as described in <xref target="sec_intprot_merkle_msg"/>, sharing datagrams
if possible.
</t>
</section>
<!-- %%%% -->
<section title="Secure Tune In" anchor="sec_live_merkle_tunein">
<t>
When a peer tunes into a live stream it has to determine which chunk is the last one
the injector has generated. To facilitate this process in the Unified Merkle
Tree scheme, each peer shares its knowledge about the injector's chunks with
the others by exchanging their latest signed munro hashes, as follows.
</t>
<t>
Recall that in PPSPP, when peer A initiates a channel with peer B, peer A sends
a first datagram with a HANDSHAKE message, and B responds with a second datagram
also containing a HANDSHAKE message (see <xref target="sec_msgs_HANDSHAKE"/>).
When A sends a third datagram to B, and it is received by B, both peers know
that the other is listening on its stated transport address. B is then
allowed to send heavy payload like DATA messages in the fourth datagram. Peer
A can already safely do that in the third datagram.
</t>
<t>
In the Unified Merkle Tree scheme, peer A MUST send its right-most signed munro
hash to B in the third datagram, and in any subsequent datagrams to B,
until B indicates that it possesses a chunk with the same corresponding
munro or a more recent munro (by means of a HAVE or ACK message). B may already
have indicated this fact by means of HAVE messages in the second datagram.
Conversely, when B sends the fourth datagram or any subsequent datagram to A,
B MUST send its right-most signed munro hash, unless A indicated knowledge of
it or more recent munros. The right-most signed munro hash of a peer is defined
as the munro hash signed by the injector of the right-most subtree of width
NCHUNKS_PER_SIG chunks in the peer's Merkle hash tree. Peer A and B MUST NOT
send the signed munro hash in the first, respectively, second datagram as it
is considered heavy payload.
</t>
<t>
When a peer receives a SIGNED_INTEGRITY message with a signed munro hash
but the timestamp is too old, the peer MUST discard the message. Otherwise
it SHOULD use the signed munro to update its hash tree and pick a tune-in point
in the live stream. A peer may use the information from multiple peers to
pick the tune-in point.
</t>
</section>
</section>
</section>
<section title="Forgetting Chunks" anchor="sec_live_disc">
<t>
As a live broadcast progresses a peer may want to discard the chunks that
it already played out. Ideally, other peers should be aware of this fact such
that they will not try to request these chunks from this peer. This could
happen in scenarios where live streams may be paused by viewers, or viewers
are allowed to start late in a live broadcast (e.g., start watching a broadcast
at 20:35 whereas it began at 20:30).
</t>
<t>
PPSPP provides a simple solution for peers to stay up-to-date with the
chunk availability of a discarding peer. A discarding peer in a live stream
MUST enable the Live Discard Window protocol option, specifying how many
chunks/bytes it caches before the last chunk/byte it advertised as being
available (see <xref target="sec_protopt_livediscwin"/>). Its peers
SHOULD apply this number as a sliding window filter over the peer's chunk
availability as conveyed via its HAVE messages.
</t>
<t>
Three factors are important when deciding on an appropriate value for this
option: the desired amount of playback buffer for peers, the bitrate of the
stream and the available resources of the peer. Consider the case of a fresh
peer joining the stream. The size of the discard window of the peers it connects
to influences how much data it can directly download to establish its prebuffer.
If the window is smaller than the desired buffer, the fresh peer has to wait
until the peers have downloaded more of the stream before it can start playback. As
media buffers are generally specified in terms of a number of seconds, the size
of the discard window is also related to the (average) bitrate of the stream.
Finally, if a peer has few resources to store chunks and metadata it should
choose a small discard window.
</t>
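<t>
The sliding window filter and the sizing consideration can be sketched in
Python as follows. This is an illustration with hypothetical function names.
It assumes chunk-based addressing and interprets the window as the number of
chunks retained before the last advertised chunk, with the last chunk itself
always available.
</t>
<figure><artwork><![CDATA[
```python
import math

NO_DISCARD_32 = 0xFFFFFFFF  # special value: peer does not discard


def discard_window_chunks(buffer_seconds: float, bitrate_bps: float,
                          chunk_size_bytes: int) -> int:
    """Chunks needed to hold buffer_seconds of stream at the given
    average bitrate."""
    return math.ceil(buffer_seconds * bitrate_bps / (8 * chunk_size_bytes))


def filter_availability(advertised: set, discard_window: int) -> set:
    """Apply a peer's discard window to the chunks it advertised via
    HAVE messages, dropping chunks it has presumably discarded."""
    if not advertised:
        return set()
    last = max(advertised)
    return {c for c in advertised if c >= last - discard_window}


# A peer advertised chunks 0..9 with a discard window of 3 chunks.
still_available = filter_availability(set(range(10)), 3)
```
]]></artwork></figure>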
</section>
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Protocol Options" anchor="sec_protopt">
<t>
The HANDSHAKE message in PPSPP can contain the following protocol options.
Unless stated otherwise,
a protocol option consists of an 8-bit code followed by an 8-bit value.
Larger values are all encoded big-endian. Each protocol option is explained
in the following subsections.
</t>
<texttable anchor="tab_id_proto_opt" title="PPSP Peer Protocol Options">
<preamble></preamble>
<ttcol align="left">Code</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c> <c>Version</c>
<c>1</c> <c>Minimum Version</c>
<c>2</c> <c>Swarm Identifier</c>
<c>3</c> <c>Content Integrity Protection Method</c>
<c>4</c> <c>Merkle Hash Tree Function</c>
<c>5</c> <c>Live Signature Algorithm</c>
<c>6</c> <c>Chunk Addressing Method</c>
<c>7</c> <c>Live Discard Window</c>
<c>8</c> <c>Supported Messages</c>
<c>9-254</c> <c>Unassigned</c>
<c>255</c> <c>End Option</c>
<postamble></postamble>
</texttable>
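<t>
As a rough illustration of how such a list is serialized, the following Python
sketch encodes an option list for a static swarm. The option values are taken
from the tables in the subsections below; the function name and the 20-octet
placeholder swarm ID are hypothetical.
</t>
<figure><artwork><![CDATA[
```python
import struct

VERSION, MIN_VERSION, SWARM_ID = 0, 1, 2
CIPM, MHF, LSA, CAM = 3, 4, 5, 6
END = 255


def encode_static_options(swarm_id: bytes) -> bytes:
    """Encode protocol options for a static swarm that uses a Merkle
    hash tree with SHA1 and 32-bit chunk ranges."""
    out = bytes([VERSION, 1])        # Version MUST be the first option
    out += bytes([MIN_VERSION, 1])
    # Swarm Identifier: code, 16-bit big-endian length, identifier.
    out += struct.pack(">BH", SWARM_ID, len(swarm_id)) + swarm_id
    out += bytes([CIPM, 1])          # 1 = Merkle Hash Tree
    out += bytes([MHF, 0])           # 0 = SHA1
    out += bytes([CAM, 2])           # 2 = 32-bit chunk ranges
    out += bytes([END])              # End option has no value octet
    return out


# Placeholder swarm ID of 20 zero octets (the size of a SHA1 hash).
options = encode_static_options(bytes(20))
```
]]></artwork></figure>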
<!-- %% -->
<section title="End Option" anchor="sec_protopt_end">
<t>
A peer MUST conclude the list of protocol options with the end option.
Subsequent octets should be considered protocol messages.
The code for the end option is 255, and unlike others it has no value octet,
so the option's length is 1 octet.
</t>
<figure><artwork>
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|1 1 1 1 1 1 1 1|
+-+-+-+-+-+-+-+-+
</artwork></figure>
</section>
<!-- %% -->
<section title="Version" anchor="sec_protopt_version">
<t>
A peer MUST include the maximum version of the PPSPP protocol it supports as
the first protocol option in the list. The code for this option is 0.
Defined values are listed in <xref target="tab_id_ver_num"/>.
</t>
<texttable anchor="tab_id_ver_num" title="PPSP Peer Protocol Version Numbers">
<preamble></preamble>
<ttcol align="left">Version</ttcol>
<ttcol align="left">Description</ttcol>
<c>1</c> <c>Protocol as described in this document</c>
<c>2-255</c> <c>Unassigned</c>
<postamble></postamble>
</texttable>
<figure><artwork>
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0|  Version (8)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</section>
<!-- %% -->
<section title="Minimum Version" anchor="sec_protopt_minversion">
<t>
When a peer initiates the handshake it MUST include the minimum version of the
PPSPP protocol it supports in the list of protocol options, following the
Min/max versioning scheme defined in <xref target="RFC6709"/>, Section 4.1,
strategy 5.
The code for this option is 1. Defined values are listed in
<xref target="tab_id_ver_num"/>.
</t>
<figure><artwork>
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 1| Min. Ver. (8) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</section>
<!-- %% -->
<section title="Swarm Identifier" anchor="sec_protopt_swarmid">
<t>
When a peer initiates the handshake it MUST include a single swarm identifier
option. In other cases a peer MAY include a swarm identifier option, as an
end-to-end check. This option has the following structure:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 1 0|     Swarm ID Length (16)      |               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                  Swarm Identifier (variable)                  ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
The Swarm ID Length field contains the length of the single Swarm Identifier
that follows in bytes. The Length field is 16 bits wide to allow for large
public keys as identifiers in live streaming. Each PPSPP peer knows the IDs of
the swarms it joins so this information can be immediately verified upon receipt.
</t>
</section>
<!-- %% -->
<section title="Content Integrity Protection Method" anchor="sec_protopt_intprot">
<t>
A peer MUST include the content integrity method used by a swarm.
The code for this option is 3. Defined values are listed
in <xref target="tab_id_cont_int_prot"/>.
</t>
<texttable anchor="tab_id_cont_int_prot" title="PPSP Peer Content Integrity Protection Methods">
<preamble></preamble>
<ttcol align="left">Method</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c> <c>No integrity protection</c>
<c>1</c> <c>Merkle Hash Tree</c>
<c>2</c> <c>Sign All</c>
<c>3</c> <c>Unified Merkle Tree</c>
<c>4-255</c> <c>Unassigned</c>
<postamble></postamble>
</texttable>
<t>
The "Merkle Hash Tree" method is the default for static content, see
<xref target="sec_intprot_merkle"/>. "Sign All", and "Unified Merkle
Tree" are for live content, see <xref target="sec_live_auth"/>,
with "Unified Merkle Tree" being the default.
</t>
<figure><artwork>
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 1 1|    CIPM (8)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</section>
<!-- %% -->
<section title="Merkle Tree Hash Function" anchor="sec_protopt_mhashfunc">
<t>
When the content integrity protection method is "Merkle Hash Tree" this option
defining which hash function is used for the tree MUST be included.
The code for this option is 4. Defined values are listed in
<xref target="tab_id_merkle_func"/> (see <xref target="FIPS180-4"/> for the
function semantics).
</t>
<texttable anchor="tab_id_merkle_func" title="PPSP Peer Protocol Merkle Hash Functions">
<preamble></preamble>
<ttcol align="left">Function</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c> <c>SHA1</c>
<c>1</c> <c>SHA-224</c>
<c>2</c> <c>SHA-256</c>
<c>3</c> <c>SHA-384</c>
<c>4</c> <c>SHA-512</c>
<c>5-255</c> <c>Unassigned</c>
<postamble></postamble>
</texttable>
<t>
Implementations MUST support SHA1, see <xref target="sec_sec_hash"/>,
which is also the default.
</t>
<figure><artwork>
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 1 0 0|    MHF (8)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</section>
<!-- %% -->
<section title="Live Signature Algorithm" anchor="sec_protopt_livesigalg">
<t>
When the content integrity protection method is "Sign All" or
"Unified Merkle Tree" this option MUST be defined. The code for this
option is 5. The 8-bit value of this option is one of the Domain Name System
Security (DNSSEC) Algorithm Numbers <xref target="IANADNSSECALGNUM"/>.
The RSASHA1 <xref target="RFC4034"/>, ECDSAP256SHA256 and ECDSAP384SHA384
<xref target="RFC6605"/> algorithms are MANDATORY to implement.
Default is ECDSAP256SHA256.
</t>
<figure><artwork>
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 1 0 1|    LSA (8)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</section>
<!-- %% -->
<section title="Chunk Addressing Method" anchor="sec_protopt_chunkaddr">
<t>
A peer MUST include the chunk addressing method it uses. The code for this option
is 6. Defined values are listed in <xref target="tab_id_chunk_addr"/>.
</t>
<texttable anchor="tab_id_chunk_addr" title="PPSP Peer Chunk Addressing Methods">
<preamble></preamble>
<ttcol align="left">Method</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c> <c>32-bit bins</c>
<c>1</c> <c>64-bit byte ranges</c>
<c>2</c> <c>32-bit chunk ranges</c>
<c>3</c> <c>64-bit bins</c>
<c>4</c> <c>64-bit chunk ranges</c>
<c>5-255</c> <c>Unassigned</c>
<postamble></postamble>
</texttable>
<t>
Implementations MUST support "32-bit chunk ranges" and "64-bit chunk ranges".
Default is "32-bit chunk ranges".
</t>
<figure><artwork>
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 1 1 0|    CAM (8)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</section>
<!-- %% -->
<section title="Live Discard Window" anchor="sec_protopt_livediscwin">
<t>
A peer in a live swarm MUST include the discard window it uses. The code
for this option is 7. The unit
of the discard window depends on the chunk addressing method used. For
bins and chunk ranges it is a number of chunks, for byte ranges it is a number
of bytes. Its data type is the same as for a bin, or one value in a range
specification. In other words, its value is a 32-bit or 64-bit integer in
big endian format. If this option is used, the Chunk Addressing Method
MUST appear before it in the list. This option has the following
structure:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 1 1 1|        Live Discard Window (32 or 64)         ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                                                               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
A peer that does not, under normal circumstances, discard chunks MUST set
this option to the special value 0xFFFFFFFF (32-bit) or 0xFFFFFFFFFFFFFFFF
(64-bit). For example, peers that record a complete broadcast to offer it
directly as a static asset after the broadcast ends use these values (see
<xref target="sec_live_merkle"/>). <xref target="sec_live_disc"/> explains
how to determine a value for this option.
</t>
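<t>
A minimal Python sketch of the encoding follows. It assumes the field width
follows the chunk addressing method as listed in
<xref target="tab_id_chunk_addr"/>; the function name is hypothetical.
</t>
<figure><artwork><![CDATA[
```python
import struct

LIVE_DISCARD_WINDOW = 7
# Chunk addressing methods with 64-bit integers: 1 (64-bit byte
# ranges), 3 (64-bit bins), 4 (64-bit chunk ranges). Methods 0 and 2
# use 32-bit integers.
METHODS_64BIT = {1, 3, 4}
METHODS_32BIT = {0, 2}


def encode_live_discard_window(cam: int, window=None) -> bytes:
    """Encode the option; window=None means the peer does not discard
    chunks, which is signalled by the all-ones special value."""
    if cam in METHODS_64BIT:
        fmt, no_discard = ">BQ", 0xFFFFFFFFFFFFFFFF
    elif cam in METHODS_32BIT:
        fmt, no_discard = ">BI", 0xFFFFFFFF
    else:
        raise ValueError("unassigned chunk addressing method")
    if window is None:
        window = no_discard
    return struct.pack(fmt, LIVE_DISCARD_WINDOW, window)


# 32-bit chunk ranges (method 2) with a window of 100 chunks.
encoded = encode_live_discard_window(2, 100)
```
]]></artwork></figure>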
</section>
<!-- %% -->
<section title="Supported Messages" anchor="sec_protopt_supmsgs">
<t>
Peers may support just a subset of the PPSPP messages. For example, peers
running over TCP may not accept ACK messages, or peers used with a centralized
tracking infrastructure may not accept PEX messages. For these reasons, peers
who support only a proper subset of the PPSPP messages MUST signal which subset
they support by means of this protocol option. The code for this option is 8.
The value of this option is a length octet (SupMsgLen) indicating the length
in bytes of the compressed bitmap that follows.
</t>
<t>
The set of messages supported can be derived from the compressed
bitmap by padding it with bytes of value 0 until it is 256 bits in length.
Then a 1 bit in the resulting bitmap at position X (numbering left to right)
corresponds to support for message type X, see <xref target="tab_id_msg_type"/>.
In other words, to construct the compressed bitmap, create a bitmap with
a 1 for each message type supported and a 0 for a message type that is not,
store it as an array of bytes and truncate it to the last non-zero byte.
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 1 0 0 0| SupMsgLen (8) |                               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~         Supported Messages Bitmap (variable, max 256)         ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
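<t>
The construction of the compressed bitmap can be sketched in Python as
follows. The sketch assumes that position 0 is the most significant bit of
the first byte ("numbering left to right"); the function names are
hypothetical.
</t>
<figure><artwork><![CDATA[
```python
SUPPORTED_MESSAGES = 8


def compress_bitmap(msg_types) -> bytes:
    """Build the 256-bit bitmap and truncate it to the last non-zero
    byte."""
    bitmap = bytearray(32)
    for msg_type in msg_types:
        bitmap[msg_type // 8] |= 0x80 >> (msg_type % 8)
    return bytes(bitmap).rstrip(b"\x00")


def encode_supported_messages(msg_types) -> bytes:
    compressed = compress_bitmap(msg_types)
    return bytes([SUPPORTED_MESSAGES, len(compressed)]) + compressed


def decode_bitmap(compressed: bytes) -> set:
    """Pad with zero bytes to 256 bits and list the supported types."""
    padded = compressed.ljust(32, b"\x00")
    return {x for x in range(256) if padded[x // 8] & (0x80 >> (x % 8))}


# A peer that supports everything except ACK (2) and the PEX messages
# (5, 6, 12 and 13).
supported = {0, 1, 3, 4, 7, 8, 9, 10, 11}
option = encode_supported_messages(supported)
```
]]></artwork></figure>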
</section>
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="UDP Encapsulation" anchor="sec_encap_udp">
<t>
PPSPP implementations MUST use UDP as transport protocol and MUST use LEDBAT
for congestion control <xref target="RFC6817"/>. Using LEDBAT enables PPSPP
to serve the content after playback (seeding) without disrupting the user who
may have moved to different tasks that use its network connection.
Future PPSPP versions can also run over other transport protocols, or use
different congestion control algorithms.
</t>
<!-- %% -->
<section title="Chunk Size" anchor="sec_encap_udp_chunksize">
<t>
In general, a UDP datagram containing PPSPP messages SHOULD fit inside
a single IP packet, so its maximum size depends on the MTU of the network.
If the UDP datagram does not fit, its chance of getting lost in the network
increases as the loss of a single fragment of the datagram causes the loss
of the complete datagram.
</t>
<t>
The largest message in a PPSPP datagram is the DATA message carrying a chunk
of content. So the (maximum) size of a chunk to choose for a particular swarm
depends primarily on the expected MTU. The chunk size should be chosen such that
a chunk and its required INTEGRITY messages can generally be carried inside a single
datagram, following the Atomic Datagram Principle
(<xref target="sec_intprot_merkle_atomic"/>). Other considerations are the
hardware capabilities of the peers. Having large chunks and therefore fewer
chunks per mebibyte of content reduces processing costs. The chunk addressing
schemes can all work with different chunk sizes, see <xref target="sec_chunkaddr"/>.
</t>
<t>
The RECOMMENDED approach is to use fixed-sized chunks of 1024 bytes, as this size
has a high likelihood of travelling end-to-end across the Internet without
any fragmentation. In particular, with this size a UDP datagram with a DATA
message can be transmitted as a single IP packet over an Ethernet network
with 1500-byte frames.
</t>
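<t>
The size claim can be checked with a short calculation, sketched below in
Python. It assumes an IPv4 header without options, the 4-byte channel ID
prefix and the DATA message layout defined later in this section, and 32-bit
chunk ranges as the chunk specification. The constant names are illustrative.
</t>
<figure><artwork><![CDATA[
```python
IPV4_HEADER = 20      # IPv4 header without options
UDP_HEADER = 8
CHANNEL_ID = 4        # datagram prefix in the UDP encapsulation
DATA_TYPE = 1         # message type octet
CHUNK_RANGE_32 = 8    # start and end chunk, 32 bits each
TIMESTAMP = 8         # 64-bit timestamp carried in DATA


def data_packet_size(chunk_size: int) -> int:
    """IP packet size of a datagram holding a single DATA message."""
    return (IPV4_HEADER + UDP_HEADER + CHANNEL_ID
            + DATA_TYPE + CHUNK_RANGE_32 + TIMESTAMP + chunk_size)


size = data_packet_size(1024)
headroom = 1500 - size   # octets left for, e.g., INTEGRITY messages
```
]]></artwork></figure>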
<t>
A PPSPP implementation MAY use a variant of the Packetization Layer Path MTU
Discovery (PLPMTUD), described in <xref target="RFC4821"/>, for discovering
the optimal MTU between sender and destination. As in PLPMTUD, progressively
larger probing packets are used to detect the optimal MTU of a link. However,
in PPSPP, probe packets SHOULD contain actual messages, in particular, multiple DATA
messages. By using actual DATA messages as probe packets, the returning ACK
messages will confirm the probe delivery, effectively updating the MTU estimate
on both ends of the link. To be able to scale up probe packets with sensible
increments, a minimum chunk size of 512 bytes SHOULD be used. Smaller chunk
sizes lead to an inefficient protocol. An implication is that PPSPP supports
datagrams over IPv4 of 576 bytes or more only. This variant is not mandatory
to implement.
</t>
<t>
The chunk size used for a particular swarm, or the fact that it is variable,
MUST be part of the swarm's metadata (which then minimally consists of the swarm
ID and the chunk nature and size).
</t>
</section>
<!-- %% -->
<section title="Datagrams and Messages" anchor="sec_encap_udp_dgram">
<t>
When using UDP, the abstract datagram described above corresponds directly
to a UDP datagram. Most messages within a datagram have a fixed length, which
generally depends on the type of the message. The first byte of a message
denotes its type. The currently defined types are:
</t>
<texttable anchor="tab_id_msg_type" title="PPSP Peer Protocol Message Types">
<preamble></preamble>
<ttcol align="left">Msg Type</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c> <c>HANDSHAKE</c>
<c>1</c> <c>DATA</c>
<c>2</c> <c>ACK</c>
<c>3</c> <c>HAVE</c>
<c>4</c> <c>INTEGRITY</c>
<c>5</c> <c>PEX_RESv4</c>
<c>6</c> <c>PEX_REQ</c>
<c>7</c> <c>SIGNED_INTEGRITY</c>
<c>8</c> <c>REQUEST</c>
<c>9</c> <c>CANCEL</c>
<c>10</c> <c>CHOKE</c>
<c>11</c> <c>UNCHOKE</c>
<c>12</c> <c>PEX_RESv6</c>
<c>13</c> <c>PEX_REScert</c>
<c>14-254</c> <c>Unassigned</c>
<c>255</c> <c>Reserved</c>
<postamble></postamble>
</texttable>
<t>
Furthermore, integers are serialized in the network (big-endian) byte order.
So consider the example of a HAVE message (<xref target="sec_msgs_HAVE"/>)
using bin chunk addressing. It has message type of 0x03 and a payload of a bin
number, a four-byte integer (say, 1); hence, its on-the-wire representation for
UDP can be written in hex as: "0300000001".
</t>
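<t>
The example can be reproduced with Python's struct module. This is a minimal
sketch, and the function name is hypothetical.
</t>
<figure><artwork><![CDATA[
```python
import struct

HAVE = 0x03


def have_bin32(bin_number: int) -> bytes:
    """Serialize a HAVE message with a 32-bit bin in network byte
    order: one type octet followed by a four-byte integer."""
    return struct.pack(">BI", HAVE, bin_number)


wire = have_bin32(1).hex()
```
]]></artwork></figure>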
<t>
All messages are idempotent or recognizable as duplicates. Idempotent means that
processing a message more than once does not lead to a different state than
processing it just once. In particular, a
peer MAY resend DATA, ACK, HAVE, INTEGRITY, PEX_*, SIGNED_INTEGRITY, REQUEST,
CANCEL, CHOKE and UNCHOKE messages without problems when loss is suspected.
When a peer resends a HANDSHAKE message, the receiver can recognize it as a
duplicate, because it already recorded the first connection attempt,
and handle it accordingly.
</t>
</section>
<!-- %% -->
<section title="Channels" anchor="sec_encap_udp_channels">
<t>
As described in <xref target="sec_msgs_channels"/>, PPSPP uses
a multiplexing scheme, called channels, to allow multiple swarms to use the
same UDP port. In the UDP encapsulation, each datagram from peer A to peer B
is prefixed with the channel ID allocated by peer B. The peers learn about
each other's channel ID during the handshake, as explained below.
A channel ID consists of 4 bytes and MUST be generated following the
requirements in <xref target="RFC4960"/> (Sec. 5.1.3).
</t>
</section>
<!-- %% -->
<section title="HANDSHAKE" anchor="sec_encap_udp_HANDSHAKE">
<t>
A channel is established with a handshake. To start a handshake, the initiating
peer needs to know:
</t>
<t>
<list style="numbers">
<t>the IP address of a peer,</t>
<t>the peer's UDP port, and</t>
<t>the swarm's metadata record which consists of:
<list style="format (%c)">
<t>the swarm ID of the content (see <xref target="sec_intprot_merkle"/>
and <xref target="sec_live"/>),</t>
<t>the chunk size used,</t>
<t>the chunk addressing method used,</t>
<t>the content integrity protection method used, and</t>
<t>the Merkle hash tree function used (if applicable).</t>
<t>If automatic content size detection (see <xref target="sec_intprot_autosize"/>)
is not used, the content length is also part of the metadata record for
static content.
</t>
</list>
</t>
</list>
</t>
<t>
A datagram containing a HANDSHAKE message:
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Destination Channel ID (32)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0|            Source Channel ID (32)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                                               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
~                        Protocol Options                       ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
where:
</artwork></figure>
<list style="hanging">
<t>Destination Channel ID:
<list style="hanging">
<t>If the message is sent by the initiating peer, then it MUST be an all-zeros channel ID.</t>
<t>If the message is sent by the receiving peer, then it MUST consist of the Source Channel ID from the sender's HANDSHAKE message.</t>
</list>
</t>
<t>The octet 0x00: The HANDSHAKE message type (0x00)</t>
<t>The Source Channel ID: A locally unused channel ID</t>
<t>Protocol Options: A list of protocol options encoding the swarm's metadata,
as defined in <xref target="sec_protopt"/>.</t>
</list>
</t>
<t>
A peer SHOULD explicitly close a channel by sending a HANDSHAKE message that
MUST contain an all-zeros channel ID and a list of protocol options. The
list MUST be either empty or contain the maximum version number the sender
supports, following the Min/max versioning scheme defined in
<xref target="RFC6709"/>, Section 4.1.
</t>
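<t>
The layout of the first HANDSHAKE datagram can be sketched in Python as
follows. The function names are hypothetical, the random channel ID generation
is only one possible reading of the requirement in
<xref target="sec_encap_udp_channels"/>, and the option list in the example is
abbreviated for illustration (a real initiator also includes the other
required options).
</t>
<figure><artwork><![CDATA[
```python
import secrets
import struct

HANDSHAKE = 0x00


def new_channel_id() -> int:
    """Pick a random non-zero 32-bit channel ID; zero is kept free
    because all-zeros has a special meaning in the handshake."""
    return secrets.randbelow(0xFFFFFFFF) + 1


def handshake_datagram(dest_channel: int, src_channel: int,
                       options: bytes) -> bytes:
    """Destination channel ID, type octet, source channel ID, options."""
    header = struct.pack(">IBI", dest_channel, HANDSHAKE, src_channel)
    return header + options


# Initiating peer: all-zeros destination channel ID, a fixed example
# source channel ID, then Version 1 and the End option.
first = handshake_datagram(0, 0x12345678, bytes([0, 1, 255]))
```
]]></artwork></figure>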
</section>
<!-- %% -->
<section title="HAVE" anchor="sec_encap_udp_HAVE">
<t>
A HAVE message (type 0x03) consists of a single chunk specification that states
that the sending peer has those chunks and successfully checked their integrity.
The single chunk specification represents a consecutive range of verified chunks.
A bin consists of a single integer, and a chunk or byte range of two integers, of
the width specified by the Chunk Addressing protocol options, encoded big endian.
</t>
<t>
A HAVE message using 32-bit chunk ranges as Chunk Addressing method:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 1 1|               Start chunk (32)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                End chunk (32)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |
+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the HAVE message (0x03), followed by the start chunk and
the end chunk describing the chunk range.
</t>
<t>
(received and checked first four kilobytes of a file/stream)
</t>
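<t>
The fixed layout above maps directly onto a packed binary structure. The
following Python sketch is illustrative only: the function names are
hypothetical and not part of this specification, and it assumes the 32-bit
chunk range addressing method shown in the figure.
</t>
<figure><artwork><![CDATA[
```python
import struct

HAVE = 0x03  # message type octet given in this section


def encode_have(start_chunk, end_chunk):
    """Encode a HAVE message: type octet, start chunk, end chunk.

    ">" selects big-endian byte order without padding, so the
    result is always 9 octets long.
    """
    return struct.pack(">BII", HAVE, start_chunk, end_chunk)


def decode_have(message):
    """Return (start_chunk, end_chunk) from a 9-octet HAVE message."""
    msg_type, start_chunk, end_chunk = struct.unpack(">BII", message[:9])
    if msg_type != HAVE:
        raise ValueError("not a HAVE message")
    return start_chunk, end_chunk
```
]]></artwork></figure>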
</section>
<!-- %% -->
<section title="DATA" anchor="sec_encap_udp_DATA">
<t>
A DATA message (type 0x01) consists of a chunk specification, a timestamp and
the actual chunk. In case a datagram contains one DATA message, a sender
MUST always put the DATA message in the tail of the datagram. A datagram MAY
contain multiple DATA messages when the chunk size is fixed and none of the
DATA messages carries the last chunk, if that chunk is smaller than the chunk size.
As the LEDBAT congestion control is used, a
sender MUST include a timestamp, in particular, a 64-bit integer representing
the current system time with microsecond accuracy. The timestamp MUST be
included between chunk specification and the actual chunk.
</t>
<t>
A DATA message using 32-bit chunk ranges as Chunk Addressing method:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 1|               Start chunk (32)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                End chunk (32)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Timestamp (64)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
~                             Data                              ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the DATA message (0x01), followed by the start chunk and
the end chunk describing the chunk range, the timestamp and the actual data.
</t>
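<t>
As an illustration of the field order, the following Python sketch builds a
DATA message for 32-bit chunk ranges. The function name is hypothetical, and
reading the system clock with time.time_ns() is an assumption of this sketch;
any clock with microsecond accuracy would serve.
</t>
<figure><artwork><![CDATA[
```python
import struct
import time

DATA = 0x01  # message type octet given in this section


def encode_data(start_chunk, end_chunk, chunk, timestamp_us=None):
    """Encode a DATA message: type, chunk range, timestamp, chunk.

    The 64-bit timestamp sits between the chunk specification and
    the chunk itself, as required above.
    """
    if timestamp_us is None:
        # Assumption: wall-clock time in microseconds.
        timestamp_us = time.time_ns() // 1000
    header = struct.pack(">BIIQ", DATA, start_chunk, end_chunk,
                         timestamp_us)
    return header + chunk
```
]]></artwork></figure>
<t>
Because the chunk has no length prefix of its own in this layout, the sketch
simply appends it, which matches the requirement that a single DATA message
occupy the tail of the datagram.
</t>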
</section>
<!-- %% -->
<section title="ACK" anchor="sec_encap_udp_ACK">
<t>
An ACK message (type 0x02) acknowledges data that was received from
its addressee; to comply with the LEDBAT delay-based congestion control
an ACK message consists of a chunk specification and a timestamp representing
a one-way delay sample. The one-way delay sample is a 64-bit integer with
microsecond accuracy, and is computed from the timestamp received from the
previous DATA message containing the chunk being acknowledged following the
LEDBAT specification.
</t>
<t>
An ACK message using 32-bit chunk ranges as Chunk Addressing method:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 1 0|               Start chunk (32)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                End chunk (32)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   One-way delay sample (64)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |
+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the ACK message (0x02), followed by the start chunk and
the end chunk describing the chunk range, and the one-way delay sample.
</t>
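<t>
The following Python sketch shows one way a receiver might derive the delay
sample and encode the ACK. It is not normative: the function names are
hypothetical, and reducing the difference modulo 2^64 is an assumption made
here so that the value always fits the 64-bit field when the two clocks are
not synchronized.
</t>
<figure><artwork><![CDATA[
```python
import struct

ACK = 0x02  # message type octet given in this section
MASK_64 = (1 << 64) - 1


def one_way_delay_us(data_timestamp_us, local_time_us):
    """Delay sample: local receive time minus the sender's timestamp.

    Assumption: the difference is reduced modulo 2**64.
    """
    return (local_time_us - data_timestamp_us) & MASK_64


def encode_ack(start_chunk, end_chunk, delay_us):
    """Encode an ACK message: type, chunk range, delay sample."""
    return struct.pack(">BIIQ", ACK, start_chunk, end_chunk, delay_us)
```
]]></artwork></figure>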
</section>
<!-- %% -->
<section title="INTEGRITY" anchor="sec_encap_udp_INTEGRITY">
<t>
An INTEGRITY message (type 0x04) consists of a chunk specification and
the cryptographic hash for the specified chunk or node. The type and format
of the hash depend on the protocol options.
</t>
<t>
An INTEGRITY message using 32-bit chunk ranges as Chunk Addressing method and
SHA1 hash:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 1 0 0|               Start chunk (32)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                End chunk (32)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
~                          Hash (160)                           ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |
+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the INTEGRITY message (0x04), followed by the start chunk and
the end chunk describing the chunk range, and the hash.
</t>
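<t>
A minimal Python sketch of this layout follows, assuming 32-bit chunk ranges
and a 20-byte SHA1 digest. The function name is hypothetical. Hashing the raw
chunk bytes in the usage line is for illustration only; which hash is carried
for a given chunk or node is governed by the content integrity protection
method in use.
</t>
<figure><artwork><![CDATA[
```python
import hashlib
import struct

INTEGRITY = 0x04  # message type octet given in this section
SHA1_LENGTH = 20  # 160 bits


def encode_integrity(start_chunk, end_chunk, digest):
    """Encode an INTEGRITY message: type, chunk range, hash."""
    if len(digest) != SHA1_LENGTH:
        raise ValueError("expected a 20-byte SHA1 digest")
    return struct.pack(">BII", INTEGRITY, start_chunk, end_chunk) + digest


# Illustration only: digest of a single small chunk.
example = encode_integrity(0, 0, hashlib.sha1(b"Hello world!").digest())
```
]]></artwork></figure>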
</section>
<!-- %% -->
<section title="SIGNED_INTEGRITY" anchor="sec_encap_udp_SIGNED_INTEGRITY">
<t>
A SIGNED_INTEGRITY message (type 0x07) consists of a chunk specification,
a 64-bit NTP timestamp <xref target="RFC5905"/> and a digital signature
encoded as a Signature field in an RRSIG record in DNSSEC without the BASE-64
encoding <xref target="RFC4034"/>. The signature algorithm is defined by the
Live Signature Algorithm protocol option, see
<xref target="sec_protopt_livesigalg"/>. The plaintext over which the
signature is taken depends on the content integrity protection method used,
see <xref target="sec_live_auth"/>.
</t>
<t>
A SIGNED_INTEGRITY message using 32-bit chunk ranges as Chunk Addressing method:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 1 1 1|               Start chunk (32)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                End chunk (32)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Timestamp (64)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
~                           Signature                           ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the SIGNED_INTEGRITY message (0x07), followed by the start
chunk and the end chunk describing the chunk range, the timestamp, and the Signature.
</t>
<t>
The length of the digital signature can be derived from the Live Signature
Algorithm protocol option and the swarm ID as follows. The first MANDATORY algorithm
is RSASHA1. In that case, the swarm ID consists of a 1-byte Algorithm field
followed by an RSA public key stored as a tuple (exponent length, exponent, modulus)
<xref target="RFC3110"/>. Given the exponent length and the length of the
public key tuple in the swarm ID, the length of the modulus in bytes can be
calculated. This yields the length of the signature, because an RSA signature is
as long as the modulus <xref target="HAC01"/>. The other MANDATORY algorithms
are ECDSAP256SHA256 and ECDSAP384SHA384 <xref target="RFC6605"/>. For these
algorithms the length of the digital signature is 64 and 96 bytes,
respectively.
</t>
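<t>
The derivation above can be sketched in Python as follows. The function and
table names are hypothetical. The sketch assumes the exponent length encoding
of RFC 3110: a single octet for lengths 1 to 255, or a zero octet followed by
a 2-octet length otherwise.
</t>
<figure><artwork><![CDATA[
```python
def rsa_signature_length(swarm_id):
    """Length in bytes of an RSASHA1 signature, derived from the swarm ID."""
    key = swarm_id[1:]  # skip the 1-byte Algorithm field
    if not key:
        raise ValueError("swarm ID carries no public key")
    if key[0] != 0:
        # Exponent length fits in one octet (1..255).
        header_len, exponent_len = 1, key[0]
    else:
        # A zero octet means the length follows in the next two octets.
        header_len, exponent_len = 3, int.from_bytes(key[1:3], "big")
    modulus_len = len(key) - header_len - exponent_len
    if modulus_len <= 0:
        raise ValueError("public key too short for its exponent length")
    # An RSA signature is as long as the modulus.
    return modulus_len


# Fixed signature lengths stated above for the ECDSA algorithms.
ECDSA_SIGNATURE_LENGTH = {"ECDSAP256SHA256": 64, "ECDSAP384SHA384": 96}
```
]]></artwork></figure>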
</section>
<!-- %% -->
<section title="REQUEST" anchor="sec_encap_udp_REQUEST">
<t>
A REQUEST message (type 0x08) consists of a chunk specification for the
chunks the requester wants to download.
</t>
<t>
A REQUEST message using 32-bit chunk ranges as Chunk Addressing method:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 1 0 0 0|               Start chunk (32)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                End chunk (32)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |
+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the REQUEST message (0x08), followed by the start chunk and
the end chunk describing the chunk range.
</t>
</section>
<!-- %% -->
<section title="CANCEL" anchor="sec_encap_udp_CANCEL">
<t>
A CANCEL message (type 0x09) consists of a chunk specification for the
chunks the requester is no longer interested in.
</t>
<t>
A CANCEL message using 32-bit chunk ranges as Chunk Addressing method:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 1 0 0 1|               Start chunk (32)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                End chunk (32)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |
+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the CANCEL message (0x09), followed by the start chunk and
the end chunk describing the chunk range.
</t>
</section>
<!-- %% -->
<section title="CHOKE and UNCHOKE" anchor="sec_encap_udp_CHOKE">
<t>
Both CHOKE and UNCHOKE messages (types 0x0a and 0x0b, respectively) carry no
payload.
</t>
<t>
A CHOKE message:
</t>
<figure><artwork>
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|0 0 0 0 1 0 1 0|
+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the CHOKE message (0x0a).
</t>
<t>
An UNCHOKE message:
</t>
<figure><artwork>
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|0 0 0 0 1 0 1 1|
+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the UNCHOKE message (0x0b).
</t>
</section>
<!-- %% -->
<section title="PEX_REQ, PEX_RESv4, PEX_RESv6 and PEX_REScert" anchor="sec_encap_udp_PEX">
<t>
A PEX_REQ (0x06) message has no payload. A PEX_RES (0x05) message consists
of an IPv4 address in big endian format followed by a UDP port number
in big endian format. A PEX_RESv6 (0x0c) message contains a 128-bit IPv6
address instead of an IPv4 one. If a PEX_REQ message does not originate from
a private or link-local address <xref target="RFC1918"/><xref target="RFC4291"/>,
then the PEX_RES* messages sent in reply MUST NOT contain such addresses. This
is to prevent leaking of internal addresses to external peers.
</t>
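<t>
The address filtering rule can be sketched in Python as follows. The names
are hypothetical, and the network list is an assumption limited to the
RFC 1918 private ranges and the IPv6 link-local prefix fe80::/10; a
deployment may need to treat further ranges as internal.
</t>
<figure><artwork><![CDATA[
```python
import ipaddress

# Assumption: RFC 1918 private ranges plus IPv6 link-local (fe80::/10).
INTERNAL_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
                "fe80::/10")
]


def is_internal(address):
    """True if the address is private or link-local per the list above."""
    ip = ipaddress.ip_address(address)
    return any(ip.version == net.version and ip in net
               for net in INTERNAL_NETWORKS)


def filter_pex_addresses(requester, candidates):
    """Drop internal addresses unless the requester is internal itself."""
    if is_internal(requester):
        return list(candidates)
    return [addr for addr in candidates if not is_internal(addr)]
```
]]></artwork></figure>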
<t>
A PEX_REQ message:
</t>
<figure><artwork>
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|0 0 0 0 0 1 1 0|
+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the PEX_REQ message (0x06).
</t>
<t>
A PEX_RES message:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 1 0 1|               IPv4 Address (32)               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |           Port (16)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the PEX_RES message (0x05), followed by the IPv4
address and the port number.
</t>
<t>
A PEX_RESv6 message:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 1 1 0 0|                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      IPv6 Address (128)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |           Port (16)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the PEX_RESv6 message (0x0c), followed by the IPv6
address and the port number.
</t>
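<t>
Both address-carrying responses share the same shape: a type octet, the
address in network byte order, and a 16-bit port. A Python sketch, with a
hypothetical function name, that picks the message type from the address
family:
</t>
<figure><artwork><![CDATA[
```python
import ipaddress
import struct

PEX_RES = 0x05    # IPv4 response, type octet given in this section
PEX_RESV6 = 0x0C  # IPv6 response, type octet given in this section


def encode_pex_res(address, port):
    """Encode a PEX response for one peer address and UDP port."""
    ip = ipaddress.ip_address(address)
    msg_type = PEX_RES if ip.version == 4 else PEX_RESV6
    # ip.packed is 4 octets for IPv4 and 16 octets for IPv6,
    # already in big-endian (network) order.
    return bytes([msg_type]) + ip.packed + struct.pack(">H", port)
```
]]></artwork></figure>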
<t>
A PEX_REScert (0x0d) message consists of a 16-bit integer in big endian
specifying the size of the membership certificate that follows, see
<xref target="sec_sec_pex_ampl"/>. This membership certificate states that peer
P at time T is a member of swarm S and is an X.509v3 certificate
<xref target="RFC5280"/> that is encoded using the ASN.1 distinguished encoding
rules (DER) <xref target="CCITT.X208.1988"/>. The certificate MUST contain a
"Subject Alternative Name" extension, marked as critical, of type
uniformResourceIdentifier.
</t>
<t>
A PEX_REScert message:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 1 1 0 1|   Size of Memb. Cert. (16)    |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
~                    Membership Certificate                     ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
where the first octet is the PEX_REScert message (0x0d), followed by the size of
the membership certificate, and the membership certificate.
</t>
<t>
The URL contained in the name extension MUST follow the generic syntax for URLs
<xref target="RFC3986"/>, where its scheme component is "ppsp", the host in
the authority component is the DNS name or IP address of peer P, the port in
the authority component is the port of peer P, and the path contains the swarm
identifier for swarm S, in hexadecimal form. In particular, the preferred form
of the swarm identifier is xxyyzz..., where the 'x's, 'y's and 'z's are 2
hexadecimal digits of the 8-bit pieces of the identifier. The validity time of
the certificate is set with notBefore UTCTime set to T and notAfter UTCTime set
to T plus some expiry time defined by the issuer. An example URL:
<list style="hanging" hangIndent="4">
<t>
ppsp://192.0.2.0:6778/e5a12c7ad2d8fab33c699d1e198d66f79fa610c3
</t>
</list>
</t>
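<t>
A Python sketch of how such a URL could be assembled from its parts follows.
The function name is hypothetical, and bracketing IPv6 literals is an
assumption taken from the generic URI syntax rather than from this section.
</t>
<figure><artwork><![CDATA[
```python
def membership_url(host, port, swarm_id):
    """Build the ppsp URL for peer (host, port) in the given swarm.

    swarm_id is the raw identifier; it is rendered as two lowercase
    hexadecimal digits per octet.
    """
    if ":" in host:
        # Assumption: IPv6 literals are bracketed in the authority.
        host = "[" + host + "]"
    return "ppsp://{}:{}/{}".format(host, port, swarm_id.hex())
```
]]></artwork></figure>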
</section>
<!-- %% -->
<section title="KEEPALIVE" anchor="sec_encap_udp_KEEPALIVE">
<t>
Keepalives do not have a message type on UDP. They are just simple
datagrams consisting of the 4-byte channel ID of the destination only.
</t>
<t>
A keepalive datagram:
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Channel ID (32)                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</section>
<!-- %% -->
<section title="Detecting a Dead Peer" anchor="sec_encap_udp_detect_dead">
<t>
A guideline for declaring a peer dead is that no packet has been received
from that peer for 3 minutes, while at least 3 datagrams were sent to that
peer during the same period.
</t>
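<t>
One way to track this guideline per peer is sketched below in Python. The
class and attribute names are hypothetical, and counting sent datagrams
since the last received packet is this sketch's interpretation of "during
the same period".
</t>
<figure><artwork><![CDATA[
```python
DEAD_PEER_TIMEOUT = 180.0  # seconds (3 minutes)
MIN_DATAGRAMS_SENT = 3


class PeerLiveness:
    """Tracks the dead-peer guideline for a single peer."""

    def __init__(self, now):
        self.last_received = now
        self.sent_since_last_received = 0

    def on_receive(self, now):
        # Any packet from the peer restarts the observation period.
        self.last_received = now
        self.sent_since_last_received = 0

    def on_send(self):
        self.sent_since_last_received += 1

    def is_dead(self, now):
        silent = now - self.last_received >= DEAD_PEER_TIMEOUT
        probed = self.sent_since_last_received >= MIN_DATAGRAMS_SENT
        return silent and probed
```
]]></artwork></figure>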
</section>
<!-- %% -->
<section title="Flow and Congestion Control" anchor="sec_encap_udp_control">
<t>
Explicit flow control is not necessary in PPSPP-over-UDP. In the case of
video-on-demand the receiver will request data explicitly from peers and is
therefore in control of how much data is coming towards it. In the case of
live streaming, where a push-model may be used, the amount of data incoming
is limited to the bitrate, which the receiver must be able to process; otherwise,
it cannot play the stream. Should the receiver, for any reason, get saturated
with data, that situation is detected by the congestion control.
</t>
<t>
PPSPP-over-UDP can support different congestion control algorithms.
At present, it uses the LEDBAT congestion control algorithm <xref target="RFC6817"/>.
LEDBAT is a delay-based congestion control algorithm that is used by millions
of users every day as part of the uTP transmission protocol of BitTorrent
<xref target="LBT"/>,<xref target="LCOMPL"/> and is suitable for P2P streaming
<xref target="PPSPPERF"/>.
</t>
</section>
<!-- %% -->
<section title="Example of Operation" anchor="sec_encap_udp_example">
<t>
We present a small example of communication between a leecher and a seeder.
The example presents the transmission of the file "Hello world!", which fits
within a 1024-byte chunk. For ease of understanding, we use the message description
names, as listed in <xref target="tab_id_msg_type"/>, and the protocol option names
as listed in <xref target="tab_id_proto_opt"/>, rather than the actual binary values.
</t>
<t>
To do the handshake the initiating peer sends a datagram that MUST start
with an all 0-zeros channel ID (0x00000000), followed by a HANDSHAKE message, whose
payload is a locally unused channel ID (0x00000001) and a list of protocol options.
</t>
<t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   HANDSHAKE   |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 1|    Version    |0 0 0 0 0 0 0 1|  Min Version  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 1|   Swarm ID    |0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0 0 1 1 0|
~                             .....                             ~
|1 0 0 0 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Cont. Int.   |0 0 0 0 0 0 0 1| Mer.H.Tree F. |0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Chunk Add.   |0 0 0 0 0 0 1 0|      End      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</t>
<t>
The protocol options are:
<list style="hanging">
<t>Version: 1</t>
<t>Minimum supported Version: 1</t>
<t>Swarm Identifier: A 20-byte root hash (47a0...b03b) identifying the content.</t>
<t>Content Integrity Protection Method: Merkle Hash Tree.</t>
<t>Merkle Tree Hash Function: SHA1.</t>
<t>Chunk Addressing Method: 32-bit chunk ranges.</t>
</list>
</t>
<t>
The receiving peer MAY respond, in which case the returned datagram MUST consist
of the channel ID from the sender's HANDSHAKE message (0x00000001), a HANDSHAKE message,
whose payload is a locally unused channel ID (0x00000008) and a list of protocol options,
followed by any other messages it wants to send.
</t>
<t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   HANDSHAKE   |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 1 0 0 0|    Version    |0 0 0 0 0 0 0 1|  Cont. Int.   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 1| Mer.H.Tree F. |0 0 0 0 0 0 0 0|  Chunk Add.   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 1 0|      End      |     HAVE      |0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</t>
<t>
With the protocol options the receiving peer agrees on speaking protocol version 1,
on using the Merkle Hash Tree as Content Integrity Protection Method, SHA1 hash as
Merkle Tree Hash Function, and 32-bit chunk ranges as Chunk Addressing Method.
Furthermore, it sends a HAVE message within the same datagram, announcing that it has
locally available the first chunk of content.
</t>
<t>
At this point, the initiator knows that the peer really responds; for that
purpose channel IDs MUST be random enough to prevent easy guessing. So, the
third datagram of a handshake MAY already contain some heavy payload. To
minimize the number of initialization round trips, the first two datagrams
MAY also contain some minor payload, e.g. the HAVE message.
</t>
<t>
The initiating peer MAY send a request for
the chunks of content it wants to retrieve from the receiving peer, e.g. the first chunk
announced during the handshake. It always precedes the message with the channel ID of the
peer it is communicating with (e.g. 0x00000008 in our example), as described in <xref target="sec_msgs_channels"/>.
Furthermore, it MAY add additional messages such as a PEX_REQ.
</t>
<t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    REQUEST    |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0|    PEX_REQ    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</t>
<t>
Upon receipt of the third datagram, both peers have proof that they are really talking to each other; the
three-way handshake is complete. The receiving peer responds to the request by sending a DATA
message containing the requested content.
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     DATA      |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 0 1 0 0 1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 0 1 1 0 1 1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 0 0 1 0 0|0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 0 1 1 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                             .....                             ~
|0 1 1 0 1 1 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
The DATA message consists of:
<list style="hanging">
<t>The 32-bit chunk range: 0,0 (the first chunk).</t>
<t>The timestamp value: 0004e94180b7db44</t>
<t>The Data message: 48656c6c6f20776f726c6421 (the "Hello world!" file)</t>
</list>
</t>
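<t>
As a cross-check of the values listed above, the following Python sketch
rebuilds the datagram from those hexadecimal strings and parses it back. The
variable names are hypothetical, and the datagram is assembled from the
listed field values rather than copied from a capture.
</t>
<figure><artwork><![CDATA[
```python
import struct

# Channel ID, DATA type octet, chunk range 0,0, timestamp, payload.
datagram = bytes.fromhex(
    "00000001"
    "01"
    "00000000" "00000000"
    "0004e94180b7db44"
    "48656c6c6f20776f726c6421"
)

# 4 + 1 + 4 + 4 + 8 = 21 octets precede the chunk itself.
channel_id, msg_type, start_chunk, end_chunk, timestamp_us = \
    struct.unpack(">IBIIQ", datagram[:21])
payload = datagram[21:]

print(channel_id, msg_type, (start_chunk, end_chunk))
print(payload.decode("ascii"))
```
]]></artwork></figure>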
<t>
Note that the above datagram does not include the INTEGRITY message, as the entire
content can fit into a single message, hence the initiating peer is able to
verify it against the root hash. Also, in this example the peer does not respond to the
PEX_REQ as it does not know any third peer participating in the swarm.
</t>
<t>
Upon receiving the requested data, the initiating peer responds with an acknowledgement
message for the first chunk, containing a one-way delay sample (100ms). Furthermore it
also adds a HAVE message for the chunk.
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      ACK      |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 1 0 0 1 0 0|     HAVE      |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
<t>
At this point the initiating peer has successfully retrieved the entire file.
It then explicitly closes the connection by sending a HANDSHAKE message that
contains an all 0-zeros channel ID.
</t>
<figure><artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   HANDSHAKE   |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 0 0 0|      End      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork></figure>
</section>
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Extensibility" anchor="sec_ext">
<!-- %% -->
<section title="Chunk Picking Algorithms" anchor="sec_ext_cpa">
<t>
Chunk (or piece) picking entirely depends on the receiving peer. The sender
peer is made aware of preferred chunks by means of REQUEST messages. In some
(live) scenarios it may be beneficial to allow the sender to ignore those hints
and send unrequested data.
</t>
<t>
The chunk picking algorithm is external to the PPSPP protocol and will generally
be a pluggable policy that uses the mechanisms provided by PPSPP.
The algorithm will handle the choices made by the user consuming the content,
such as seeking, switching audio tracks or subtitles. Example policies
for P2P streaming can be found in <xref target="BITOS"/>,
and <xref target="EPLIVEPERF"/>.
</t>
</section>
<!-- %% -->
<section title="Reciprocity Algorithms" anchor="sec_ext_ra">
<t>
The role of reciprocity algorithms in peer-to-peer systems is to promote
client contribution and prevent freeriding. A peer is said to be freeriding
if it only downloads content but never uploads to others. Examples of
reciprocity algorithms are tit-for-tat as used in BitTorrent <xref target="TIT4TAT"/>
and Give-to-Get <xref target="GIVE2GET"/>. In PPSPP, reciprocity enforcement
is the sole responsibility of the sender peer.
</t>
</section>
</section>
<!-- %% -->
<section title="Acknowledgements" anchor="sec_ack">
<t>
Arno Bakker, Riccardo Petrocco and Victor Grishchenko are partially supported
by the P2P-Next project (http://www.p2p-next.org/), a research project
supported by the European Community under its 7th Framework Programme
(grant agreement no. 216217). The views and conclusions contained
herein are those of the authors and should not be interpreted as
necessarily representing the official policies or endorsements,
either expressed or implied, of the P2P-Next project or the European
Commission.
</t>
<t>
The PPSPP protocol was designed by Victor Grishchenko at Technische Universiteit
Delft. The authors would like to thank the following people for their
contributions to this draft: the chairs (Martin Stiemerling, Yunfei Zhang,
Stefano Previdi, Ning Zong) and members of the IETF PPSP working
group, and Mihai Capota, Raul Jimenez, Flutra Osmani, Johan Pouwelse,
and Raynor Vliegendhart.
</t>
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="IANA Considerations" anchor="sec_iana">
<t>IANA is to create the new registries defined below for the extensibility of
the protocol. For all registries, assignments consist of a name and its
associated value. Also for all registries, the "Unassigned" ranges designated
are governed by the policy 'IETF Review' as described in <xref target="RFC5226"/>.
</t>
<section title="PPSP Peer Protocol Message Type Registry" anchor="sec_iana_reg_msg">
<t>
Registry name is "PPSP Peer Protocol Message Type Registry". Values are
integers in the range 0-255, with initial assignments and reservations given
in <xref target="tab_id_msg_type"/>.
</t>
</section>
<section title="PPSP Peer Protocol Option Registry" anchor="sec_iana_reg_option">
<t>
Registry name is "PPSP Peer Protocol Option Registry". Values are integers in
the range 0-255, with initial assignments and reservations given in
<xref target="tab_id_proto_opt"/>.
</t>
</section>
<section title="PPSP Peer Protocol Version Number Registry" anchor="sec_iana_reg_version">
<t>
Registry name is "PPSP Peer Protocol Version Number Registry". Values are
integers in the range 0-255, with initial assignments and reservations
given in <xref target="tab_id_ver_num"/>.
</t>
</section>
<section title="PPSP Peer Protocol Content Integrity Protection Method Registry" anchor="sec_iana_reg_cipm">
<t>
Registry name is "PPSP Peer Protocol Content Integrity Protection Method Registry".
Values are integers in the range 0-255, with initial assignments and reservations
given in <xref target="tab_id_cont_int_prot"/>.
</t>
</section>
<section title="PPSP Peer Protocol Merkle Hash Tree Function Registry" anchor="sec_iana_reg_merkle_func">
<t>
Registry name is "PPSP Peer Protocol Merkle Hash Tree Function Registry".
Values are integers in the range 0-255, with initial assignments and reservations
given in <xref target="tab_id_merkle_func"/>.
</t>
</section>
<section title="PPSP Peer Protocol Chunk Addressing Method Registry" anchor="sec_iana_reg_cam">
<t>
Registry name is "PPSP Peer Protocol Chunk Addressing Method Registry". Values
are integers in the range 0-255, with initial assignments and reservations
given in <xref target="tab_id_chunk_addr"/>.</t>
</section>
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Manageability Considerations" anchor="sec_mgmt">
<t>
This section presents operations and management considerations following
the checklist in <xref target="RFC5706"/>, Appendix A.
</t>
<t>
In this section "PPSPP client" is defined as a PPSPP peer acting on behalf
of an end user which may not yet have a copy of the content, and "PPSPP server"
as a PPSPP peer that provides the initial copies of the content to the swarm on
behalf of a content provider.
</t>
<!-- %% -->
<section title="Operations" anchor="sec_mgmt_op">
<!-- %%% -->
<section title="Installation and Initial Setup" anchor="sec_mgmt_op_install">
<t>
A content provider wishing to use PPSPP to distribute content should set up
at least one PPSPP server. PPSPP servers need to have access to either some
static content or to some live audio/video sources. To provide flexibility for
implementors, this configuration process is not standardized. The output of
this process will be a list of metadata records, one for each swarm.
A metadata record consists of the swarm ID, the chunk size used, the chunk
addressing method used, the content integrity protection method used,
and the Merkle hash tree function used (if applicable). If automatic content
size detection (see <xref target="sec_intprot_autosize"/>) is not used, the
content length is also part of the metadata record for static content. Note
that the swarm ID already contains the Live Signature Algorithm used, in case of a
live stream.
</t>
<t>In addition, a content provider should set up a tracking facility for the
content by configuring, for example, a PPSP tracker <xref target="I-D.ietf-ppsp-base-tracker-protocol"/>
or a Distributed Hash Table. The output of the latter process is a list of
transport addresses for the tracking facility.
</t>
<t>
The list of metadata records of available content, and transport address for the
tracking facility, can be distributed to users in various ways. Typically,
they will be published on a Web site as links. When a user clicks such a link
the PPSPP client is launched, either as a standalone application or by invoking
the browser's internal PPSPP protocol handler, as exemplified in
<xref target="sec_over"/>. The clients use the tracking facility to obtain the
transport address of the PPSPP server(s) and other peers from the swarm,
executing the peer protocol to retrieve and redistribute the content. The format
of the PPSPP URLs should be defined in an extension document. The default
protocol options should be exploited to keep the URLs small.
</t>
<t>
The minimal information a tracking facility must return when queried for a list
of peers for a swarm is as follows. Assuming the communication between tracking
facility and requestor is protected, the facility must at least return for
each peer in the list its IP address, transport protocol identifier (i.e., UDP),
and transport protocol port number.
</t>
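<t>
A minimal sketch of such a per-peer entry, and of the sanity checks a client
might apply to it, is given below. The type name, the function name and the
port number 6778 are hypothetical; the addresses are taken from documentation
address ranges.
</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress
from typing import NamedTuple


class PeerEntry(NamedTuple):
    """Minimal per-peer data returned by a tracking facility."""
    ip: str          # IPv4 or IPv6 address
    transport: str   # transport protocol identifier, i.e. "UDP"
    port: int        # transport protocol port number


def parse_entry(ip: str, transport: str, port: int) -> PeerEntry:
    """Validate one entry; raises ValueError on malformed input."""
    ipaddress.ip_address(ip)  # raises ValueError if not an address
    if transport.upper() != "UDP":
        raise ValueError("unsupported transport: " + transport)
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    return PeerEntry(ip, transport.upper(), port)


peers = [parse_entry("192.0.2.10", "udp", 6778),
         parse_entry("2001:db8::1", "UDP", 6778)]
```
]]></artwork>
</figure>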
</section>
<!-- %%% -->
<section title="Requirements on Other Protocols and Functional Components" anchor="sec_mgmt_op_req_other">
<t>
When using the PPSP tracker protocol, PPSPP requires a specific behavior from
this protocol for security reasons, as detailed in <xref target="sec_sec_pex"/>.
</t>
</section>
<!-- %%% -->
<section title="Migration Path" anchor="sec_mgmt_op_migrate">
<t>
This document does not detail a migration path since there is no previous
standard protocol providing similar functionality.
</t>
</section>
<!-- %%% -->
<section title="Impact on Network Operation" anchor="sec_mgmt_op_impact_netw">
<t>
PPSPP is a peer-to-peer protocol that takes advantage of the fact that content
is available from multiple sources to improve robustness, scalability and performance. At the
same time, poor choices in determining which exact sources to use can lead to
a bad experience for the end user and high costs for network operators. Hence,
PPSPP can benefit from the ALTO protocol to steer peer selection, as described
in <xref target="sec_msgs_PEX_msgs"/>.
</t>
</section>
<!-- %%% -->
<section title="Verifying Correct Operation" anchor="sec_mgmt_op_ver_correct">
<t>
PPSPP is operating correctly when all peers obtain the desired content on time.
Therefore the PPSPP client is the ideal location to verify the protocol's
correct operation. However, it is not feasible to mandate logging the
behavior of PPSPP peers in all implementations and deployments, for example,
due to privacy reasons. There are two alternative options:
</t>
<t>
<list style="symbols">
<t>
Monitoring the PPSPP servers initially providing the content, using
standard metrics such as bandwidth usage, peer connections and activity,
can help identify trouble; see the next section and
<xref target="RFC2564"/>.
</t>
<t>
The PPSP tracker protocol may be used to gather information about all
peers in a swarm, to obtain a global view of operation, according to
<xref target="RFC6972"/> (requirement PPSP-TP-REQ-3).
</t>
</list>
</t>
<t>
Basic operation of the protocol can be easily verified when a tracker
and swarm ID are known by starting a PPSPP download. Deep packet inspection for
DATA and ACK messages helps to establish that actual content transfer is
happening and that the chunk availability signaling and integrity checking are
working.
</t>
</section>
<!-- %%% -->
<section title="Configuration" anchor="sec_mgmt_op_config">
<t>
<xref target="tab_summ_defaults"/> shows the PPSPP parameters, their
defaults and where the parameter is defined. For parameters that have no
default, the table row contains the word "var" and refers to the section
discussing the considerations to make when choosing a value.
</t>
<texttable anchor="tab_summ_defaults" title="PPSPP Defaults">
<preamble></preamble>
<ttcol align="left">Name</ttcol>
<ttcol align="left">Default</ttcol>
<ttcol align="left">Definition</ttcol>
<c>Chunk Size</c> <c>var, 1024 bytes recommended</c> <c><xref target="sec_encap_udp_chunksize"/></c>
<c>Static Content Integrity Protection Method</c> <c>1 (Merkle Hash Tree)</c> <c><xref target="sec_protopt_intprot"/></c>
<c>Live Content Integrity Protection Method</c> <c>3 (Unified Merkle Tree)</c> <c><xref target="sec_protopt_intprot"/></c>
<c>Merkle Hash Tree Function</c> <c>0 (SHA1)</c> <c><xref target="sec_protopt_mhashfunc"/></c>
<c>Live Signature Algorithm</c> <c>13 (ECDSAP256SHA256)</c> <c><xref target="sec_protopt_livesigalg"/></c>
<c>Chunk Addressing Method</c> <c>2 (32-bit chunk ranges)</c> <c><xref target="sec_protopt_chunkaddr"/></c>
<c>Live Discard Window</c> <c>var</c> <c><xref target="sec_live_disc"/>, <xref target="sec_protopt_livediscwin"/></c>
<c>NCHUNKS_PER_SIG</c> <c>var</c> <c><xref target="sec_live_merkle_sigmunro"/></c>
<c>Dead peer detection</c> <c>No reply in 3 minutes + 3 datagrams</c> <c><xref target="sec_encap_udp_detect_dead"/></c>
<postamble></postamble>
</texttable>
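<t>
For illustration, an implementation could hold the values of
<xref target="tab_summ_defaults"/> in a structure such as the one sketched
below and refuse to start while a "var" parameter has no value. The key
names, the helper function and the example values 4096 and 32 are assumptions
of this sketch. The chunk size is listed with its recommended value although
the table marks it as "var".
</t>
<figure>
<artwork><![CDATA[
```python
from typing import Any, Dict

# Rows marked "var" in the table have no default; None marks them.
PPSPP_DEFAULTS: Dict[str, Any] = {
    "chunk_size": 1024,                    # recommended, bytes
    "static_integrity_protection": 1,      # Merkle Hash Tree
    "live_integrity_protection": 3,        # Unified Merkle Tree
    "merkle_hash_function": 0,             # SHA1
    "live_signature_algorithm": 13,        # ECDSAP256SHA256
    "chunk_addressing_method": 2,          # 32-bit chunk ranges
    "live_discard_window": None,           # var
    "nchunks_per_sig": None,               # var
    "dead_peer_timeout_seconds": 3 * 60,   # no reply in 3 minutes
    "dead_peer_datagrams": 3,              # ... + 3 datagrams
}


def effective_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge deployment choices into the defaults and check them."""
    unknown = sorted(set(overrides) - set(PPSPP_DEFAULTS))
    if unknown:
        raise KeyError("unknown parameter: " + ", ".join(unknown))
    config = dict(PPSPP_DEFAULTS)
    config.update(overrides)
    missing = sorted(k for k, v in config.items() if v is None)
    if missing:
        raise ValueError("no value chosen for: "
                         + ", ".join(missing))
    return config


# Illustrative values for the two "var" parameters.
config = effective_config({"live_discard_window": 4096,
                           "nchunks_per_sig": 32})
```
]]></artwork>
</figure>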
</section>
</section>
<!-- %% -->
<section title="Management Considerations" anchor="sec_mgmt_mgmt">
<t>
The management considerations for PPSPP are very similar to other protocols
that are used for large-scale content distribution, in particular HTTP.
How does one manage large numbers of servers? How does one push new content out
to a server farm and allow staged releases? How does one detect faults and measure
server and end-user performance? As standard solutions to these challenges are
still being developed, this section cannot provide a definitive recommendation
on how PPSPP should be managed. Hence, it describes the standard solutions
available at this time, and assumes a future extension document will provide more
complete guidelines.
</t>
<!-- %%% -->
<section title="Management Interoperability and Information" anchor="sec_mgmt_mgmt_interop">
<t>
As just stated, PPSPP servers providing initial copies of the content are akin
to WWW and FTP servers. They can also be deployed in large numbers and thus
can benefit from standard management facilities. PPSPP servers may therefore
implement an SNMP management interface based on the APPLICATION-MIB
<xref target="RFC2564"/>, where the file object can be used to report on swarms.
</t>
<t>
What is missing is the ability to remove or rate limit specific PPSPP swarms on
a server. This corresponds to removing or rate limiting specific virtual servers on a
Web server. In other words, as multiple pieces of content (swarms, virtual
WWW servers) are multiplexed onto a single server process, more fine-grained
management of that process is required. This functionality is currently missing.
</t>
<t>
Logging is an important functionality for PPSPP servers and, depending
on the deployment, PPSPP clients. Logging should be done via syslog
<xref target="RFC5424"/>.
</t>
</section>
<!-- %%% -->
<section title="Fault Management" anchor="sec_mgmt_mgmt_fault">
<t>
The facilities for verifying correct operation and server management (just
discussed) appear sufficient for PPSPP fault monitoring. This can be
supplemented with host resource <xref target="RFC2790"/> and UDP/IP network
monitoring <xref target="RFC4113"/>, as PPSPP server failures can generally be
attributed directly to conditions on the host or network.
</t>
<t>
Since PPSPP has been designed to work in a hostile environment, many benign
faults will be handled by the mechanisms used for managing attacks. For example, when
a malfunctioning peer starts sending the wrong chunks, this is detected by
the content integrity protection mechanism and another source is sought.
</t>
</section>
<!-- %%% -->
<section title="Configuration Management" anchor="sec_mgmt_mgmt_config">
<t>
Large-scale deployments may benefit from a standard way of replicating
a new piece of content on a set of initial PPSPP servers. This functionality
may need to include controlled releasing, such that content becomes available
only at a specific point in time (e.g. the release of a movie trailer). This
functionality could be provided via NETCONF <xref target="RFC6241"/>, to enable
atomic configuration updates over a set of servers. Uploading the new content
could be one configuration change, making the content available for download
by the public another.
</t>
</section>
<!-- %%% -->
<section title="Accounting Management" anchor="sec_mgmt_mgmt_account">
<t>
Content providers may offer PPSPP hosting for different customers and will want
to bill these customers, for example, based on bandwidth usage. This situation
is a common accounting scenario, similar to billing per virtual server
for Web servers. PPSPP can therefore benefit from general standardization
efforts in this area <xref target="RFC2975"/> when they come to fruition.
</t>
</section>
<!-- %%% -->
<section title="Performance Management" anchor="sec_mgmt_mgmt_perf">
<t>
Depending on the deployment scenarios, the application performance measurement
facilities of <xref target="RFC3729"/> and associated <xref target="RFC4150"/>
can be used with PPSPP.
</t>
<t>
In addition, when the PPSP tracker protocol is used, it provides
a built-in, application-level, performance measurement infrastructure for
different metrics. See <xref target="RFC6972"/>
(requirement PPSP-TP-REQ-3).
</t>
</section>
<!-- %%% -->
<section title="Security Management" anchor="sec_mgmt_mgmt_sec">
<t>
Malicious peers should ideally be locked out long-term. This is primarily
for performance reasons, as the protocol is robust against attacks (see next
section). <xref target="sec_sec_exclude"/> describes a procedure for long-term
exclusion.
</t>
</section>
</section>
</section>
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<section title="Security Considerations" anchor="sec_sec">
<t>
Like any other network protocol, PPSPP faces a common set of security
challenges. An implementation must consider the possibility of buffer
overruns, DoS attacks and manipulation (i.e., reflection attacks). Any
guarantee of privacy seems unlikely, as the user is exposing its IP address
to the peers. A probable exception is the case of the user being hidden
behind a public NAT or proxy. This section discusses the protocol's security
considerations in detail.
</t>
<!-- %% -->
<section title="Security of the Handshake Procedure" anchor="sec_sec_handshake">
<t>
Borrowing from the analysis in <xref target="RFC5971"/>, the PPSP peer protocol
may be attacked with three types of denial-of-service attacks:
</t>
<t>
<list style="numbers">
<t>DoS amplification attack: attackers try to use a PPSPP peer to
generate more traffic to a victim.</t>
<t>DoS flood attack: attackers try to deny service to other peers by
allocating lots of state at a PPSPP peer.</t>
<t>Disrupt service to an individual peer: attackers send bogus messages
(e.g., REQUEST and HAVE) appearing to come from victim peer A to the peers B1..Bn
serving that peer. This causes A to receive chunks it did not request or to not
receive the chunks it requested.
</t>
</list>
</t>
<t>
The basic scheme to protect against these attacks is the use of a secure
handshake procedure. In the UDP encapsulation the handshake procedure
is secured by the use of randomly chosen channel IDs as follows.
The channel IDs must be generated following the requirements in
<xref target="RFC4960"/> (Sec. 5.1.3).
</t>
<t>
When UDP is used, all datagrams carrying PPSPP messages are prefixed
with a 4-byte channel ID. These channel IDs are random numbers,
established during the handshake phase as follows. Peer A initiates an
exchange with peer B by sending a datagram containing a HANDSHAKE
message prefixed with the channel ID consisting of all 0s. Peer A's
HANDSHAKE contains a randomly chosen channel ID, chanA:
</t>
<t>
A->B: chan0 + HANDSHAKE(chanA) + ...
</t>
<t>
When peer B receives this datagram, it creates some state for peer A,
that at least contains the channel ID chanA. Next, peer B sends a
response to A, consisting of a datagram containing a HANDSHAKE message
prefixed with the chanA channel ID. Peer B's HANDSHAKE contains a
randomly chosen channel ID, chanB.
</t>
<t>
B->A: chanA + HANDSHAKE(chanB) + ...
</t>
<t>
Peer A now knows that peer B really responds, as it echoed chanA. So the
next datagram that A sends may already contain heavy payload, i.e.,
a chunk. This next datagram to B will be prefixed with the chanB channel
ID. When B receives this datagram, both peers have proof that
they are really talking to each other; the three-way handshake is
complete. In other words, the randomly chosen channel IDs act as tags
(cf. <xref target="RFC4960"/> (Sec. 5.1)).
</t>
<t>
A->B: chanB + HAVE + DATA + ...
</t>
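<t>
The exchange above can be sketched as follows. This is a simplified model
under stated assumptions, not an implementation of the UDP encapsulation:
datagrams are modeled as (channel ID, message) tuples, the class and method
names are hypothetical, and the sketch assumes that a randomly chosen channel
ID is never the all-zero value reserved for the first datagram.
</t>
<figure>
<artwork><![CDATA[
```python
import secrets

CHAN0 = 0  # all-zero channel ID used on the first datagram


def new_channel_id() -> int:
    """Random non-zero 32-bit (4-byte) channel ID."""
    while True:
        chan = secrets.randbits(32)
        if chan != CHAN0:
            return chan


class Peer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.local_chan = new_channel_id()  # ID we hand out
        self.remote_chan = None             # ID the other side chose
        self.verified = False

    def initiate(self):
        """Datagram 1: chan0 + HANDSHAKE(local channel ID)."""
        return (CHAN0, ("HANDSHAKE", self.local_chan))

    def receive(self, datagram):
        """Process a datagram; return a reply or None."""
        prefix, message = datagram
        if prefix == CHAN0 and message[0] == "HANDSHAKE":
            # Datagram 1 at B: store chanA, answer with chanB.
            self.remote_chan = message[1]
            return (self.remote_chan,
                    ("HANDSHAKE", self.local_chan))
        if prefix != self.local_chan:
            return None  # wrong channel ID: drop silently
        self.verified = True  # the sender echoed our random ID
        if message[0] == "HANDSHAKE":
            # Datagram 2 at A: B is live, heavy payload allowed.
            self.remote_chan = message[1]
            return (self.remote_chan, ("HAVE+DATA", None))
        return None


a, b = Peer("A"), Peer("B")
d2 = b.receive(a.initiate())  # B->A: chanA + HANDSHAKE(chanB)
d3 = a.receive(d2)            # A->B: chanB + HAVE + DATA
b.receive(d3)                 # completes the handshake at B
```
]]></artwork>
</figure>
<t>
In this sketch a datagram carrying an unknown channel ID is dropped without
changing any state, which is the property the protection against the three
attacks relies on.
</t>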
<section title="Protection Against Attack 1" anchor="sec_sec_handshake_at1">
<t>
In short, PPSPP does a so-called return routability check before heavy payload
is sent. This means that attack 1 is fended off: PPSPP does not send back much
more data than it received, unless it knows it is talking to a live peer.
Attackers sending a spoofed HANDSHAKE to B pretending to be A now need to
intercept the message from B to A to get B to send heavy payload, and ensure
that that heavy payload goes to the victim, something assumed too hard to be a
practical attack.
</t>
<t>
Note the rule is that no heavy payload may be sent until the third
datagram. This has implications for PPSPP implementations that use chunk
addressing schemes that are verbose. If a PPSPP implementation uses
large bitmaps to convey chunk availability, these may not be sent by peer
B in the second datagram.
</t>
</section>
<section title="Protection Against Attack 2" anchor="sec_sec_handshake_at2">
<t>
On receiving the first datagram, peer B will record some
state about peer A. At present this state consists of the chanA channel
ID, and the results of processing the other messages in the first
datagram. In particular, if A included some HAVE messages, B may add a
chunk availability map to A's state. In addition, B may request
some chunks from A in the second datagram, and B will maintain state
about these outgoing requests.
</t>
<t>
So presently, PPSPP is somewhat vulnerable to attack 2. An attacker
could send many datagrams with HANDSHAKEs and HAVEs and thus allocate
state at the PPSPP peer. Therefore peer A MUST respond immediately to the
second datagram, if it is still interested in peer B.
</t>
<t>
The reason for using this slightly vulnerable three-way handshake instead of
the safer handshake procedure of SCTP <xref target="RFC4960"/> (Sec. 5.1) is
quicker response time for the user. In the SCTP procedure, peer A and B
cannot request chunks until datagrams 3 and 4 respectively, as opposed to 2
and 1 in the proposed procedure. This means that the user has a shorter wait
in PPSPP between starting the video stream and seeing the first images.
</t>
</section>
<section title="Protection Against Attack 3" anchor="sec_sec_handshake_at3">
<t>
In general, channel IDs serve to authenticate a peer. Hence, to attack,
a malicious peer T would need to be able to eavesdrop on conversations between
victim A and a benign peer B to obtain the channel ID B assigned to A, chanB.
Furthermore, attacker T would need to be able to spoof messages from A (e.g., REQUEST
and HAVE) to cause B to send heavy DATA messages to A, or prevent B from
sending them, respectively.
</t>
<t>
The capability to eavesdrop is not common, so the protection afforded by
channel IDs will be sufficient in most cases. If not, point-to-point encryption
of traffic should be used; see <xref target="sec_sec_conf"/>.
</t>
</section>
</section>
<!-- %% -->
<section title="Secure Peer Address Exchange" anchor="sec_sec_pex">
<t>
As described in <xref target="sec_msgs_PEX"/>, a peer A can send
Peer-Exchange messages PEX_RES to a peer B, which contain the IP address and port
of other peers that are supposedly also in the current swarm. The strength
of this mechanism is that it allows decentralized tracking: after an initial
bootstrap no central tracker is needed anymore. The vulnerability of this
mechanism (and DHTs) is that malicious peers can use it for an Amplification
attack.
</t>
<t>
In particular, a malicious peer T could send PEX_RES messages to well-behaved
peer A with addresses of peers B1,B2,...,BN and on receipt,
peer A could send a HANDSHAKE to all these peers. So in the worst
case, a single datagram results in N datagrams. The actual damage
depends on A's behavior. For example, when A already has sufficient connections
it may not connect to the offered ones at all, but if it is a fresh peer
it may connect to all directly.
</t>
<t>
In addition, PEX can be used in Eclipse attacks <xref target="ECLIPSE" />
where malicious peers try to isolate a particular peer such that it only
interacts with malicious peers. Let us distinguish two specific attacks:
</t>
<t>
<list>
<t>E1. Malicious peers try to eclipse the single injector in live streaming.</t>
<t>E2. Malicious peers try to eclipse a specific consumer peer.</t>
</list>
</t>
<t>
Attack E1 has the most impact on the system as it would disrupt all peers.
</t>
<section title="Protection against the Amplification Attack" anchor="sec_sec_pex_ampl">
<t>
If peer addresses are relatively stable, strong protection against the attack
can be provided by using public key cryptography and certification. In
particular, a PEX_REScert message will carry swarm-membership certificates rather
than IP address and port. A membership certificate for peer B states that peer B
at address (ipB,portB) is part of swarm S at time T and is cryptographically
signed. The receiver A can check the cert for a valid signature, the right
swarm and liveness, and only then consider contacting B. These swarm-membership
certificates correspond to signed node descriptors in secure decentralized peer
sampling services <xref target="SPS"/>.
</t>
<t>
Several designs are possible for the security environment for these
membership certificates. That is, there are different designs possible for who
signs the membership certificates and how public keys are distributed.
As an example, we describe a design where the PPSP tracker acts as certification
authority.
</t>
</section>
<section title="Example: Tracker as Certification Authority" anchor="sec_sec_pex_trackerca">
<t>
A peer A wanting to join swarm S sends a certificate request message to a
tracker X for that swarm. Upon receipt, the tracker creates a membership
certificate from the request with swarm ID S, a timestamp T and the external
IP and port it received the message from, signed with the tracker's private
key. This certificate is returned to A.
</t>
<t>
Peer A then includes this certificate when it sends a PEX_REScert to peer B.
Receiver B verifies it against the tracker public key. This tracker public key
should be part of the swarm's metadata, which B received from a trusted
source. Subsequently, peer B can send the member certificate of A to other
peers in PEX_REScert messages.
</t>
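<t>
The checks a receiver applies to a membership certificate might look as
sketched below. The certificate fields follow the description above, but the
serialization, the names and the maximum age parameter are assumptions of this
sketch. The signature check is passed in as a function. To keep the example
self-contained, the demonstration uses an HMAC with a shared secret as a
stand-in for the tracker's public-key signature; a deployment would verify
against the tracker public key instead.
</t>
<figure>
<artwork><![CDATA[
```python
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class MembershipCert:
    swarm_id: bytes
    ip: str
    port: int
    timestamp: float  # time T at which the tracker saw the peer
    signature: bytes

    def signed_bytes(self) -> bytes:
        # Hypothetical serialization, not a wire format.
        text = "%s|%s|%d|%d" % (self.swarm_id.hex(), self.ip,
                                self.port, int(self.timestamp))
        return text.encode("ascii")


def cert_acceptable(cert: MembershipCert, swarm_id: bytes,
                    verify: Callable[[bytes, bytes], bool],
                    max_age: float,
                    now: Optional[float] = None) -> bool:
    """Swarm, freshness and signature checks before contact."""
    now = time.time() if now is None else now
    if cert.swarm_id != swarm_id:
        return False
    if not 0 <= now - cert.timestamp <= max_age:
        return False
    return verify(cert.signed_bytes(), cert.signature)


# Stand-in for the tracker key pair: HMAC with a shared secret.
# This is NOT a public-key signature; it only makes the demo run.
_KEY = b"demo-only-secret"


def demo_sign(data: bytes) -> bytes:
    return hmac.new(_KEY, data, hashlib.sha256).digest()


def demo_verify(data: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(demo_sign(data), signature)


def demo_issue(swarm_id: bytes, ip: str, port: int,
               timestamp: float) -> MembershipCert:
    """What the tracker does on a certificate request."""
    unsigned = MembershipCert(swarm_id, ip, port, timestamp, b"")
    return MembershipCert(swarm_id, ip, port, timestamp,
                          demo_sign(unsigned.signed_bytes()))
```
]]></artwork>
</figure>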
<t>
Peer A can send the certification request when it first contacts the tracker,
or at a later time. Furthermore, the responses the tracker sends could contain
membership certificates instead of plain addresses, such that they can be
gossiped securely as well.
</t>
<t>
We assume the tracker is protected against attacks and does a return routability
check. The latter ensures that malicious peers cannot obtain a certificate
for a random host, just for hosts where they can eavesdrop on incoming traffic.
</t>
<t>
The load generated on the tracker depends on churn and the lifetime of a
certificate. Certificates can be fairly long lived, given that the main goal
of the membership certificates is to prevent a malicious peer T from causing
a good peer A to contact *random* hosts. The freshness of the timestamp just adds
extra protection in addition to achieving that goal. It protects against
malicious hosts causing a good peer A to contact hosts that previously
participated in the swarm.
</t>
<t>
The membership certificate mechanism itself can be used for a kind of
amplification attack against good peers. Malicious peer T can cause peer A to
spend some CPU to verify the signatures on the membership certificates that T
sends. To counter this, A SHOULD check a few of the certificates sent and discard
the rest if they are defective.
</t>
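<t>
One possible reading of this countermeasure is sketched below: peer A
verifies a small random sample of the certificates in a message and discards
the whole batch as soon as one sampled certificate fails. The function name
and the sample size of 3 are assumptions of this sketch; the validity check is
supplied by the caller.
</t>
<figure>
<artwork><![CDATA[
```python
import random
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")


def filter_batch(certs: Sequence[C],
                 is_valid: Callable[[C], bool],
                 sample_size: int = 3) -> List[C]:
    """Spot-check a random sample; drop the batch on any failure.

    Returns the certificates still worth verifying individually
    before use, or an empty list if the sample was defective.
    """
    if not certs:
        return []
    k = min(sample_size, len(certs))
    for cert in random.sample(list(certs), k):
        if not is_valid(cert):
            return []  # sender is unreliable: discard the rest
    return list(certs)
```
]]></artwork>
</figure>
<t>
With this approach the CPU cost imposed by a sender of defective certificates
is bounded by the sample size rather than by the number of certificates sent.
</t>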
<t>
The same membership certificates described above can be registered in a
Distributed Hash Table that has been secured against the well-known
DHT-specific attacks <xref target="SECDHTS"/>.
</t>
<t>
Note that this scheme does not work for peers behind a symmetric
Network Address Translator, but neither does normal tracker registration.
</t>
</section>
<section title="Protection Against Eclipse Attacks" anchor="sec_sec_pex_eclipse">
<t>
Before we can discuss Eclipse attacks we first need to establish the
security properties of the central tracker. A tracker is vulnerable to
Amplification attacks too. A malicious peer T could register a victim B
with the tracker, and many peers joining the swarm will contact B.
Trackers can also be used in Eclipse attacks. If many malicious peers
register themselves at the tracker, the percentage of bad peers in the
returned address list may become high. Leaving the protection of the
tracker to the PPSP tracker protocol specification, we assume for the following
discussion that it returns a true random sample of the actual swarm
membership (achieved via Sybil attack protection). This means that if
50% of the peers are bad, a peer will still get 50% good addresses from the
tracker.
</t>
<t>
Attack E1 on PEX can be fended off by letting live injectors disable
PEX or, at least, by letting live injectors ensure that part of their connections
are to peers whose addresses came from the trusted tracker.
</t>
<t>
The same measures defend against attack E2 on PEX. They can also be
employed dynamically. When the current set of peers B that peer A is
connected to doesn't provide good quality of service, A can contact the
tracker to find new candidates.
</t>
</section>
</section>
<!-- %% -->
<section title="Support for Closed Swarms (PPSP.SEC.REQ-1)" anchor="sec_sec_cs">
<t>
The Closed Swarms <xref target="CLOSED"/> and Enhanced Closed Swarms
<xref target="ECS"/> mechanisms provide swarm-level access control. The basic
idea is that a peer cannot download from another peer unless it shows a
Proof-of-Access. Enhanced Closed Swarms improve on the original Closed Swarms
by adding on-the-wire encryption against man-in-the-middle attacks and more
flexible access control rules.
</t>
<t>
The exact mapping of ECS to PPSPP is defined in
<xref target="I-D.gabrijelcic-ppsp-ecs"/>.
</t>
</section>
<!-- %% -->
<section title="Confidentiality of Streamed Content (PPSP.SEC.REQ-2+3)" anchor="sec_sec_conf">
<t>
No extra mechanism is needed to support confidentiality in PPSPP. A
content publisher wishing confidentiality should just distribute content
in ciphertext or DRM-protected format. In that case it is assumed a higher layer
handles key management out-of-band. Alternatively, pure point-to-point
encryption of content and traffic can be provided by the proposed Closed Swarms
access control mechanism, or by DTLS <xref target="RFC6347" /> or IPsec
<xref target="RFC4301"/>.
</t>
</section>
<!-- %% -->
<section title="Strength of the Hash Function for Merkle Hash Trees" anchor="sec_sec_hash">
<t>
Implementations MUST support SHA1 as the hash function for content integrity
protection via Merkle Hash trees. SHA1 is preferred over stronger hash functions
for two reasons. First, it reduces on-the-wire overhead. Second, few
implementations need the extra strength of other functions because the function
is used in a hash tree. In particular, if attackers manage to find a collision
for a hash, they can replace just one chunk, so the impact is limited. If
fixed-size chunks are used, the collision has to be of the same size as the
original chunk. For hashes higher up in the hash tree, a collision must be a
concatenation of two hashes. In sum, collisions that fit the hash
tree are generally harder to find than regular SHA1 collisions, which are, at
the time of writing, still hard to find.
</t>
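<t>
The structural constraint on collisions can be seen in the following
simplified sketch of a SHA1 hash tree. It assumes a power-of-two number of
chunks and omits any padding rules, so it is not the tree construction this
document specifies; the function names are hypothetical. A leaf hash covers
one chunk, and every hash higher up covers exactly the concatenation of its
two 20-byte children.
</t>
<figure>
<artwork><![CDATA[
```python
import hashlib
from typing import List


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def merkle_root(chunks: List[bytes]) -> bytes:
    """Root hash over a power-of-two number of chunks."""
    n = len(chunks)
    if n == 0 or n & (n - 1):
        raise ValueError("sketch needs a power-of-two chunk count")
    level = [sha1(c) for c in chunks]  # leaf hashes
    while len(level) > 1:
        # A parent is the hash of its two children concatenated.
        level = [sha1(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


chunks = [bytes([i]) * 1024 for i in range(4)]
root = merkle_root(chunks)
```
]]></artwork>
</figure>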
</section>
<!-- %% -->
<section title="Limit Potential Damage and Resource Exhaustion by Bad or Broken Peers (PPSP.SEC.REQ-4+6)" anchor="sec_sec_limit">
<t>
In this section an analysis is given of the potential damage
a malicious peer can do with each message in the protocol, and how it
is prevented by the protocol (implementation).
</t>
<section title="HANDSHAKE" anchor="sec_sec_limit_HANDSHAKE">
<t>
<list style="symbols">
<t>Secured against DoS amplification attacks as described in
<xref target="sec_sec_handshake"/>.
</t>
<t>Threat HS.1: An Eclipse attack where peers T1..Tn fill all connection
slots of A by initiating the connection to A.
<vspace blankLines="1"/>
Solution: Peer A must not let other peers fill all its available
connection slots, i.e., A must initiate connections itself too, to prevent
isolation.
</t>
</list>
</t>
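<t>
A minimal sketch of the solution to threat HS.1 is a connection table that
reserves some slots for connections the peer initiates itself. The class
name and the split between incoming and reserved slots are assumptions of this
sketch; this document does not prescribe particular numbers.
</t>
<figure>
<artwork><![CDATA[
```python
class ConnectionSlots:
    """Reserve part of the slots for self-initiated connections."""

    def __init__(self, total: int, reserved_outgoing: int) -> None:
        if not 0 < reserved_outgoing <= total:
            raise ValueError("reserve between 1 and total slots")
        self.total = total
        self.max_incoming = total - reserved_outgoing
        self.incoming = 0
        self.outgoing = 0

    def accept_incoming(self) -> bool:
        """Admit a peer that initiated the connection to us."""
        if self.incoming >= self.max_incoming:
            return False
        self.incoming += 1
        return True

    def open_outgoing(self) -> bool:
        """Use a slot for a connection we initiate ourselves."""
        if self.incoming + self.outgoing >= self.total:
            return False
        self.outgoing += 1
        return True
```
]]></artwork>
</figure>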
</section>
<section title="HAVE" anchor="sec_sec_limit_HAVE">
<t>
<list style="symbols">
<t>Threat HAVE.1: Malicious peer T can claim to have content which it
hasn't. Subsequently T won't respond to requests.
<vspace blankLines="1"/>
Solution: peer A will consider T to be a slow peer and not ask it
again.
</t>
<t>
Threat HAVE.2: Malicious peer T can claim not to have content. Hence it
won't contribute.
<vspace blankLines="1"/>
Solution: Peer and chunk selection algorithms external to the protocol will
implement fairness and provide sharing incentives.
</t>
</list>
</t>
</section>
<section title="DATA" anchor="sec_sec_limit_DATA">
<t>
<list style="symbols">
<t>Threat DATA.1: peer T sending bogus chunks.
<vspace blankLines="1"/>
Solution: The content integrity protection schemes defend against
this.
</t>
<t>Threat DATA.2: peer T sends peer A unrequested chunks.
<vspace blankLines="1"/>
Solution: Protecting against this threat requires network-level DoS
prevention.
</t>
</list>
</t>
</section>
<section title="ACK" anchor="sec_sec_limit_ACK">
<t>
<list style="symbols">
<t>Threat ACK.1: peer T acknowledges wrong chunks.
<vspace blankLines="1"/>
Solution: peer A will detect inconsistencies with the data it sent to T.
</t>
<t>Threat ACK.2: peer T modifies timestamp in ACK to peer A used for
time-based congestion control.
<vspace blankLines="1"/>
Solution: In theory, by decreasing the timestamp peer T could pretend there is
no congestion when in fact there is, causing A to send more data than it
should. <xref target="RFC6817"/> does not list this as a
security consideration. Possibly this attack can be detected by the large
resulting asymmetry between round-trip time and measured one-way delay.
</t>
</list>
</t>
</section>
<section title="INTEGRITY and SIGNED_INTEGRITY" anchor="sec_sec_limit_INTEGRITY">
<t>
<list style="symbols">
<t>Threat INTEGRITY.1: An amplification attack where peer T sends bogus INTEGRITY
or SIGNED_INTEGRITY messages, causing peer A to check hashes or signatures,
thus spending CPU unnecessarily.
<vspace blankLines="1"/>
Solution: If the hashes/signatures don't check out A will stop asking T
because of the atomic datagram principle and the content integrity
protection. Subsequent unsolicited traffic from T will be ignored.
</t>
<t>Threat INTEGRITY.2: An attack where peer T sends old SIGNED_INTEGRITY
messages in the Unified Merkle Tree scheme, trying to make peer A
tune in at a past point in the live stream.
<vspace blankLines="1"/>
Solution: The timestamp in the SIGNED_INTEGRITY message protects against
such replays. Subsequent traffic from T will be ignored.
</t>
</list>
</t>
</section>
<section title="REQUEST" anchor="sec_sec_limit_REQUEST">
<t>
<list style="symbols">
<t>Threat REQUEST.1: peer T could request lots from A, leaving A without
resources for others.
<vspace blankLines="1"/>
Solution: A limit is imposed on the upload capacity a single peer
can consume, for example, by using an upload bandwidth scheduler that takes
into account the need of multiple peers. A natural upper limit of this
upload quota is the bitrate of the content, taking into account that this
may be variable.
</t>
</list>
</t>
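<t>
As one simple way to impose such a limit, the sketch below uses a per-peer
token bucket. This is a generic rate-limiting technique chosen for
illustration, not the upload bandwidth scheduler mentioned above, and it does
not by itself balance the needs of multiple peers. The class name and
parameters are hypothetical; the rate would typically be set from the content
bitrate.
</t>
<figure>
<artwork><![CDATA[
```python
class UploadQuota:
    """Token bucket capping the bytes uploaded to one peer."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate    # bytes per second, e.g. the bitrate
        self.burst = burst  # maximum number of bytes saved up
        self.tokens = burst
        self.last = 0.0

    def allow(self, nbytes: int, now: float) -> bool:
        """Return True if nbytes may be sent at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst,
                          self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes > self.tokens:
            return False
        self.tokens -= nbytes
        return True
```
]]></artwork>
</figure>
<t>
A peer would keep one such object per remote peer and serve a REQUEST only
when allow() returns True for the size of the chunk to be sent.
</t>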
</section>
<section title="CANCEL" anchor="sec_sec_limit_CANCEL">
<t>
<list style="symbols">
<t>Threat CANCEL.1: peer T sends CANCEL messages for content it never requested
to peer A.
<vspace blankLines="1"/>
Solution: peer A will detect the inconsistency of the messages and ignore
them. Note that CANCEL messages may be received unexpectedly when a
transport is used where REQUEST messages may be lost or reordered with
respect to the subsequent CANCELs.
</t>
</list>
</t>
</section>
<section title="CHOKE" anchor="sec_sec_limit_CHOKE">
<t>
<list style="symbols">
<t>Threat CHOKE.1: peer T sends REQUEST messages after peer A sent T a CHOKE
message.
<vspace blankLines="1"/>
Solution: peer A will just discard the unwanted REQUESTs and resend the
CHOKE, assuming it got lost.
</t>
</list>
</t>
</section>
<section title="UNCHOKE" anchor="sec_sec_limit_UNCHOKE">
<t>
<list style="symbols">
<t>Threat UNCHOKE.1: peer T sends an UNCHOKE message to peer A without having
sent a CHOKE message before.
<vspace blankLines="1"/>
Solution: peer A can easily detect this violation of protocol state,
and ignore it. Note this can also happen due to loss of a CHOKE message
sent by a benign peer.
</t>
<t>Threat UNCHOKE.2: peer T sends an UNCHOKE message to peer A, but subsequently
does not respond to its REQUESTs.
<vspace blankLines="1"/>
Solution: peer A will consider T to be a slow peer and not ask it
again.
</t>
</list>
</t>
</section>
<section title="PEX_RES" anchor="sec_sec_limit_PEX_RES">
<t>
<list style="symbols">
<t> Secured against amplification and Eclipse attacks as described in
<xref target="sec_sec_pex"/>.
</t>
</list>
</t>
</section>
<section title="Unsolicited Messages in General" anchor="sec_sec_limit_genmsg">
<t>
<list style="symbols">
<t>
Threat: peer T could send a spoofed PEX_REQ or REQUEST from peer B to
peer A, causing A to send a PEX_RES/DATA to B.
<vspace blankLines="1"/>
Solution: the message from peer T won't be accepted unless T does a
handshake first, in which case the reply goes to T, not victim B.
</t>
</list>
</t>
</section>
</section>
<section title="Exclude Bad or Broken Peers (PPSP.SEC.REQ-5)" anchor="sec_sec_exclude">
<t>
A receiving peer can detect malicious or faulty senders as just described,
which it can subsequently ignore. However,
excluding such a bad peer from the system completely is complex. Random
monitoring by trusted peers that would blacklist bad peers as described
in <xref target="DETMAL"/> is one option. This mechanism does require extra
capacity to run such trusted peers, which must be indistinguishable
from regular peers, and requires a solution for the timely distribution of this
blacklist to peers in a scalable manner.
</t>
</section>
</section>
</middle>
<!-- *****BACK MATTER ***** -->
<back>
<!-- References split into informative and normative -->
<references title="Normative References">
&RFC2119;
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.1918.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3110.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3986.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4034.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4291.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5280.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5905.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6605.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6817.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml2/_reference.CCITT.X208.1988.xml"?>
<reference anchor="FIPS180-4">
<front>
<title>Federal Information Processing Standards: Secure Hash Standard (SHS)</title>
<author>
<organization>Information Technology Laboratory,
National Institute of Standards and Technology</organization>
<address>
<postal>
<street></street>
<city>Gaithersburg, MD 20899-8900</city>
<country>USA</country>
</postal>
</address>
</author>
<date year="2012" month="Mar"/>
</front>
<seriesInfo name="Publication" value="180-4"/>
</reference>
<reference anchor="IANADNSSECALGNUM" target="http://www.iana.org/assignments/dns-sec-alg-numbers">
<front>
<title>Domain Name System Security (DNSSEC) Algorithm Numbers</title>
<author>
<organization>IANA</organization>
</author>
</front>
</reference>
</references>
<references title="Informative References">
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2564.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2790.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2975.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3365.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3729.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4113.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4150.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4301.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4821.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6241.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4960.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5226.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5389.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5424.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5706.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5971.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6709.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6347.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6972.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.draft-ietf-ppsp-base-tracker-protocol-03.xml"?>
<?rfc include="http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.draft-ietf-alto-protocol-27.xml"?>
<reference anchor="MERKLE">
<front>
<title>Secrecy, Authentication, and Public Key Systems</title>
<author initials="R." surname="Merkle">
<organization>Dept. of Electrical Engineering, Stanford University, CA, USA</organization>
</author>
<date year="1979" />
</front>
<seriesInfo name="Ph.D. thesis" value="Dept. of Electrical Engineering, Stanford University, CA, USA, pp 40-45"/>
</reference>
<reference anchor="ABMRKL" target="http://bittorrent.org/beps/bep_0030.html">
<front>
<title>Merkle hash torrent extension</title>
<author initials="A." surname="Bakker">
<organization></organization>
</author>
<date year="2009" month="Mar"/>
</front>
<seriesInfo name="BitTorrent Enhancement Proposal" value="30"/>
</reference>
<reference anchor="JIM11">
<front>
<title>Sub-Second Lookups on a Large-Scale Kademlia-Based Overlay</title>
<author initials="R." surname="Jimenez">
<organization></organization>
</author>
<author initials="F." surname="Osmani">
<organization></organization>
</author>
<author initials="B." surname="Knutsson">
<organization></organization>
</author>
<date year="2011" month="Aug"/>
</front>
<seriesInfo name="IEEE International Conference on Peer-to-Peer Computing" value="(P2P'11), Kyoto, Japan"/>
</reference>
<reference anchor="SWIFTIMPL" target="https://github.com/libswift/libswift">
<front>
<title>Swift reference implementation</title>
<author initials="V." surname="Grishchenko">
<organization></organization>
</author>
<author initials="J." surname="Paananen">
<organization></organization>
</author>
<author initials="A." surname="Pronchenkov">
<organization></organization>
</author>
<author initials="A." surname="Bakker">
<organization></organization>
</author>
<author initials="R." surname="Petrocco">
<organization></organization>
</author>
<date year="2014" />
</front>
</reference>
<reference anchor="HAC01">
<front>
<title>Handbook of Applied Cryptography</title>
<author initials="A.J." surname="Menezes">
<organization></organization>
</author>
<author initials="P.C." surname="van Oorschot">
<organization></organization>
</author>
<author initials="S.A." surname="Vanstone">
<organization></organization>
</author>
<date year="1996" month="Oct"/>
</front>
<seriesInfo name="CRC Press," value="(Fifth Printing, August 2001)"/>
</reference>
<reference anchor="BITTORRENT" target="http://bittorrent.org/beps/bep_0003.html">
<front>
<title>The BitTorrent Protocol Specification</title>
<author initials="B." surname="Cohen">
<organization></organization>
</author>
<date year="2008" month="Feb"/>
</front>
<seriesInfo name="BitTorrent Enhancement Proposal" value="3"/>
</reference>
<reference anchor="BINMAP">
<front>
<title>Binmaps: hybridizing bitmaps and binary trees</title>
<author initials="V." surname="Grishchenko">
<organization></organization>
</author>
<author initials="J." surname="Pouwelse">
<organization></organization>
</author>
<date year="2009" month="Apr"/>
</front>
<seriesInfo name="Technical Report" value="PDS-2011-005, Parallel and
Distributed Systems Group, Fac. of Electrical Engineering,
Mathematics, and Computer Science, Delft University of Technology,
The Netherlands"/>
</reference>
<reference anchor="TIT4TAT">
<front>
<title>Incentives Build Robustness in BitTorrent</title>
<author initials="B." surname="Cohen">
<organization></organization>
</author>
<date year="2003" month="Jun" />
</front>
<seriesInfo name="1st Workshop on Economics of Peer-to-Peer Systems," value="Berkeley, CA, USA"/>
</reference>
<reference anchor="CLOSED" target="http://doi.acm.org/10.1145/1877891.1877898">
<front>
<title>Access Control to BitTorrent Swarms Using Closed Swarms</title>
<author initials="N." surname="Borch">
<organization></organization>
</author>
<author initials="K." surname="Mitchell">
<organization></organization>
</author>
<author initials="I." surname="Arntzen">
<organization></organization>
</author>
<author initials="D." surname="Gabrijelcic">
<organization></organization>
</author>
<date year="2010" month="Oct"/>
</front>
<seriesInfo name="ACM workshop on Advanced Video Streaming Techniques
for Peer-to-Peer Networks and Social Networking" value="(AVSTP2P '10), Florence, Italy"/>
</reference>
<reference anchor="ECS">
<front>
<title>Access Control in BitTorrent P2P Networks Using the Enhanced
Closed Swarms Protocol</title>
<author initials="V." surname="Jovanovikj">
<organization></organization>
</author>
<author initials="D." surname="Gabrijelcic">
<organization></organization>
</author>
<author initials="T." surname="Klobucar">
<organization></organization>
</author>
<date year="2011" month="Aug"/>
</front>
<seriesInfo name="International Conference on Emerging Security
Information, Systems and Technologies" value="(SECURWARE 2011), pp. 97-102, Nice, France"/>
</reference>
<reference anchor="ECLIPSE">
<front>
<title>Security Considerations for Peer-to-Peer Distributed Hash Tables</title>
<author initials="E." surname="Sit">
<organization></organization>
</author>
<author initials="R." surname="Morris">
<organization></organization>
</author>
<date year="2002" />
</front>
<seriesInfo name="IPTPS '01: Revised Papers
from the First International Workshop on Peer-to-Peer Systems"
value="pp. 261-269, Springer-Verlag"/>
</reference>
<reference anchor="SECDHTS">
<front>
<title>A Survey of DHT Security Techniques</title>
<author initials="G." surname="Urdaneta">
<organization></organization>
</author>
<author initials="G." surname="Pierre">
<organization></organization>
</author>
<author initials="M." surname="van Steen">
<organization></organization>
</author>
<date year="2011" month="Jun"/>
</front>
<seriesInfo name="ACM Computing Surveys" value="vol. 43(2)"/>
</reference>
<reference anchor="SPS">
<front>
<title>Secure Peer Sampling</title>
<author initials="G.P." surname="Jesi">
<organization></organization>
</author>
<author initials="A." surname="Montresor">
<organization></organization>
</author>
<author initials="M." surname="van Steen">
<organization></organization>
</author>
<date year="2010" month="Aug"/>
</front>
<seriesInfo name="Computer Networks" value="vol. 54(12), pp. 2086-2098, Elsevier"/>
</reference>
<reference anchor="SIGMCAST">
<front>
<title>Digital Signatures for Flows and Multicasts</title>
<author initials="C.K." surname="Wong">
<organization></organization>
</author>
<author initials="S.S." surname="Lam">
<organization></organization>
</author>
<date year="1999"/>
</front>
<seriesInfo name="IEEE/ACM Transactions on Networking" value="7(4), pp. 502-513"/>
</reference>
<reference anchor="POLLIVE">
<front>
<title>Pollution in P2P Live Video Streaming</title>
<author initials="P." surname="Dhungel">
<organization></organization>
</author>
<author initials="Xiaojun" surname="Hei">
<organization></organization>
</author>
<author initials="K." surname="Ross">
<organization></organization>
</author>
<author initials="N." surname="Saxena">
<organization></organization>
</author>
<date year="2009" month="Jul"/>
</front>
<seriesInfo name="International Journal of Computer Networks &amp; Communications (IJCNC)" value="Vol.1, No.2"/>
</reference>
<reference anchor="DETMAL">
<front>
<title>Detecting Malicious Peers in Overlay Multicast Streaming</title>
<author initials="S." surname="Shetty">
<organization></organization>
</author>
<author initials="P." surname="Galdames">
<organization></organization>
</author>
<author initials="W." surname="Tavanapong">
<organization></organization>
</author>
<author initials="Ying" surname="Cai">
<organization></organization>
</author>
<date year="2006" month="Nov"/>
</front>
<seriesInfo name="IEEE Conference on Local Computer Networks" value="(LCN'06). Tampa, FL, USA"/>
</reference>
<reference anchor="GIVE2GET">
<front>
<title>Give-to-Get: Free-riding Resilient Video-on-demand in P2P Systems</title>
<author initials="J.J.D." surname="Mol">
<organization></organization>
</author>
<author initials="J." surname="Pouwelse">
<organization></organization>
</author>
<author initials="M." surname="Meulpolder">
<organization></organization>
</author>
<author initials="D.H.J." surname="Epema">
<organization></organization>
</author>
<author initials="H.J." surname="Sips">
<organization></organization>
</author>
<date year="2008" month="Jan"/>
</front>
<seriesInfo name="Proceedings Multimedia Computing and Networking conference
(Proceedings of SPIE Vol. 6818)" value="San Jose, California, USA"/>
</reference>
<reference anchor="EPLIVEPERF">
<front>
<title>Epidemic Live Streaming: Optimal Performance Trade-offs</title>
<author initials="T." surname="Bonald">
<organization></organization>
</author>
<author initials="L." surname="Massoulié">
<organization></organization>
</author>
<author initials="F." surname="Mathieu">
<organization></organization>
</author>
<author initials="D." surname="Perino">
<organization></organization>
</author>
<author initials="A." surname="Twigg">
<organization></organization>
</author>
<date year="2008" month="Jun"/>
</front>
<seriesInfo name="Proceedings of the 2008 ACM SIGMETRICS International
Conference on Measurement and Modeling of Computer Systems"
value="Annapolis, MD, USA"/>
</reference>
<reference anchor="BITOS">
<front>
<title>BiToS: Enhancing BitTorrent for Supporting Streaming Applications</title>
<author initials="A." surname="Vlavianos">
<organization></organization>
</author>
<author initials="M." surname="Iliofotou">
<organization></organization>
</author>
<author initials="F." surname="Mathieu">
<organization></organization>
</author>
<author initials="M." surname="Faloutsos">
<organization></organization>
</author>
<date year="2006" month="Apr"/>
</front>
<seriesInfo name="IEEE INFOCOM Global Internet Symposium"
value="Barcelona, Spain"/>
</reference>
<reference anchor='I-D.gabrijelcic-ppsp-ecs'>
<front>
<title>Enhanced Closed Swarm protocol</title>
<author initials='D.' surname='Gabrijelcic' fullname='Dusan Gabrijelcic'>
<organization>Jozef Stefan Institute, Slovenia</organization>
</author>
<date month='November' day='29' year='2012' />
</front>
<seriesInfo name='Internet-Draft' value='draft-ppsp-gabrijelcic-ecs' />
<format type='TXT'
target='http://datatracker.ietf.org/doc/draft-ppsp-gabrijelcic-ecs/' />
</reference>
<reference anchor='LBT'>
<front>
<title>LEDBAT: the new BitTorrent congestion control protocol</title>
<author initials="D." surname="Rossi">
<organization></organization>
</author>
<author initials="C." surname="Testa">
<organization></organization>
</author>
<author initials="S." surname="Valenti">
<organization></organization>
</author>
<author initials="L." surname="Muscariello">
<organization></organization>
</author>
<date year="2010" month="Aug"/>
</front>
<seriesInfo name="Computer Communications and Networks"
value="(ICCCN), Zurich, Switzerland"/>
</reference>
<reference anchor='LCOMPL'>
<front>
<title>On the impact of uTP on BitTorrent completion time</title>
<author initials="C." surname="Testa">
<organization></organization>
</author>
<author initials="D." surname="Rossi">
<organization></organization>
</author>
<date year="2011" month="Aug"/>
</front>
<seriesInfo name="IEEE International Conference on Peer-to-Peer Computing"
value="(P2P'11), Kyoto, Japan"/>
</reference>
<reference anchor='PPSPPERF'>
<front>
<title>Performance analysis of the Libswift P2P streaming protocol</title>
<author initials="R." surname="Petrocco">
<organization></organization>
</author>
<author initials="J." surname="Pouwelse">
<organization></organization>
</author>
<author initials="D." surname="Epema">
<organization></organization>
</author>
<date year="2012" month="Sep"/>
</front>
<seriesInfo name="IEEE International Conference on Peer-to-Peer Computing"
value="(P2P'12), Tarragona, Spain"/>
</reference>
</references>
<section title="Revision History" anchor="sec_revhist">
<t>
<list style="hanging" hangIndent="4">
<t hangText="-00 2011-12-19 Initial version.">
</t>
<t hangText="-01 2012-01-30 Minor text revision:">
<list style="symbols">
<t>
Changed heading to "A. Bakker"
</t><t>
Changed title to *Peer* Protocol, and abbreviation PPSPP.
</t><t>
Replaced swift with PPSPP.
</t><t>
Removed Sec. 6.4. "HTTP (as PPSP)".
</t><t>
Renamed Sec. 8.4. to "Chunk Picking Algorithms".
</t><t>
Resolved Ticket #3: Removed sentence about random set of peers.
</t><t>
Resolved Ticket #6: Added clarification to "Chunk Picking
Algorithms" section.
</t><t>
Resolved Ticket #11: Added Sec. 3.12 on Storage Independence
</t><t>
Resolved Ticket #14: Added clarification to "Automatic Size
Detection" section.
</t><t>
Resolved Ticket #15: Operation section now states it shows
example behaviour for a specific set of policies and schemes.
</t><t>
Resolved Ticket #30: Explained why multiple REQUESTs in one datagram.
</t><t>
Resolved Ticket #31: Renamed PEX_ADD message to PEX_RES.
</t><t>
Resolved Ticket #32: Renamed Sec 3.8. to "Keep Alive Signaling",
and updated explanation.
</t><t>
Resolved Ticket #33: Explained NAT hole punching via only PPSPP
messages.
</t><t>
Resolved Ticket #34: Added section about limited overhead of the
Merkle hash tree scheme.
</t>
</list>
</t>
<t hangText="-02 2012-04-17 Major revision">
<list style="symbols">
<t>
Allow different chunk addressing and content integrity protection schemes (ticket #13):
</t><t>
Added chunk ID, chunk specification, chunk addressing scheme, etc. to terminology.
</t><t>
Created new Sections 4 and 5 discussing chunk addressing and content integrity protection schemes, respectively, and moved the relevant sections on bin numbering and Merkle hash trees there.
</t><t>
Renamed Section 4 to "Merkle Hash Trees and The Automatic Detection of Content Size".
</t><t>
Reformulated automatic size detection in terms of nodes, not bins.
</t><t>
Extended HANDSHAKE message to carry protocol options and created Section 8 on Protocol options. VERSION and MSGTYPE_RCVD messages replaced with protocol options.
</t><t>
Renamed HASH message to INTEGRITY.
</t><t>
Renamed HINT to REQUEST.
</t><t>
Added description of chunk addressing via (start,end) ranges.
</t><t>
Resolved Ticket #26: Extended "Security Considerations" with section on the
handshake procedure.
</t><t>
Resolved Ticket #17: Defined "recently" as "in the last 60 seconds" in PEX.
</t><t>
Resolved Ticket #20: Extended "Security Considerations" with design to make
Peer Address Exchange more secure.
</t>
<t>
Resolved Ticket #38+39 / PPSP.SEC.REQ-2+3: Extended "Security Considerations"
with a section on confidentiality of content.
</t>
<t>
Resolved Ticket #40+42 / PPSP.SEC.REQ-4+6: Extended "Security Considerations"
with a per-message analysis of threats and how PPSPP is protected from them.
</t>
<t>
Progressed Ticket #41 / PPSP.SEC.REQ-5: Extended "Security Considerations"
with a section on possible ways of excluding bad or broken peers from the
system.
</t>
<t>
Moved Rationale to Appendix.
</t>
<t>
Resolved Ticket #43: Updated Live Streaming section to include "Sign All"
content authentication, and reference to <xref target="SIGMCAST"/> following
discussion with Fabio Picconi.
</t>
<t>
Resolved Ticket #12: Added a CANCEL message to cancel REQUESTs for the same
data that were sent to multiple peers at the same time in time-critical
situations.
</t>
</list>
</t>
<t hangText="-03 2012-10-22 Major revision">
<list style="symbols">
<t>
Updated Abstract and Introduction, removing download case.
</t>
<t>
Resolved Ticket #4: Added explicit CHOKE/UNCHOKE messages.
</t>
<t>
Removed directory lists unused in streaming.
</t>
<t>Resolved Ticket #22, #23, #28: Failure behaviour, error codes and dealing
with peer crashes.
</t>
<t>
Resolved Ticket #13: Chunk ranges are the default chunk addressing scheme
that all peers MUST support.
</t>
<t>
Added a section on compatibility between chunk addressing schemes.
</t>
<t>
Expanded the explanation of Unified Merkle Trees as a method for content
integrity protection for live streams.
</t>
<t>
Added a section on forgetting chunks in live streaming.
</t>
<t>
Added "End" option to protocol options and corrected bugs in UDP encapsulation,
following Karl Knutsson's comments.
</t>
<t>
Added SHA-2 support for Merkle Hash functions.
</t>
<t>
Added content integrity protection methods for live streaming to
the relevant protocol option.
</t>
<t>
Added a Live Signature Algorithm protocol option.
</t>
<t>
Resolved Ticket #24+27: The choice for UDP + LEDBAT as transport has now been
reflected in the draft. TCP and RTP encapsulations have been removed.
</t>
<t>
Superfluous parts of Section 10 on extensibility have been removed.
</t>
<t>
Removed appendix with Rationale.
</t>
<t>
Resolved Ticket #21+25: PPSPP currently uses LEDBAT and the DATA and ACK
messages now contain the time fields it requires. Should other congestion
control algorithms be supported in the future, a protocol option will be
added.
</t>
</list>
</t>
<t hangText="-04 2012-11-07 Minor revision">
<list style="symbols">
<t>
Corrected typos.
</t>
<t>
Added empty protocol option list when HANDSHAKE is used for explicitly
closing a channel in the UDP encapsulation.
</t>
<t>
Corrected definition of a range chunk specification to be a single (start,end)
pair. To send multiple disjoint ranges, multiple messages should be used.
</t>
<t>
Clarified that in a range chunk specification the end is inclusive, i.e.,
[start,end], not [start,end).
</t>
<t>
Added PEX_REScert message to carry a membership certificate. Renamed PEX_RES
to PEX_RESv4.
</t>
<t>
Added a guideline about private and link-local addresses in PEX_RES messages.
</t>
<t>
Defined the format of the public key that is used as swarm ID in live streaming.
</t>
<t>
Clarified that a HANDSHAKE message must be the first message in a datagram.
</t>
<t>
Clarified that INTEGRITY messages are sent ahead in a separate datagram when
the hashes that still need to be sent and the chunk do not all fit into a
single datagram. Defined an order for the INTEGRITY messages.
</t>
<t>
Clarified rare case of sending multiple DATA messages in one datagram.
</t>
<t>
Clarified UDP datagrams carrying PPSPP should adhere to the network's MTU to
avoid IP fragmentation.
</t>
<t>
Defined value for version protocol option.
</t>
<t>
Added small clarifications and corrected typos.
</t>
<t>
Extended versioning scheme to Min/max versioning scheme defined in
<xref target="RFC6709"/>, Section 4.1, following Riccardo Bernardini's
suggestion.
</t>
<t>
Processed comments on unclear phrasing from Riccardo Bernardini.
</t>
<t>
Added a guideline on when to declare a peer dead.
</t>
<t>
Made sure all essential references are listed as Normative references
following RFC3967.
</t>
</list>
</t>
<t hangText="-05 2013-01-23 Minor revision">
<list style="symbols">
<t>
Corrected category to Standards Track.
</t>
<t>
Clarified that swarm identifier is a required protocol option in an initiating
HANDSHAKE in the UDP encapsulation.
</t>
<t>
Added IANA considerations and tabulated name spaces for registry definition.
</t>
</list>
</t>
<t hangText="-06 2013-02-11 Minor revision">
<list style="symbols">
<t>
Updated "Overall Operation" to have more context (HTML5 video).
</t>
<t>
Clarified wording on PEX_REQ.
</t>
<t>
Clarified wording on SIGNED_INTEGRITY.
</t>
<t>
Added a reference on how ALTO can be used with PPSPP.
</t>
<t>
Added Manageability Consideration section following RFC5706.
</t>
<t>
Clarified that implementations SHOULD implement the "Unified Merkle Tree"
content integrity protection method for live, and MAY implement "Sign All".
</t>
<t>
Made SHA1 hash function mandatory-to-implement as Merkle Tree Hash function
and explained the security considerations.
</t>
<t>
Made RSA/SHA1 mandatory-to-implement as Live Signature Algorithm for integrity
protection while live streaming.
</t>
<t>
Clarified that implementations MUST implement addressing via 32-bit chunk
ranges.
</t>
<t>
Made LEDBAT an Informational reference to prevent a so-called "down ref".
</t>
<t>
Updated reference to PPSP problem statement and requirements document.
</t>
<t>
Used kibibyte unit in formal sections.
</t>
</list>
</t>
<t hangText="-07 2013-06-19 Revision following AD Review">
<list style="empty">
<t>
Quoting the AD review by Martin Stiemerling:
***High-level issues:
</t>
<t>
1) Merkle Hash Trees
I have found the document very confusing on whether Merkle Hash Trees
(MHTs) and the for the MHT required bin numbering scheme are now
optional or mandatory. Parts of the draft make the impression that
either of them or both or optional (mainly in the beginning of the
document), while Section 5 and later Sections are relying heavily on MHTs.
My naive reading of the current draft is that you could rely on
start-end ranges for chunk addressing and MHTs for content protection.
However, I do know that this combination is not working.
If MHTs are really optional, including the bin numbering, the document
should really state this and make clear what the operations of the
protocol are with the mandatory to implement (MTI) mechanisms. The MHT,
bins, and all the protocol handling should go in an appendix.
There is a call to make for the WG:
I do know that MHTs were considered by some as burden and they have
called for a leaner way, i.e., the start-end ranges.
The call for the leaner way has been implemented in the document but not
fully.
<list style="symbols">
<t>
The text now states that MHTs SHOULD be used unless in benign environments
and are mandatory-to-implement. It also states that only start-end chunk range
is mandatory-to-implement, and bins are optional.
</t>
</list>
</t>
<t>
2) LEDBAT as congestion control vs. PPSPP
The PPSP peer protocol is intended for the Standards Track and relies in
a normative manner on LEDBAT (RFC 6817). LEDBAT as such is an
**experimental** delay-based congestion control algorithm.
A Standards Track protocol cannot normatively rely on an Experimental
congestion control mechanism (or RFC in general).
There are ways out of this situation:
i) Do not use ledbat: this would call for another congestion control
mechanism to be described in the PPSPP draft.
ii) Work on an 'upgrade' of the LEDBAT specification to Standards Track:
Possible, but a very long way.
iii) Agree on having PPSPP also as Experimental protocol.
I'm currently leaning towards option iii), but this is my pure personal
opinion as an individual in the IETF.
<list style="symbols">
<t>
A new paragraph has been added to <xref target="sec_encap_udp_control"/>
describing the widespread use of LEDBAT in current P2P systems. Hence, the aim is a
DOWNREF procedure.
</t>
</list>
</t>
<t>
3) No formal protocol message definition
Section 7 and more specific Section 8 describe the protocol syntax of
the protocol options and the messages, though Section 8 is talking about
UDP encapsulation.
Section 7 is hard to digest if someone should implement the options, see
also later, but Section 8 is almost impossible to understand by somebody
who has not been involved in the PPSP working group. See also further
down for a more detailed review of the sections.
To give an example out of Section 8.4:
This section describes the HANDSHAKE message and gives examples how such
a HANDSHAKE message could look like.
But no formal definition of the message is given leaving a number of
thins unclear, such as what the local channel number and what's the
remote channel number is. This is implicitly defined, but that is not a
good way of writing Standards Track drafts.
<list style="symbols">
<t>
We added the usual bit-based ASCII art representations.
</t>
</list>
</t>
<t>
4) Implicit use of default values
There are a number of places all over the draft where default values are
defined. Many of those default values are used when there are no values
explicitly signaled, e.g., the default chunk size of 1 Kbyte in Section
8.4 or Section Section 7.5. with the default for the Content Integrity
Protection Method.
I have the feeling that the protocol and the surroundings (e.g., what
comes in via the 'tracker') are over-optimized, e.g., always providing
the Content Integrity Protection Method as part of the Protocol options
will not waste more than 2 bytes in a HANDSHAKE message.
Further, I do not see the need to define a default chunk size in the
base protocol specification, as this default can look very different,
depending on who is deploying the protocol and in what context. This
calls for a more dynamic way of handling the system chunk size, either
as part of an external mechanisms (e.g. via the tracker) or in the
HANDSHAKE message.
<list style="symbols">
<t>
Removed implicit defaults from protocol options. Chunk size is part of
the content's metadata and thus configurable. The default 1KiB has been
turned into a recommendation.
</t>
</list>
</t>
<t>
5) Concept of channels
The concept of channels is good but it is introduced too late in the
draft, namely in Section 8.3, and it is introduced with very few words.
Why isn't this introduced as part of Section 2 or Section 3, also in the
relationship to the used transport protocol?
I.e., the intention is to keep only one transport 'connection' between
two distinct peers and to allow to run multiple swarm instances at the
same time over the same transport.
And how do swarms and channels correlate?
<list style="symbols">
<t>
Concept now introduced in Section 3 with a figure.
</t>
</list>
</t>
<t>
***Technicals:
</t>
<t>
- Section 2.1, 2nd paragraph, about the tracker:
I haven't seen a single place where the interaction with a tracker is
discussed or where the tracker less operation is discussed in contrast.
It is further unclear what type of information is really required from a
tracker.
A tracker (or a resource directory) would need to provide more then IP
address &amp; port, e.g., the used transport protocol for the protocol
exchange (given that other transports are allowed), used chunk size,
chunk addressing scheme, etc
<list style="symbols">
<t>
Interaction with tracking facilities in general is discussed in the Operations
and Management section, <xref target="sec_mgmt_op_install"/>. This section also
discusses swarm metadata and the information required from a tracking facility.
Decentralized tracking in PPSPP is discussed in <xref target="sec_msgs_PEX_msgs"/>.
</t>
</list>
</t>
<t>
- Section 2.3, the 1st paragraph, 'close-channel':
This has been the first time where I stumbled over the channel without
knowing the concept.
<list style="symbols">
<t>
Rephrased.
</t>
</list>
</t>
<t>
- Section 3.1: ordering of messages
The 1st sentence implies that ordering of messages in a datagram matters
a lot. This is outlined later in the document, but I would add this as
part of 3., i.e., the messages are processed in the strict order or
something along this line.
<list style="symbols">
<t>
Phrase added.
</t>
</list>
</t>
<t>
- Section 3.1, 1st paragraph, options to include
I would not say anything about 'SHOULD include options' here, as this is
anyhow described in Section 8.
<list style="symbols">
<t>
Phrase removed.
</t>
</list>
</t>
<t>
- Section 3.1, 2nd paragraph:
"Datagrams exchanged MAY also contain some minor payload, e.g. HAVE
messages to indicate the current progress of a peer or a REQUEST (see
Section 3.7)."
to be added, just to make it clear IMHO: ", but MUST NOT include any
DATA message".
<list style="symbols">
<t>
Added.
</t>
</list>
</t>
<t>
- Section 3.2, 2nd paragraph:
"In particular, whenever a receiving peer has successfully checked the
integrity of a chunk or interval of chunks it MUST send a HAVE message
to all peers it wants to interact with in the near future."
This looks like a place where a lot of traffic can be send out of a
peer, i.e., whenever a chunk arrives a HAVE message must be sent.
I don't believe that this should be mandated by the protocol
specification, but there should guidance on when to send this, e.g.,
peers might be also able to wait for a short period of time to gather
more chunks to be reported in HAVE. Or should in this case a single UDP
datagram contain multiple HAVEs?
<list style="symbols">
<t>
Clarified that this is indeed controlled by a policy outside the peer
protocol that can decide to piggyback onto other traffic or wait until
multiple chunks are verified.
</t>
</list>
</t>
<t>
- Section 3.4 on ACKs
This section looks pretty weak, as ACKs may be sent but on the other
hand MUST be sent if ledbat is used. I would simply say:
- ACK MUST be sent if an unreliable transport protocol is used
- ACK MAY be sent if a reliable transport protocol is used
- keep clarification about ledbat.
<list style="symbols">
<t>
DONE.
</t>
</list>
</t>
<t>
- Section 3.5:
Give text where INTEGERITY is described at least for the MTI scheme.
<list style="symbols">
<t>
DONE.
</t>
</list>
</t>
<t>
- Section 3.7, 2nd paragraph
- all 'MAY' are actually not right here. Please remove or replace them
with lower letters if appropriate.
- It is not clear what the 'sequentially' means exactly. Is it in the
received order?
<list style="symbols">
<t>
Rephrased MAYs. "Sequentially" replaced with "received order".
</t>
</list>
</t>
<t>
- Section 3.8:
Please replace 'MAY' by can, as those are not normative behaviors but
more the fact that peers can, for instance, request urgent data.
<list style="symbols">
<t>
DONE.
</t>
</list>
</t>
<t>
- Section 3.9
Same comment as for the Section 3.8 just above this comment.
<list style="symbols">
<t>
DONE.
</t>
</list>
</t>
<t>
- Section 3.9 waiting for responses
OLD
" When peer B receives a CHOKE message from A it MUST NOT send new
REQUEST messages and SHOULD NOT expect answers to any outstanding ones."
NEW
" When peer B receives a CHOKE message from A it MUST NOT send new
REQUEST messages and it cannot expect answers to any outstanding ones,
as the transfer of chunks is choked."
<list style="symbols">
<t>
DONE.
</t>
</list>
</t>
<t>
- Section 3.10.2
This whole section about PEX hole punching reads very, very
experimental. The STUN method is ok, but PEX isn't.
First of all, the safe behavior for a peer when it receives unsolicited
PEX messages is to discard those messages. Second, these unsolicited PEX
messages trigger some behavior which may open an attack vector.
The best way, but this needs more discussion, is to include some
token in the messages that are exchanged in order to avoid any
blind attacks here. However, this will need more detailed
discussion of its purpose.
<list style="symbols">
<t>
We moved parts of the security analysis of PEX up, such that all mechanisms
are explained in the main text, and the analysis of what attacks there are and
how these mechanisms prevent them is in the Sec. Considerations section.
</t>
<t>
The section about hole punching was removed, lacking a reference to the
experiments we conducted with this exact variant of the mechanism.
</t>
</list>
</t>
<t>
- Section 3.11
I don't see the 'MUST send keep-alive' as a mandatory requirement, as
peers might have good reasons not to send any keep-alive. Why not say
'A peer can send a keep-alive' and it 'MUST use the simple datagram...'
as already described? Though there is also no real need to say MUST.
<list style="symbols">
<t>
Now Section 3.12. Rephrased and clarified the reason and consequences of sending
keep-alive msgs.
</t>
</list>
</t>
<t>
- Section 4
The syntax definition for each of the chunk addressing schemes is
missing. This is not suitable for any specification that aims at
interoperable implementations.
<list style="symbols">
<t>
We added the usual bit-based ASCII art representations.
</t>
</list>
</t>
<t>
- Section 4.3.2
PPSPP peers MUST use the ACK message if an unreliable transport protocol
is used.
<list style="symbols">
<t>
DONE.
</t>
</list>
</t>
<t>
- Section 4.4
Has this been tested in an implementation?
I would like to understand the need for such a section, as in my
understanding a peer implementation should choose one scheme and support
this and there shouldn't be the need to convert between the different
schemes.
<list style="symbols">
<t>
Yes, the reference implementation translates from chunk ranges on the wire
to bins internally. However, for simplicity we now state that all peers
in a swarm MUST use the same method and the compatibility section has
been removed.
</t>
</list>
</t>
<t>
- Section 5
This reads that MHTs are mandatory to implement while the document makes
the impression that MHTs are optional.
<list style="symbols">
<t>
Rephrased, see High-level issues.
</t>
</list>
</t>
<t>
- Section 5.3
" so each datagram SHOULD be processed separately and a loss of one
datagram MUST NOT disrupt the flow"
The MUST NOT is not a protocol specification requirement, but more an
informative part saying that a lost message shouldn't impact the
protocol machinery, but it can impact the overall operation.
What is the flow here in that sentence?
<list style="symbols">
<t>
Rephrased.
</t>
</list>
</t>
<t>
- Section 5.6.2.
An illustrative example explaining how the automatic size detection
works is required here.
<list style="symbols">
<t>
Added a paragraph with an example that follows the figure used during the
explanation. A state diagram could also be added, but might be a bit redundant.
</t>
</list>
</t>
<t>
- Section 6.1, 4th paragraph:
Where do I find the 1 byte algorithm field in the swarm ID? The swarm ID
is not really defined in a single place.
<list style="symbols">
<t>
Expanded. Added a formal definition.
</t>
</list>
</t>
<t>
- Section 7.3
The described min/max versioning relies on the fact that there are major
and minor version numbers. I cannot find any major and minor version
number scheme in the draft.
<list style="symbols">
<t>
Actually, it does not. There is a single unstructured version number.
</t>
</list>
</t>
<t>
- Section 7.4, Length field
It is not clear what the 'Length' field is referring to.
Further, it is not clear if the swarm IDs are concatenated in one swarm
ID option, or if each swarm ID must be placed in a separate swarm ID option.
<list style="symbols">
<t>
Clarified.
</t>
</list>
</t>
<t>
- Section 7.6
MHTs are mandatory to support though MHTs are optional?
<list style="symbols">
<t>
Clarified.
</t>
</list>
</t>
<t>
- Section 7.7
'key size ... derived from the swarm ID'. This relates to my high level
comment no 4. on the use of implicit information. Either it is clearly
specified how this information is derived or there is a protocol
field/information about the size.
<list style="symbols">
<t>
Key size derivation procedure added to description of SIGNED_INTEGRITY
in UDP encapsulation.
</t>
</list>
</t>
<t>
- Section 7.8
I would recommend to say that the default MUST be supported, but the
peer must always signal what method it is supporting or at least using.
<list style="symbols">
<t>
Corrected, see High-level issues 4.)
</t>
</list>
</t>
<t>
- Section 7.10
I have not understood how the 'Length' field relates to the message
bitmap and how long the message bitmap can grow. The figure looks like a
maximum of 16 bits?
<list style="symbols">
<t>
Clarified.
</t>
</list>
</t>
<t>
- Section 8
I do not see the value of the text in the preface of Section 8. I would
say that this text should say what is mandatory and what's not, i.e.,
MUST use UDP and MUST use LEDBAT.
Potentially saying that future protocol versions can also run over other
transport protocols.
<list style="symbols">
<t>
Adjusted.
</t>
</list>
</t>
<t>
- Section 8.1 about Maximum Transfer Unit (MTU)
The text is discussing that an Ethernet frame can carry 1500 bytes. This is
true, but the Ethernet payload is not the normative MTU across all of
the Internet. For IPv6 the min MTU is 1280 bytes and for IPv4 it is 576
bytes, though for IPv4 it can be theoretically much lower at 64 bytes.
I would move the definition of the default chunk size to a
recommendation with text saying that this size has a high likelihood to
travel end-to-end in the Internet without any fragmentation.
Fragmentation might increase the loss of complete chunks, as one lost
fragment will cause the loss of a complete chunk.
One way of getting an informed decision on whether chunks can travel in
their size is to use the Don't Fragment (DF) bit in IPv4 and also to
watch for ICMP error messages. However, ICMP error messages are not a
reliable indication, but they can be some indication.
<list style="symbols">
<t>
1 KiB chunk size has been made a recommendation.
</t>
<t>
Added a small paragraph discussing the optional integration of MTU path discovery.
</t>
</list>
</t>
<t>
- Section 8.1 Definition of the default chunk size
There is no need to define a default chunk size if the chunk size is
always signaled per swarm. This is another place where a default/implicit
value is unnecessary.
<list style="symbols">
<t>
The chunk size is always part of the content's metadata.
</t>
</list>
</t>
<t>
- Section 8.3: see also my comment no 3.
The concept of channels is introduced very late and with few words. A
figure to explain the concept will help a lot and also more formal text
on what a channel is and how they are identified. Also what the init
channel is.
<list style="symbols">
<t>
Concept now introduced in <xref target="sec_msgs_channels"/>.
</t>
</list>
</t>
<t>
- Section 8 in general:
There is no formal definition of the messages, just bit pattern examples.
<list style="symbols">
<t>
We added the usual bit-based ASCII art representations.
</t>
</list>
</t>
<t>
- Section 8.4 (as example for the other Sections in 8.x):
i) What is the '(CHANNEL' parameter? Is it actually a parameter?
ii) it is implicit that the first channel no (0000000) is the remote
peer's channel and that the second channel no (00000011) is the local
peer's channel, right?
This isn't clear from the text, but my guess.
<list style="symbols">
<t>
We added the usual bit-based ASCII art representations.
</t>
</list>
</t>
<t>
- Section 8.5
Can HAVE messages carry multiple bin specs in one message or do I have to
make a HAVE message for each bin?
<list style="symbols">
<t>
Clarified.
</t>
</list>
</t>
<t>
- Section 8.6
What is the formal definition of a DATA message? That's completely
missing or I have not understood it.
<list style="symbols">
<t>
We added the usual bit-based ASCII art representations.
</t>
</list>
</t>
<t>
- Section 8.7
looks just underspecified, especially as this is the link to LEDBAT.
<list style="symbols">
<t>
Implementors will unfortunately need to read the full LEDBAT specification.
</t>
</list>
</t>
<t>
- Section 8.11
How are the chunks specified here? The formal syntax definition or
reference to one is missing.
<list style="symbols">
<t>
We added the usual bit-based ASCII art representations.
</t>
</list>
</t>
<t>
- Section 8.13
I'm lost on this section, as I haven't fully understood the concept of
the PEX in this document. Especially not why there is the PEX_REScert.
<list style="symbols">
<t>
We moved parts of the security analysis of PEX up into 3.10, such that all
mechanisms are explained in the main text, and the analysis of what attacks
there are and how these mechanisms prevent them is in the Sec. Considerations
section.
</t>
</list>
</t>
<t>
- Section 11
The RFC required for protocol extensions of a standards track protocol
looks odd. This must be at least IETF Review or Standards Action.
<list style="symbols">
<t>
Policy changed to "IETF Review" and the section was extended with information
about data types and required information.
</t>
</list>
</t>
<t>
***Editorials:
</t>
<t>
- Abstract (and probably also other places), 1st sentence of,
PPSPP is not a transport protocol, just a protocol
<list style="symbols">
<t>
DONE.
</t>
</list>
</t>
<t>
- Section 1.1, 4th paragraph:
I would remove the reference to rmcat, as it is not yet clear what the
outcome of the rmcat wg will be
<list style="symbols">
<t>
DONE.
</t>
</list>
</t>
<t>
- Section 1.3, on page 8, about seeding/leeching:
I would break it in to sub-bullets.
<list style="symbols">
<t>
DONE.
</t>
</list>
</t>
<t>
- Section 2.1 and following:
These are examples, aren't they? If so, this should be mentioned or clarified.
<list style="symbols">
<t>
DONE. All subsections now labeled "Example:".
</t>
</list>
</t>
<t>
- Section 2.1: What is the PPSP Url?
<list style="symbols">
<t>
Reformulated in terms of "Imagine there is a PPSP URL".
</t>
</list>
</t>
<t>
- Section 2.3, the 1st paragraph, detection of dead peers:
It would be good to say where this detection is described in the
remainder of the draft. Just for completeness.
<list style="symbols">
<t>
DONE. Dead peer detection is now a separate section and referenced here.
</t>
</list>
</t>
<t>
- Section 2.2, the very last paragraph, 'Peer A MAY also':
This 'MAY' is not useful here. I would just write 'Peer A can also', as
there is nothing normative described here.
<list style="symbols">
<t>
DONE.
</t>
</list>
</t>
<t>
- Section 3.2, last paragraph:
What is the latter confinement? This is not clear to me.
<list style="symbols">
<t>
Rephrased.
</t>
</list>
</t>
<t>
- Section 3.9, last sentence
I am not sure to what the reference to Section 3.7 is pointing in this
respect.
<list style="symbols">
<t>
Rephrased.
</t>
</list>
</t>
<t>
- Section 3.10.1 about PEX messages
The text says 'PPSPP optionally features...'. I have not understood if
this optionally refers to mandatory to implement but optionally to use,
or if the PEX messages are optionally to implement.
<list style="symbols">
<t>
Made it clear that PEX is OPTIONAL and not mandatory-to-implement.
</t>
</list>
</t>
<t>
- Section 3.12
I'm not sure what this section is telling exactly. Isn't it just saying
that PPSPP as such does not care how chunks are stored locally, as this
is implementation dependent?
<list style="symbols">
<t>
Yes. Removed.
</t>
</list>
</t>
<t>
- Section 4.2, page 15, 1st paragraph:
OLD
'A PPSPP peer MAY support'
NEW
'The support for this scheme is OPTIONAL'
<list style="symbols">
<t>
DONE, for byte ranges as well.
</t>
</list>
</t>
<t>
- Section 6.1.1
This section is not describing sign-all, but rather a justification why
it may still work. This doesn't help at all.
</t>
<t>
- Section 7, 1st paragraph
Why is there a reference to RFC 2132?
<list style="symbols">
<t>
Removed, just similarity in format.
</t>
</list>
</t>
<t>
- Section 7 in general
i) It is common to give bit positions in the figures where the syntax of
options is described. This makes it possible to count how many bits are
used for a protocol field more easily and also far more reliably.
ii) Please add also Figure labels to the syntax definitions of the
options. This makes it easier to reference them later on if needed.
</t>
<t>
- Section 8.1
1 kibibyte is 1 kbyte?
<list style="symbols">
<t>
Mentioned base 1024 in Terminology. Changed to 1024 bytes where appropriate.
</t>
</list>
</t>
<t>
- Section 8.2, last paragraph
i ) "All messages are idempotent" in what respect?
ii) "or recognizable as duplicates" but how are they recognized as
duplicates?
<list style="symbols">
<t>
Idempotent means that processing a message twice does not lead to a different
state than processing it once. Resent handshakes can be recognized as
duplicates because a peer already recorded the first connection attempt in its
state. Updated text.
</t>
</list>
</t>
<t>
- Section 8.5, last sentence in brackets:
What is this last sentence about?
<list style="symbols">
<t>
Was explanation of the on-the-wire bytes shown.
</t>
</list>
</t>
<t>
- Section 8.13
" If sender of
the PEX_REQ message does not have a private or link-local address,
then the PEX_RES* messages MUST NOT contain such addresses
[RFC1918][RFC4291]."
What is this text saying? Do not include what you do not have anyway?
<list style="symbols">
<t>
Rephrased. It tries to say that internal addresses must not be leaked to
external peers.
</t>
</list>
</t>
<t>
- Section 8.14
There is no single place where all the constants are collected and also
documented what the default values or the recommended values are. For
instance in this Section 8.14 where the dead peer time out is set to 3
minutes and also the number of datagrams that should have been sent. I would
make a section or subsection to discuss dead peers and how they are
detected and just link to the keep-alive mechanism in Section 8.14.
<list style="symbols">
<t>
The <xref target="sec_mgmt_op_config"/> section was rewritten for this in
the Ops &amp; Mgmt part.
</t>
</list>
</t>
<t>
- Section 11
This section needs to be overhauled once the document is ready for the
IESG. The section is not wrong but can be improved to help IANA.
<list style="symbols">
<t>
The section was extended with information about data types and required
information.
</t>
</list>
</t>
</list>
</t>
<t hangText="-08 2013-08-8 Continued Revision following AD Review">
<list style="empty">
<t>
Please see the -07 entry for our responses to the comments.
</t>
<t>
Added ECDSAP256SHA256 and ECDSAP384SHA384 as mandatory-to-implement live
signature algorithms, as they provide small swarm IDs.
</t>
<t>
Added line that a peer SHOULD NOT send HAVEs to peers that already have the
complete content (e.g. in video-on-demand scenarios).
</t>
<t>
In response to a remark at WG meeting at IETF 87 we added a paragraph
on OPTIONAL MTU discovery using PPSPP messages to
<xref target="sec_encap_udp_chunksize"/>.
</t>
</list>
</t>
<!-- End of per-revision list -->
</list>
</t>
</section>
</back>
</rfc>