One document matched: draft-deng-ppsp-bfbitmap-01.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
There has to be one entity for each item to be referenced.
An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2234 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2234.xml">
<!ENTITY I-D.narten-iana-considerations-rfc2434bis SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.narten-iana-considerations-rfc2434bis.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs),
please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
(Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space
(using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="yes" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-deng-ppsp-bfbitmap-01.txt" ipr="trust200902">
<!-- ***** FRONT MATTER ***** -->
<front>
<!-- The abbreviated title is used in the page header - it is only necessary if the
full title is longer than 39 characters -->
<title>Efficient Chunk Availability Compression for PPSP
</title>
<!-- add 'role="editor"' below for the editors if appropriate -->
<!-- Another author who claims to be an editor -->
<author fullname="Lingli Deng" initials='L.L.'
surname="Deng">
<organization>China Mobile</organization>
<address>
<email>denglingli@chinamobile.com</email>
</address>
</author>
<author fullname="Jin Peng" initials='J.'
surname="Peng">
<organization>China Mobile</organization>
<address>
<email>pengjin@chinamobile.com</email>
</address>
</author>
<author fullname="Yunfei Zhang" initials='Y.F.'
surname="Zhang">
<organization>CoolPad</organization>
<address>
<email>hishigh@gmail.com</email>
</address>
</author>
<date day="14" month="July" year="2013" />
<!-- Meta-data Declarations -->
<area></area>
<workgroup>PPSP</workgroup>
<keyword></keyword>
<abstract>
<t>This draft proposes to employ bloom filters in compressing chunk availability information, which is periodically
exchanged between peers and the tracker through both the PPSP-TP protocol and PPSPP protocol, so as to reduce
relevant cost (in tranmission, storage and computation) and enhance the overall system's scalability. </t>
</abstract>
</front>
<middle>
<section anchor="intro" title="Introduction">
<t>As it is pointed out by <xref target="I-D.ietf-ppsp-problem-statement"/>, current P2P streaming practices often use
a "bitmap" message in order to exchange chunk availability. The message is of kilobytes in size and exchanged
frequently, e.g., an interval of several seconds or less. </t>
<t>To begin with, in a mobile environment with scarce bandwidth, the message size may need to be shortened or it may
require more efficient methods for expressing and distributing chunk availability information, which is different
from wire-line P2P streaming. </t>
<t>Even in a wire-line P2P streaming application, frequent exchange of large volume of bitmap information, is among the
key factors that set a limit to the system's efficiency and scalability <xref target="P2P-limit"/>. </t>
<t>Therefore, the following requirements for PPSP protocols in terms of chunk availability exchange are stated in
<xref target="I-D.ietf-ppsp-problem-statement"/> :
<list style="empty">
<t> PPSP.TP.REQ-3: The tracker protocol MUST take the frequency of messages and efficient use of bandwidth into consideration,
when communicating chunk availability information.</t>
<t> PPSP.PP.REQ-7: The peer protocol MUST take the frequency of messages and efficient use of bandwidth into consideration,
when communicating chunk information.</t>
</list>
</t>
<t> To this end, we propose to employ bloom filter algorithms in compressing chunk availability information, which is exchanged
frequently between peers and the tracker through the PPSP-TP protocol and PPSPP protocol. Given the Bloom Filters' wide
adoptation in Internet and demonstrated efficiency with highly compacted data structure and low complexity and cost in
terms of information storage, transportation and computation, it is expected to relieve a PPSP implementer from the dilemma
between "the frequency of messages" (i.e. the timely exchange of information that contributes to better user experience)
and "efficient use of bandwidth" (i.e. the limit of a single node/peer that holds the system's overall scalability by throat).
</t>
</section>
<section title=" Background on Bloom Filter">
<t>Bloom Filter (or BF for short) was first introduced in 1970s <xref target="BF-bloom"/>, which makes use of multiple
hashing functions to build a mapping from a set of elements to a compact binary array, to realize highly efficient member
queries with a tolerably low error rate of wrongly reported hits. Despite their extraordinary efficiency in terms of
storage reduction and query acceleration, BFs suffer from the fact that there is possibility that a non-member
of the set be wrongly taken as a member after the query. However, research <xref target="BF-analysis"/> shows that the odds that
a BF-based membership query makes an erroneous hit can be suppressed to near zero, by a tactful configuration of various
system parameters, including the hash functions used, the number of hash functions to be used, and the length of the bit
array. </t>
<figure align="center" anchor="figure_BF" title="Basic algorithms for BF-bitmap">
<artwork align="left">
<![CDATA[
------------------------------------------------
BF(set S, integer m, hash set H)
1 filter=allocate m bits initialized to 0;
2 for each element xi in S do
3 for each hash functions hi in H do
4 filter[hi(xi)]=1;
5 return filter;
-------------------------------------------------
MT(element elm, BF filter, integar m, hash set H)
1 for each hash functions hi in H do
2 if (filter[hi(elm)]!=1)
3 return false;
4 return true;
-------------------------------------------------
]]>
</artwork>
<postamble></postamble>
</figure>
<t>As shown in <xref target="figure_BF"/>, the BF(S,m) algorithm takes a n-membered sub-set S={x1,x2,...,xn} from a
universal set U as input, and outputs a m-bit binary array B as a compacted representation of S. In order to do that,
it makes use of k independent random hash functions, each of which maps a member to a marked bit in B
(i.e hj: U-> [1,m], j=1...k). The BF algorithm is highly efficient in the following aspects:
<list style="symbols">
<t> It is quite simple and straightforward to generate the BF representation of a set S, B=BF(S): initially, all the bits in B
is set to 0; then, for each member x of the set S, mark each bit in B, to which a hash function maps x (as shown in
Figure 1 as the BF algorithm). </t>
<t> It is highly efficient to check whether or not a given element x is in any BF-represented set B=BF(S): for each hash function
hj, check the value of B[hj(x)] against 1. It is always safe to exclude the element x out of set S, once there is a zero-valued
hash bit, otherwise it is assumed that x is a member of S (the MT algorithm in Figure 1). </t>
</list>
</t>
<t>For instance, given a 2GB movie file, the original bitmap for a sharing peer would be 1024-bit (if the system is using 2MB-sized segments).
By simply using 4 uniform random hash functions and a 128-bit BF-bitmap, the possibility of erroneous hits by MT algorithm would be lower
than 3%. </t>
<t>As for a simple illustration, the 4 hash functions may be established through the MD5 message-digest algorithm <xref target="RFC1321"/>, a widely used cryptographic
hash function that produces a 128-bit (16-byte) hash value from an arbitrary binary input. MD5 has been utilized in a wide variety of security
applications, and is also commonly used to check data integrity. </t>
<t>Specifically, the processing of 4 hash functions is as follows: use the MD5 algorithm to turn a given chunk_ID into a 128-bit binary array, further
separate the 128 bits into 4 arrays (32-bit each), and finally divide each of them using 128 to yield 4 integers in the range of [1,m]. </t>
</section>
<section title="BF-based Chunk Availability Exchange">
<t>We first construct a general message flow (shown in <xref target="figure_protocols"></xref>) from PPSP protocols,
and then discuss how to integrate BF-bitmap algorithm with it.</t>
<section title="A non-BF PPSP session">
<figure align="center" anchor="figure_protocols" title="A typical PPSP session for watching a streaming content.">
<artwork align="left">
<![CDATA[
+--------+ +--------+ +--------+ +--------+ +-------+
| Player | | Peer 1 | | Portal | | Tracker| | Peer 2|
+---+----+ +----+---+ +-----+--+ +----+---+ +----+--+
| | | | |
|--Page request----------------->| | |
|<--------------Page with links--| | |
|--Select stream (MPD Request)-->| | |
|<--------------------OK+MPD(x)--| | |
|--Start/Resume->|--CONNECT(join x)------------>| |
|<-----------OK--|<----------------OK+Peerlist--| |
: : : : :
| |<------------ HAVE(BF(S2))----------------|
|-Get(Chunk s1)->| | | |
| |------------- REQUEST(BF(s1))------------>|
|<-----Chunk s1--|<-------------------------DATA(Chunk s1)--|
: : : : :
| |--STAT_REPORT---------------->| |
| |<-------------------------Ok--| |
: : : : :
| |--FIND(Chunk subset)--------->| |
| |<-------------OK+PeerList-----| |
: : : : :
]]>
</artwork>
<postamble></postamble>
</figure>
<t>When a peer wants to receive streaming of a selected content (Leech mode):
<list style="numbers">
<t>Peer connects to a connection tracker (which may be located through a web portal) and joins a swarm.</t>
<t>Peer acquires a list of other peers in the swarm from the connection tracker (via the tracker protoco)
through the CONNECT message.</t>
<t>Peer exchanges its content availability with the peers on the obtained peer list (via peer protocol)
through the HAVE message.</t>
<t>Peer requests content from the identified peers (via peer protocol) through the REQEST-DATA messages.</t>
<t>Peer periodically reports its status and chunk avaiablitity with the tracker (via the tracker protocol)
through the STAT_REPORT message.</t>
<t>Peer acquires a list of other peers for a specific subset of media chunks in the swarm from the connection
tracer (via the tracker protocol) through the FIND message.</t>
</list>
</t>
</section>
<section title="A PPSP Session with BF-bitmaps">
<t>This document proposes to employ bloom filter algorithms in compressing chunk availability information
exchanged and stored between peers and the tracker through the PPSP-TP protocols and PPSPP protocol.
Relevent extensions to the current protocols are summarized as follows: (as shown in <xref target="figure_extensions"></xref>)</t>
<figure align="center" anchor="figure_extensions" title="A typical PPSP session with BF-bitmaps.">
<artwork align="left">
<![CDATA[
+--------+ +--------+ +--------+ +--------+ +-------+
| Player | | Peer 1 | | Portal | | Tracker| | Peer 2|
+--------+ +--------+ +--------+ +--------+ +-------+
| | | | |
(a1) |--Page request----------------->| | |
|<----Page with links(+BF conf)--| | |
|--Select stream (MPD Request)-->| | |
|<----------OK+MPD(x)(+BF conf)--| | |
(a2) |--Start/Resume->|--CONNECT(join x)------------>| |
(a3) |<-----------OK--|<--OK(+BF conf)+Peerlist(BF)--| |
| | | | |
: : : : :
(c1) | |<------------ HAVE(BF(S2))----------------|
|-Get(Chunk s1)->| | | |
(c2) | |------------- REQUEST(BF(s1))------------>|
|<-----Chunk s1--|<-------------------------DATA(Chunk s1)--|
: : : : :
(b1) | |-STAT_REPORT(BF(ContentMap))->| |
| |<-------------------------Ok--| |
: : : : :
(b2) | |--FIND(Chunk subset S')------>| |
(b3) | |<---------OK+PeerList(BF)-----| |
: : : : :
]]>
</artwork>
<postamble></postamble>
</figure>
<t>
<list style="letters">
<t>Integration to the base TP protocol <xref target="I-D.ietf-ppsp-base-tracker-protocol"></xref>:
<list style="symbols">
<t>(a1-a2)Configuration Setup: m, The length of the output bit array and H, the hash functions in use, are
system level parameters that should be configured globally. There are two ways in achiveing this:
(a1) the BF configurations (or BF conf for short) be stored at the web portal and published
to a requesting peer through the web page or MPD file transaction; or (a2) the BF conf be stored at
the tracker and published to a joining peer through the PPSP TP protocols. </t>
<t>(a3)In response to a JOIN request from a peer, the tracker may accompany the returned peer list with each
recommended peer's BF-formed chunk availability bitmap, as the initial guidance for the requestor
to start looking for neighbors in the same swarm. The additional cost for bearing the chunk-level
availability information is constant (O(m)) for each peer in the returned peer list. </t>
</list>
</t>
<t>Integration to the extended TP protocol <xref target="I-D.ietf-huang-extended-tracker-protocol"></xref>:
<list style="symbols">
<t>(b1)STAT_REPORT: Peers use the BF(S,m,H) algorithm for compressing the subset of locally stored and integrity verified
chunks (set S) in terms of a given swarm-ID, whenever reporting or updating its chunk availability
information with the tracker. As the length of each BF-bitmap is constant (O(m)), this will greatly
reduce the tracker's resouce expenditure in communicaing and storing such information for a large peer
population. </t>
<t>(b2-b3)FIND: Peers use the BF(S,m,H) algorithm for indicating its query intention in the FIND request for a specific
chunk sub set S' of the given swarm to the tracker or the other peer. The additional cost for bearing
the chunk set is constant (O(m)). In response to a FIND request with specific chunk subset S' in need from a
peer, the tracker performs the subset containment check on the query set parameter BF(S') against each registered
peer's chunk availability BF(S) by three simple binary operations to decided whether or not to include the peer
into the peer list in return: check if "F(S) equals (BF(S) bitwise OR (BF(S) bitwise XOR BF(S'))" holds. The
computational cost for each subset check is constant (O(m)).</t>
</list>
</t>
<t>Integration to the peer protocol <xref target="I-D.ietf-ietf-ppsp-peer-protocol"></xref>:
<list style="symbols">
<t>(c1)HAVE: Peers use the BF(S,m,H) algorithm for compressing the subset of locally stored and integrity verified chunks
(e.g. set S2 for Peer 2 in <xref target="figure_extensions"></xref>) in terms of a given swarm-ID, whenever sharing
its chunk availability information with another peer. The length of each BF-bitmap is constant (O(m)).</t>
<t>(c2)REQUEST: For a downloading peer to decide which neighbor to request for a given chunk_ID s, it uses the member query
algorithm MT(s,bf,m,H) on each neighbor's BF-bitmap bf. The computation cost for this member check is constant (O(m)).
It is also optional for a requesting peer to use BF-bitmap to indicate its data request to another peer, if needed.</t>
</list>
</t>
</list>
</t>
</section>
</section>
<section title="Open Issues">
<section title="Algorithm Configuration Setup">
<t>As stated earlier, the BF scheme is based on a mutual arrangement between the information requestor and the responder of the basic settings
for the hash algorithms (both the number of them and the specific ones) in use and the coded bitmap's binary length. In other words,
there MUST be a way of configuration setup mechanism in a local system.</t>
<t>To serve as the input for further discussion, we provide two initial proposals here:
<list style="symbols">
<t>Option1: Centralized Server for Uniform Configuration: The most simple and straightforward way would be to set up a logically
centralized configuration server to do the trick. For instance, the RELOAD base protocol introduces such a configuration
server to synchronize the hash function for the P2P DHT before a peer/client joins in the overlay [I-D.ietf-p2psip-base].
There are two potential ways to integrate into the base TP protocol's enrollment and bootstrap process: The Publishing
and Searching Portal could serve as a configuration server and return the BF configuration information to the peer through
player, either
<list style="symbols">
<t>via the page returned in response to a web page request, or</t>
<t>via the MPD(Media Presentation Description) file in response to a MPD request.</t>
</list>
</t>
<t>Option2: Configuration Exchange as Joining in a SWARM: In view of the interworking usage of PPSP as a generally accepted suite
of protocols to bridging different P2P systems, who may differ in specific choice of hash functions and other parameters,
there SHOULD be a way of parameter negotiation mechanism across different systems. Negotiation may also introduce flexibility
in a single system. E.g. large files or mobile peers may prefer more compact way of exchanging this information. Therefore,
the tracker could include a swarm-specific BF configuration parameters into the OK response to the JOIN request from a new-coming
peer (as labeled by (a3) in <xref target="figure_extensions"></xref>).</t>
</list>
</t>
</section>
</section>
<section title="Security Considerations">
<t>TBA</t>
</section>
<section title="IANA Considerations">
<t>None.</t>
</section>
<!-- This PI places the pagebreak correctly (before the section title) in the text output. -->
<?rfc needLines="8" ?>
</middle>
<!-- *****BACK MATTER ***** -->
<back>
<!-- References split into informative and normative -->
<!-- There are 2 ways to insert reference entries from the citation libraries:
1. define an ENTITY at the top, and use "ampersand character"RFC2629; here (as shown)
2. simply use a PI "less than character"?rfc include="reference.RFC.2119.xml"?> here
(for I-Ds: include="reference.I-D.narten-iana-considerations-rfc2434bis.xml")
Both are cited textually in the same manner: by using xref elements.
If you use the PI option, xml2rfc will, by default, try to find included files in the same
directory as the including file. You can also define the XML_LIBRARY environment variable
with a value containing a set of directories to search. These can be either in the local
filing system or remote ones accessed by http (http://domain/dir/... ).-->
<references title="Normative References">
<!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?-->
&RFC2119;
&RFC2234;
</references>
<references title="Informative References">
<!-- A reference written by by an organization not a person. -->
<reference anchor="I-D.ietf-ppsp-problem-statement">
<front>
<title>Problem Statement and Requirements of Peer-to-Peer Streaming Protocol (PPSP)</title>
<author initials="Y." surname="Zhang">
<organization></organization>
</author>
<author initials="N." surname="Zong">
<organization></organization>
</author>
<date month="January" year="2013" />
</front>
<seriesInfo name="draft-ietf-ppsp-problem-statement-12" value="(work in progress)" />
</reference>
<reference anchor="P2P-limit">
<front>
<title>Understanding the performance gap between pull-based mesh streaming protocols and fundamental limits</title>
<author initials="C." surname="Feng">
<organization></organization>
</author>
<author initials="B." surname="Li">
<organization></organization>
</author>
<author initials="B." surname="Li">
<organization></organization>
</author>
<date month="" year="2009" />
</front>
<seriesInfo name="in Proc. of IEEE INFOCOM" value="" />
</reference>
<reference anchor="I-D.ietf-ppsp-base-tracker-protocol" >
<front>
<title>PPSP Tracker Protocol-Base Protocol (PPSP-TP/1.0)</title>
<author initials="R." surname="Cruz">
<organization></organization>
</author>
<author initials="M." surname="Nunes">
<organization></organization>
</author>
<author initials="Y." surname="Gu">
<organization></organization>
</author>
<author initials="J." surname="Xia">
<organization></organization>
</author>
<author initials="J." surname="Taveira">
<organization></organization>
</author>
<date month="February" year="2013" />
</front>
<seriesInfo name="draft-ietf-ppsp-base-tracker-protocol-00" value="(work in progress)" />
</reference>
<reference anchor="I-D.ietf-huang-extended-tracker-protocol">
<front>
<title>PPSP Tracker Protocol-Extended Protocol</title>
<author initials="R." surname="Huang">
<organization></organization>
</author>
<author initials="N." surname="Zong">
<organization></organization>
</author>
<author initials="R." surname="Cruz">
<organization></organization>
</author>
<author initials="" surname="Nunes">
<organization></organization>
</author>
<author initials="J." surname="Taveira">
<organization></organization>
</author>
<date month="February" year="2013" />
</front>
<seriesInfo name="draft-ietf-huang-extended-tracker-protocol-02 " value="(work in progress)" />
</reference>
<reference anchor="I-D.ietf-ietf-ppsp-peer-protocol">
<front>
<title>Peer-to-Peer Streaming Peer Protocol (PPSPP)</title>
<author initials="A." surname="Bakker">
<organization></organization>
</author>
<author initials="R." surname="Petrocco">
<organization></organization>
</author>
<author initials="V." surname="Grishchenko">
<organization></organization>
</author>
<date month="February" year="2013" />
</front>
<seriesInfo name="draft-ietf-ppsp-peer-protocol-06" value="(work in progress)" />
</reference>
<reference anchor="BF-bloom" >
<front>
<title>Space/time trade-offs in hash coding with allowable errors.</title>
<author initials="B.H." surname="Bloom">
<organization></organization>
</author>
<date month="" year="1970" />
</front>
<seriesInfo name="Communications of ACM" value="Vol. 13, No. 7, pp. 422-426" />
</reference>
<reference anchor="BF-analysis" >
<front>
<title>Network applications of Bloom Filters: a survey</title>
<author initials="A." surname="Broder">
<organization></organization>
</author>
<author initials="M." surname="Mitzenmacher">
<organization></organization>
</author>
<date month="" year="2004" />
</front>
<seriesInfo name="Internet Mathematics" value="Vol. 1, No. 4, pp. 485¨C509" />
</reference>
<reference anchor="RFC1321">
<front>
<title>RFC 1321: The MD5 message-digest algorithm</title>
<author initials="" surname="Rivest">
<organization></organization>
</author>
<author initials="" surname="Newport">
<organization></organization>
</author>
<date month="April" year="1992" />
</front>
<seriesInfo name="draft-ietf-p2psip-base-26" value="(work in progress)" />
</reference>
<reference anchor="I-D.ietf-p2psip-base">
<front>
<title>REsource LOcation And Discovery (RELOAD) Base Protocol</title>
<author initials="C." surname="Jennings">
</author>
<author initials="B." surname="Lowekamp">
</author>
<author initials="E." surname="Rescorla">
</author>
<author initials="S." surname="Baset">
</author>
<author initials="H." surname="Schulzrinne">
</author>
<date month="February" year="2013" />
</front>
</reference>
</references>
</back>
<!-- Here we use entities that we defined at the beginning. -->
<!-- Change Log
v00 2006-03-15 EBD Initial version
v01 2006-04-03 EBD Moved PI location back to position 1 -
v3.1 of XMLmind is better with them at this location.
v02 2007-03-07 AH removed extraneous nested_list attribute,
other minor corrections
v03 2007-03-09 EBD Added comments on null IANA sections and fixed heading capitalization.
Modified comments around figure to reflect non-implementation of
figure indent control. Put in reference using anchor="DOMINATION".
Fixed up the date specification comments to reflect current truth.
v04 2007-03-09 AH Major changes: shortened discussion of PIs,
added discussion of rfc include.
v05 2007-03-10 EBD Added preamble to C program example to tell about ABNF and alternative
images. Removed meta-characters from comments (causes problems). -->
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-24 04:05:18 |