<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
     <!DOCTYPE rfc SYSTEM "rfc2629.dtd">

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
     (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic reference tags, i.e., [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-irtf-samrg-sam-baseline-protocol-00"
     ipr="trust200902" consensus="yes" submissionType="IRTF">

  <!-- ***** FRONT MATTER ***** -->

  <front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
         full title is longer than 39 characters -->

    <title abbrev="ALM Extensions to RELOAD">Application Layer Multicast
    Extensions to RELOAD</title>

    <!-- add 'role="editor"' below for the editors if appropriate -->

    <author fullname="John Buford" initials="J.F." surname="Buford">
      <organization>Avaya Labs Research</organization>

      <address>
        <postal>
          <street>211 Mt. Airy Rd</street>
          <city>Basking Ridge</city>
          <region>New Jersey</region>
          <code>07920</code>
          <country>USA</country>
        </postal>

        <phone>+1 908 848 5675</phone>

        <email>buford@avaya.com</email>

        <!-- uri and facsimile elements may also be added -->
      </address>
    </author>

    <author fullname="Mario Kolberg" initials="M." role="editor"
            surname="Kolberg">
      <organization>University of Stirling</organization>

      <address>
        <postal>
          <street>Dept. Computing Science and Mathematics</street>
          <city>Stirling</city>
          <region></region>
          <code>FK9 4LA</code>
          <country>UK</country>
        </postal>

        <phone>+44 1786 46 7440</phone>
        <email>mkolberg@ieee.org</email>
        <uri>http://www.cs.stir.ac.uk/~mko</uri>

        <!-- uri and facsimile elements may also be added -->
      </address>
    </author>


    <date day="05" month="August" year="2012" />

    <!-- If the month and year are both specified and are the current ones, xml2rfc will fill 
         in the current day for you. If only the current year is specified, xml2rfc will fill 
	 in the current day and month for you. If the year is not the current one, it is 
	 necessary to specify at least a month (xml2rfc assumes day="1" if not specified for the 
	 purpose of calculating the expiry date).  With drafts it is normally sufficient to 
	 specify just the year. -->

    <!-- Meta-data Declarations -->

    <area>IRTF</area>

    <workgroup>SAM Research Group</workgroup>

    <!-- WG name at the upperleft corner of the doc,
         IETF is fine for individual submissions.  
	 If this element is not present, the default is "Network Working Group",
         which is used by the RFC Editor as a nod to the history of the IETF. -->

    <keyword>application layer multicast</keyword>

    <!-- Keywords will be incorporated into HTML output
         files in a meta tag but they have no effect on text or nroff
         output. If you submit your draft to the RFC Editor, the
         keywords will be used for the search engine. -->

    <abstract>
      <t>
      We define a RELOAD Usage for Application Layer Multicast (ALM) as well as 
      extensions to the RELOAD message layer to support ALM.
      The ALM Usage is intended to support a variety of ALM control algorithms
      in an overlay-independent way.
      Scribe is defined as an example algorithm.
      </t>
    </abstract>

 </front>

  <middle>
    <section title="Introduction">
      <t>The concept of scalable adaptive multicast includes both scaling
      properties and adaptability properties. Scalability is intended to
      cover: <list style="symbols">
          <t>large group size</t>

          <t>large numbers of small groups</t>

          <t>rate of group membership change</t>

          <t>admission control for QoS</t>

          <t>use with network layer QoS mechanisms</t>

          <t>varying degrees of reliability</t>

          <t>trees connecting nodes across the global Internet</t>
        </list> Adaptability includes: <list style="symbols">
          <t>use of different control mechanisms for different multicast trees
          depending on initial application parameters or application class</t>

          <t>changing multicast tree structure depending on changes in
          application requirements, network conditions, and membership</t>


        </list> </t>

        <t>Application Layer Multicast (ALM) has been demonstrated to be a viable
        multicast technology where native multicast isn't available.  
        Many ALM designs have been proposed.  This ALM Usage focuses on:

        <list style="symbols">
          <t>ALM implemented in RELOAD-based overlays </t>
          <t>Support for a variety of ALM control algorithms </t>
          <t>Providing a basis for defining a separate hybrid-ALM RELOAD Usage </t>
        </list>

        RELOAD <xref target="I-D.ietf-p2psip-base"></xref> has an 
        application extension mechanism in which a new type of application defines a Usage.
        A RELOAD Usage defines a set of data types and rules for their use.
        In addition, this document describes additional message types and a new ALM
        algorithm plugin architectural component.</t>

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119">RFC 2119</xref>.</t>
      </section>

    </section>

    <section anchor="definitions" title="Definitions">

      <t>We adopt the terminology defined in section 2 of <xref target="I-D.ietf-p2psip-base"></xref>,
      specifically the distinction between Node, Peer, and Client.</t>

      <section title="Overlay Network">
        <figure align="center" anchor="overlay">
          <artwork align="left"><![CDATA[
                    P    P    P   P     P
                  ..+....+....+...+.....+...
                 .                          +P
               P+                            .
                 .                          +P
                  ..+....+....+...+.....+...
                    P    P    P   P     P
            ]]></artwork>
        </figure>

        <t>Overlay network - An application layer virtual or logical network
        in which end points are addressable and that provides connectivity,
        routing, and messaging between end points. Overlay networks are
        frequently used as a substrate for deploying new network services, or
        for providing a routing topology not available from the underlying
        physical network. Many peer-to-peer systems are overlay networks that
        run on top of the Internet. In the above figure, "P" indicates overlay
        peers, and peers are connected in a logical address space. The links
        shown in the figure represent predecessor/successor links. Depending
        on the overlay routing model, additional or different links may be
        present.</t>
      </section>

      <section title="Overlay Multicast">
        <t>Overlay Multicast (OM): Hosts participating in a multicast session
        form an overlay network and utilize unicast connections among pairs of
        hosts for data dissemination. The hosts in overlay multicast
        exclusively handle group management, routing, and tree construction,
        without any support from Internet routers. This is also commonly known
        as Application Layer Multicast (ALM) or End System Multicast (ESM). We
        call systems which use proxies connected in an overlay multicast
        backbone "proxied overlay multicast" or POM.</t>
      </section>

      <section title="Peer">
        <t>Peer: an autonomous end system that is connected to the physical
        network and participates in and contributes resources to overlay
        construction, routing and maintenance. Some peers may also perform
        additional roles such as connection relays, super nodes, NAT
        traversal, and data storage.</t>
      </section>


    </section>

    <section anchor="Assumptions" title="Assumptions">
      <section title="Overlay">
        <t>Peers connect in a large-scale overlay, which may be used for a
        variety of peer-to-peer applications in addition to multicast
        sessions. Peers may assume additional roles in the overlay beyond
        participation in the overlay and in multicast trees. We assume a
        single structured overlay routing algorithm is used. Any of a variety
        of multi-hop, one-hop, or variable-hop overlay algorithms could be
        used.</t>

        <t>Castro et al. <xref target="CASTRO2003"></xref> compared multi-hop
        overlays and found that tree-based construction in a single overlay
        outperformed using separate overlays for each multicast session. We
        therefore use a single overlay rather than a separate overlay per
        multicast session. </t>

        <t>An overlay multicast algorithm may leverage the overlay's mechanism
        for maintaining overlay state in the face of churn. For example, a
        peer may store a number of DHT (Distributed Hash Table) entries. When
        the peer gracefully leaves the overlay, it transfers those entries to
        the nearest peer. When another peer joins that is closer to some of
        the entries than the peer currently holding them, those entries are
        migrated to the new peer. Overlay churn affects multicast trees as
        well; remedies include automatic migration of the tree state and
        automatic re-join operations for dislocated child nodes.</t>
      </section>

      <section title="Overlay Multicast">
        <t>The overlay supports multiple concurrent multicast trees. The limit
        on the number of concurrent trees depends on peer and network resources
        and is not an intrinsic property of the overlay. </t>
      </section>

      <section title="RELOAD">
        <t>We use RELOAD <xref target="I-D.ietf-p2psip-base"></xref> as the
        distributed hash table (DHT) for data storage and as the overlay by which the
        peers interconnect and route messages. RELOAD is a generic P2P
        overlay, and application support is defined by profiles called Usages.
        </t>

      </section>

      <section title="NAT">
        <t>Some nodes in the overlay may be in a private address space and
        behind firewalls. We use the RELOAD mechanisms for NAT traversal. We
        permit clients to be leaf nodes in an ALM tree.</t>
      </section>

      <section title="Tree Topology">
        <t>All tree control messages are routed in the overlay.
        Two types of data or media topologies are envisioned:  1) tree edges are paths in the overlay,
        2) tree edges are direct connections between a parent and child peer in the tree,
        formed using the RELOAD AppAttach method.  
        </t>
      </section>
    </section>

    <section title="Architecture Extensions to RELOAD">
      <t>There are two changes, shown in the figure below.
      New ALM messages are added to RELOAD Message Transport.
      A plug-in for ALM algorithms handles the ALM state and control.
      The ALM Algorithm is under control of the application
      via the Group API <xref target="I-D.irtf-samrg-common-api"></xref>.
      </t>
        <figure align="center" anchor="ALMUsage">
          <artwork align="left"><![CDATA[
                                                 +---------+
                                                 |Group API|
                                                 +---------+
                                                      |
    ------------------- Application  ------------------------
        +-------+                                     |
        | ALM   |                                     |
        | Usage |                                     |
        +-------+                                     |
     -------------- Messaging Service Boundary --------------
                                                      |
       +--------+      +-----------+---------+    +---------+
       | Storage|<---> | RELOAD    | ALM     |<-->| ALM Alg |
       +--------+      | Message   | Messages|    +---------+
               ^       | Transport |         |
               |       +-----------+---------+
               v          |    |
              +-------------+  |
              | Topology    |  |
              | Plugin      |  |
              +-------------+  |
                 ^             |
                 v             v
              +-------------------+
              | Forwarding&       |
              | Link Management   |
              +-------------------+
    
     ---------- Overlay Link Service Boundary --------------

            ]]></artwork>
        </figure>


      <t>The ALM components interact with RELOAD as follows: 
         <list style="symbols">
          <t>ALM uses the RELOAD data storage functionality to
          store an ALMTree instance when a new ALM tree is created in the overlay, and
          to retrieve ALMTree instance(s) for existing ALM trees.</t>

          <t>ALM applications and management tools may use the RELOAD data storage
          functionality to store diagnostic information about the operation of
          a tree, including the average number of trees, the delay from source to leaf
          nodes, bandwidth use, and the lost packet rate. In addition, diagnostic
          information may include statistics specific to the tree root, or to
          any node in the tree.</t>
        </list> 
       </t>
    </section>

    <section title="RELOAD ALM Usage">
      <t>Applications of RELOAD are restricted in the data types that can be
      stored in the DHT. The profile of accepted data types for an application
      is referred to as a Usage. RELOAD is designed so that new applications
      can easily define new Usages. New RELOAD Usages are needed for 
      multicast applications since the data types in base RELOAD and existing
      usages are not sufficient.</t>

      <t>We define an ALM Usage in RELOAD. This ALM
      Usage is sufficient for applications which require ALM
      functionality in the overlay. <xref target="ALMUsage"></xref> shows the internal structure
      of the ALM Usage, which contains the Group API (<xref
      target="I-D.irtf-samrg-common-api"></xref>),
      an ALM algorithm plugin (e.g., Scribe), and the ALM messages that are
      sent out to the RELOAD network.</t>

      <t>A RELOAD Usage is required <xref
      target="I-D.ietf-p2psip-base"></xref> to do the following: <list
          style="symbols">
          <t>Register Kind-Id points</t>

          <t>Define data structures for each kind</t>

          <t>Define access control rules for each kind</t>

          <t>Define the Resource Name used to hash to the Resource ID where
          the kind is stored</t>

          <t>Address restoration of values after recovery from a network
          partition</t>

          <t>Define the types of connections that can be initiated using
          AppAttach</t>
        </list>
        </t>

      <t>An ALM GroupID is a RELOAD Node-ID. The owner of an ALM group creates
        a RELOAD Node-ID as specified in <xref
        target="I-D.ietf-p2psip-base"></xref>. This means that a GroupID can be
        used as a RELOAD Destination for overlay routing purposes.</t>
   
    </section>

    <section title="ALM Tree Control Signaling">
      <t>Peers use the overlay to support ALM operations such as: 
        <list style="symbols">
          <t>Create tree</t>
          <t>Join</t>
          <t>Leave</t>
          <t>Re-Form or optimize tree</t>
        </list>
        There are a variety of algorithms for peers to form multicast
      trees in the overlay. We permit multiple such algorithms to be supported
      in the overlay, since different algorithms may be more suitable for
      certain application requirements, and since we wish to support
      experimentation. Therefore, overlay messaging corresponding to the set
      of overlay multicast operations must carry algorithm identification
      information.</t>

      <t>For example, for small groups, the join point might be directly
      assigned by the rendezvous point, while for large trees the join request
      might be propagated down the tree with candidate parents forwarding
      their position directly to the new node.</t>

      <t>Here is a simplistic algorithm for forming a multicast tree in the
        overlay. Its main advantage is that it uses the overlay routing mechanism
        to route both control and data messages. The group creator doesn't
        have to be the root of the tree or even in the tree. The algorithm doesn't
        consider per-node load, admission control, or alternative paths.</t>

        <t>As stated earlier, multiple algorithms will co-exist in the
        overlay. <list style="numbers">
            <t>Peer which initiates multicast group: <vspace blankLines="1" />
            <!--NOTE: This is intended to produce unformatted text,
   is there a less involved way to do this? --> <figure align="left"
                anchor="create">
                <artwork align="left"><![CDATA[
groupID = create();  // allocate a unique groupId 
                     // the root is the nearest
                     // peer in the overlay
                     // out of band advertisement or
                     // distribution of groupID, 
                     // perhaps by publishing in DHT
]]></artwork>
              </figure></t>

            <t>Any joining peer: <vspace blankLines="1" /> <figure
                align="left" anchor="joinTree">
                <artwork align="left"><![CDATA[
// out of band discovery of groupID, perhaps by lookup in DHT
joinTree(groupID); // sends "join groupID" message
]]></artwork>
              </figure> <vspace blankLines="1" /> The overlay routes the join
            request using the overlay routing mechanism toward the peer with
            the nearest id to the groupID. This peer is the root. Peers on the
            path to the root join the tree as forwarding points.</t>

            <t>Leave Tree: <vspace blankLines="1" /> leaveTree(groupID) //
            removes this node from the tree <vspace blankLines="1" />
            Propagates a leave message to each child node and to the parent
            node. If the parent node is a forwarding node and this is its last
            child, then it propagates a leave message to its parent. A child
            node receiving a leave message from a parent sends a join message
            to the groupID.</t>

            <t>Message forwarding: <vspace blankLines="1" />
            multicastMsg(groupID, msg);</t>

            <t>For the message forwarding there are two approaches:
               <list style="symbols">
                <t>SSM (source-specific multicast) tree: The creator of the tree
                is the source. It sends data messages to the tree root, which
                are then forwarded down the tree.</t>

                <t>ASM (any-source multicast) tree: A node sending a data message
                sends the message to its parent and its children. Each node
                receiving a data message from one edge forwards it to the
                remaining tree edges it is connected to.</t>
              </list></t>
          </list></t>
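        <t>The simplistic algorithm above can be sketched as follows. This is
        a minimal illustration and not part of the protocol: the small integer
        identifier space, the successor-only routing in route_path, and the
        names Overlay, join_tree, and multicast are assumptions made for the
        example. A real RELOAD overlay would route through its topology
        plugin.</t>

        <figure>
          <artwork align="left"><![CDATA[
```python
# Illustrative sketch only: a toy ring overlay with small integer ids.
# Overlay, route_path, join_tree and multicast are hypothetical names.

ID_SPACE = 64  # assumed size of the toy identifier space


def ring_distance(a, b):
    """Shortest distance between two ids on the ring."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


class Overlay:
    def __init__(self, peer_ids):
        self.peers = sorted(peer_ids)
        # Per-group tree state: group id -> {peer id -> adjacent peers}.
        self.edges = {}

    def root_of(self, group_id):
        """The root is the peer whose id is nearest to the group id."""
        return min(self.peers,
                   key=lambda p: (ring_distance(p, group_id), p))

    def route_path(self, src, group_id):
        """Hop to the successor until the root is reached (toy routing)."""
        root = self.root_of(group_id)
        path = [src]
        i = self.peers.index(src)
        while path[-1] != root:
            i = (i + 1) % len(self.peers)
            path.append(self.peers[i])
        return path

    def join_tree(self, peer, group_id):
        """Peers on the path to the root join as forwarding points."""
        tree = self.edges.setdefault(group_id, {})
        path = self.route_path(peer, group_id)
        for child, parent in zip(path, path[1:]):
            tree.setdefault(child, set()).add(parent)
            tree.setdefault(parent, set()).add(child)
        return path

    def multicast(self, sender, group_id):
        """ASM-style forwarding: pass the message along every tree edge
        except the one it arrived on. Returns the peers reached."""
        tree = self.edges.get(group_id, {})
        reached = []
        stack = [(sender, None)]
        while stack:
            node, came_from = stack.pop()
            reached.append(node)
            for neighbor in tree.get(node, ()):
                if neighbor != came_from:
                    stack.append((neighbor, node))
        return sorted(reached)


overlay = Overlay([3, 12, 20, 31, 45, 58])
path = overlay.join_tree(45, group_id=10)
overlay.join_tree(58, group_id=10)
```
]]></artwork>
        </figure>

        <t>In this sketch every join follows the same successor chain to the
        root, so the edges always form a tree and the forwarding loop
        terminates. An overlay with richer routing tables would need an
        explicit loop check.</t>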
      </section>


    <section anchor="sec-protocol" title="ALM Messages Added to RELOAD Protocol"> 
     <section title="Introduction">
        <t>In this document we define messages for overlay multicast
        tree creation, using an existing proposal (RELOAD) in the P2P-SIP WG
        <xref target="I-D.ietf-p2psip-base"></xref> for a universal structured
        peer-to-peer overlay protocol. RELOAD provides the mechanism to
        support a number of overlay topologies. Hence the overlay
        multicast framework <xref target="I-D.irtf-sam-hybrid-overlay-framework"></xref> 
        (hereafter SAM framework) can be
        used with P2P-SIP, and the SAM framework is overlay agnostic.</t>

        <t>As discussed in the SAM requirements draft, there are a variety of
        ALM tree formation and tree maintenance algorithms. The intent of this
        specification is to be algorithm agnostic, similar to how RELOAD is
        overlay algorithm agnostic. We assume that all control messages are
        propagated using overlay routed messages.</t>

      </section>

      <section title="Tree Lifecycle Messages">
        <t>Peers use the overlay to transmit ALM (application layer multicast)
        operations defined in this section.</t>

        <section title="Create Tree">
          <t>A new ALM tree is created in the overlay with the identity
          specified by GroupId. The usual interpretation of GroupId is that
          the peer with peer id closest to and less than the GroupId is the
          root of the tree. The tree has no children at the time it is
          created.</t>

          <t>The GroupId is generated from a well-known session key to be used
          by other Peers to address the multicast tree in the overlay. The
          generation of the GroupId from the SessionKey MUST be done using the
          overlay's id generation mechanism.</t>

         <t>A successful Create Tree causes an ALMTree structure to be stored in the overlay
         at the node responsible for NodeID equal to the GroupId.</t>

          <figure>
            <artwork align="left"><![CDATA[
      struct {  
        NodeID PeerId;
        opaque SessionKey<0..2^32-1>;
        NodeID GroupId;
        Dictionary Options;
      } ALMTree;
]]></artwork>
          </figure>

          <t>PeerId: the overlay address of the peer that creates the
          multicast tree.</t>

          <t>SessionKey: a well-known string that, when hashed using the
          overlay's id generation algorithm, produces the GroupId.</t>

          <t>GroupId: the overlay address of the root of the tree</t>

          <t>Options: name-value list of properties to be associated with the
          tree, such as the maximum size of the tree, restrictions on peers
          joining the tree, latency constraints, preference for distributed or
          centralized tree formation and maintenance, heartbeat interval.</t>

         <t>Tree creation is subject to access control since it involves a Store operation.
         Before the Store of an ALMTree structure is permitted, the storing peer
         MUST check that:
        
         <list style="symbols">
           <t>The certificate contains a SessionKey</t>
           <t>The certificate contains a Node-ID that is the same as the GroupID at which
              the structure is being stored (this is the NODE-MATCH access policy) </t>
         </list></t>
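          <t>The derivation of the GroupId and the two checks above can be
          sketched as follows. This is only an illustration under stated
          assumptions: SHA-1 truncated to 128 bits stands in for the overlay's
          id generation mechanism, and the Certificate type with its
          session_key and node_id fields is hypothetical. The actual hash,
          identifier length, and certificate format are determined by the
          overlay.</t>

          <figure>
            <artwork align="left"><![CDATA[
```python
import hashlib
from dataclasses import dataclass

NODE_ID_BYTES = 16  # assumption: 128-bit overlay identifiers


def group_id_from_session_key(session_key: bytes) -> bytes:
    """Hash the well-known SessionKey into the overlay id space.
    SHA-1 truncated to NODE_ID_BYTES is an assumed id generation rule."""
    return hashlib.sha1(session_key).digest()[:NODE_ID_BYTES]


@dataclass
class Certificate:
    # Hypothetical view of the fields the storing peer inspects.
    session_key: bytes
    node_id: bytes


def may_store_alm_tree(cert: Certificate, group_id: bytes) -> bool:
    """Permit the Store only if both checks listed above pass."""
    has_session_key = len(cert.session_key) > 0
    node_matches = cert.node_id == group_id
    return has_session_key and node_matches


key = b"example-session"
gid = group_id_from_session_key(key)
owner = Certificate(session_key=key, node_id=gid)
other = Certificate(session_key=key, node_id=bytes(NODE_ID_BYTES))
```
]]></artwork>
          </figure>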
        </section>

        <section title="Join">
          <t>Causes the distributed algorithm for peer join of a specific ALM
          group to be invoked. If successful, the PeerId is notified of one or
          more candidate parent peers in one or more JoinAccept messages. The
          particular ALM join algorithm is not specified in this protocol.</t>

          <figure>
            <artwork align="left"><![CDATA[
      struct {  
        NodeID PeerId;
        NodeID GroupId;
        Dictionary Options;
      } Join;
]]></artwork>
          </figure>

          <t>PeerId: overlay address of joining peer</t>

          <t>GroupId: the overlay address of the root of the tree</t>

          <t>Options: name-value list of options proposed by joining peer</t>
        </section>

        <section title="Join Accept">
          <t>Tells the requesting joining peer that the indicated peer is
          available to act as its parent in the ALM tree specified by GroupId,
          with the corresponding Options specified. A peer MAY receive more
          than one JoinAccept from different candidate parent peers in the
          GroupId tree. The peer accepts a peer as parent using a JoinConfirm
          message. A JoinAccept which receives neither a JoinConfirm nor a
          JoinDecline response MUST expire.</t>

          <figure>
            <artwork align="left"><![CDATA[
      struct {  
        NodeID ParentPeerId;
        NodeID ChildPeerId;
        NodeID GroupId;
        Dictionary Options;
      } JoinAccept;
]]></artwork>
          </figure>

          <t>ParentPeerId: overlay address of a peer which accepts the joining
          peer</t>

          <t>ChildPeerId: overlay address of joining peer</t>

          <t>GroupId: the overlay address of the root of the tree</t>

          <t>Options: name-value list of options accepted by parent peer</t>
        </section>

        <section title="Join Confirm">
          <t>A peer receiving a JoinAccept message which it wishes to accept
          MUST explicitly accept it before the expiration of the JoinAccept
          using a JoinConfirm message. The joining peer MUST include only
          those options from the JoinAccept which it also accepts, completing
          the negotiation of options between the two peers.</t>

          <figure>
            <artwork align="left"><![CDATA[
      struct {  
        NodeID ChildPeerId;
        NodeID ParentPeerId;
        NodeID GroupId;
        Dictionary Options;
      } JoinConfirm;
]]></artwork>
          </figure>

          <t>ChildPeerId: overlay address of joining peer which is a child of
          the parent peer</t>

          <t>ParentPeerId: overlay address of the peer which is the parent of
          the joining peer</t>

          <t>GroupId: the overlay address of the root of the tree</t>

          <t>Options: name-value list of options accepted by both peers</t>
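          <t>The final step of the option negotiation can be sketched as
          follows. The sketch models the Options dictionary as a plain
          mapping; the function name confirm_options and the option names are
          invented for illustration and are not defined by this
          specification.</t>

          <figure>
            <artwork align="left"><![CDATA[
```python
# Sketch of the JoinConfirm option filter. Options is modelled as a
# plain dict and the option names below are invented for illustration.


def confirm_options(accepted: dict, child_acceptable: dict) -> dict:
    """Child side: a JoinConfirm carries only those options from the
    JoinAccept that the joining peer also accepts."""
    return {
        name: value
        for name, value in accepted.items()
        if child_acceptable.get(name) == value
    }


# Options received in a JoinAccept from a candidate parent.
join_accept = {"heartbeat_s": 30, "max_children": 8, "relay_only": True}
# Options the joining peer is willing to operate with.
child_acceptable = {"heartbeat_s": 30, "max_children": 8}

join_confirm = confirm_options(join_accept, child_acceptable)
```
]]></artwork>
          </figure>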
        </section>

        <section title="Join Decline">
          <t>A peer that receives a JoinAccept message and does not wish to
          accept it MAY explicitly decline it using a JoinDecline message.</t>

          <figure>
            <artwork align="left"><![CDATA[
      struct {  
        NodeID PeerId;
        NodeID ParentPeerId;
        NodeID GroupId;
      } JoinDecline;
]]></artwork>
          </figure>

          <t>PeerId: overlay address of joining peer which declines the
          JoinAccept</t>

          <t>ParentPeerId: overlay address of the peer which issued a
          JoinAccept to this peer</t>

          <t>GroupId: the overlay address of the root of the tree</t>
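          <t>Because JoinDecline consists only of NodeID fields, its body is
          easy to illustrate in encoded form. The sketch below assumes
          fixed-length 16-byte NodeIDs written back to back in struct order;
          the identifier length is an overlay parameter, and the function
          names and the omission of any surrounding RELOAD message framing
          are assumptions of the example.</t>

          <figure>
            <artwork align="left"><![CDATA[
```python
# Sketch of an encoding of the JoinDecline body only. Assumes fixed-length
# 16-byte NodeIDs; RELOAD message framing around the body is omitted.

NODE_ID_BYTES = 16


def encode_join_decline(peer_id: bytes, parent_peer_id: bytes,
                        group_id: bytes) -> bytes:
    """Concatenate the three fixed-length NodeID fields in struct order."""
    fields = (peer_id, parent_peer_id, group_id)
    for value in fields:
        if len(value) != NODE_ID_BYTES:
            raise ValueError("NodeID must be %d bytes" % NODE_ID_BYTES)
    return b"".join(fields)


def decode_join_decline(body: bytes) -> tuple:
    """Split a body back into (PeerId, ParentPeerId, GroupId)."""
    if len(body) != 3 * NODE_ID_BYTES:
        raise ValueError("unexpected JoinDecline length")
    return tuple(
        body[i:i + NODE_ID_BYTES]
        for i in range(0, len(body), NODE_ID_BYTES)
    )


child = bytes([1]) * NODE_ID_BYTES
parent = bytes([2]) * NODE_ID_BYTES
group = bytes([3]) * NODE_ID_BYTES
wire = encode_join_decline(child, parent, group)
```
]]></artwork>
          </figure>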
        </section>


        <section title="Leave">
          <t>A peer which is part of an ALM tree identified by GroupId which
          intends to detach from either a child or parent peer SHOULD send a
          Leave message to the peer it wishes to detach from. A peer receiving
          a Leave message from a peer which is in neither its parent nor its
          child lists SHOULD ignore the message.</t>

          <figure>
            <artwork align="left"><![CDATA[
      struct {  
        NodeID PeerId;
        NodeID GroupId;
        Dictionary Options;
      } Leave;
]]></artwork>
          </figure>

          <t>PeerId: overlay address of leaving peer</t>

          <t>GroupId: the overlay address of the root of the tree</t>

          <t>Options: name-value list of options</t>
        </section>


        <section title="Re-Form or Optimize Tree">
          <t>This triggers a reorganization of either the entire tree or only
          a sub-tree. It MAY include hints to specific peers of recommended
          parent or child peers to reconnect to. A peer receiving this message
          MAY ignore it, MAY propagate it to other peers in its subtree, and
          MAY invoke local algorithms for selecting preferred parent and/or
          child peers.</t>

          <figure>
            <artwork align="left"><![CDATA[
      struct {  
        NodeID GroupId;
        NodeID PeerId;
        Dictionary Options;
      } Reform;
]]></artwork>
          </figure>

          <t>GroupId: the overlay address of the root of the tree</t>

          <t>PeerId: if omitted, then the tree is reorganized starting from
          the root, otherwise it is reorganized only at the sub-tree
          identified by PeerId.</t>

          <t>Options: name-value list of options</t>
        </section>

        <section title="Heartbeat">
          <t>A node signals to its adjacent nodes in the tree that it is
          alive. If a peer does not receive a Heartbeat message within N
          heartbeat time intervals, it MUST treat this as an explicit Leave
          message from the unresponsive peer. N is configurable.</t>

          <figure>
            <artwork align="left"><![CDATA[
      struct {  
        NodeID PeerId1;
        NodeID PeerId2;
        NodeID GroupId;
      } Heartbeat;
]]></artwork>
          </figure>

          <t>PeerId1: source of heartbeat</t>

          <t>PeerId2: destination of heartbeat</t>

          <t>GroupId: overlay address of the root of the tree</t>
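          <t>The timeout rule above can be sketched as follows. The class
          HeartbeatMonitor, its method names, and the use of caller-supplied
          timestamps are assumptions made for illustration. Dropping
          heartbeats from non-adjacent peers is likewise a choice of the
          sketch, not a requirement of this specification.</t>

          <figure>
            <artwork align="left"><![CDATA[
```python
# Sketch of heartbeat-based failure detection for one peer in one tree.
# HeartbeatMonitor and its parameters are hypothetical names.


class HeartbeatMonitor:
    def __init__(self, interval_s: float, n: int):
        self.interval_s = interval_s  # heartbeat time interval
        self.n = n                    # the configurable N from the text
        self.last_seen = {}           # adjacent peer id -> last heartbeat

    def add_neighbor(self, peer_id: str, now: float) -> None:
        """Start tracking a parent or child peer."""
        self.last_seen[peer_id] = now

    def on_heartbeat(self, peer_id: str, now: float) -> None:
        """Record a Heartbeat (assumption: unknown senders are dropped)."""
        if peer_id in self.last_seen:
            self.last_seen[peer_id] = now

    def expire(self, now: float) -> list:
        """Remove peers silent for more than N intervals, as if each had
        sent an explicit Leave. Returns the removed peer ids."""
        deadline = self.n * self.interval_s
        gone = sorted(p for p, t in self.last_seen.items()
                      if now - t > deadline)
        for peer_id in gone:
            del self.last_seen[peer_id]
        return gone


monitor = HeartbeatMonitor(interval_s=10.0, n=3)
monitor.add_neighbor("parent", now=0.0)
monitor.add_neighbor("child-a", now=0.0)
monitor.on_heartbeat("parent", now=25.0)
expired = monitor.expire(now=31.0)
```
]]></artwork>
          </figure>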
        </section>
      </section>

    </section>


    <section title="Scribe Algorithm">

    <section title="Overview">
      <t>
     The following table shows a mapping between RELOAD ALM messages (as defined in 
     <xref target="sec-protocol"></xref> of this draft) and Scribe messages as defined in <xref target="CASTRO2002"></xref>. </t>

    
        <figure align="center" anchor="ScribeMapping">
          <artwork align="left"><![CDATA[
         +------------------+-------------------+-----------------+
         | Section in Draft |RELOAD ALM Message | Scribe Message  |
         +------------------+-------------------+-----------------+
         | 7.2.1            | CreateALMTree     | Create          |
         +------------------+-------------------+-----------------+
         | 7.2.2            | Join              | Join            |
         +------------------+-------------------+-----------------+
         | 7.2.3            | JoinAccept        |                 |
         +------------------+-------------------+-----------------+
         | 7.2.4            | JoinConfirm       |                 |
         +------------------+-------------------+-----------------+
         | 7.2.5            | JoinDecline       |                 |
         +------------------+-------------------+-----------------+
         | 7.2.6            | Leave             | Leave           |
         +------------------+-------------------+-----------------+
         | 7.2.7            | Reform            |                 |
         +------------------+-------------------+-----------------+
         | 7.2.8            | Heartbeat         |                 |
         +------------------+-------------------+-----------------+
         | new              | Push/Deliver/Send | Multicast       |
         +------------------+-------------------+-----------------+
         |                  | Note 1            | deliver         |
         +------------------+-------------------+-----------------+
         |                  | Note 1            | forward         |
         +------------------+-------------------+-----------------+
         |                  | Note 1            | route           |
         +------------------+-------------------+-----------------+
         |                  | Note 1            | send            |
         +------------------+-------------------+-----------------+
            ]]></artwork>
        </figure>
    
     <t>Note 1: These Scribe messages are handled by RELOAD messages.</t>

      <t>The following sections describe the Scribe algorithm in more detail.</t>
      
    </section>
      
        <section title="Create">

<t>
This message will create a group with GroupId. This message will be delivered 
to the node whose NodeId is closest to the GroupId. This node becomes the 
rendezvous point and root for the new multicast tree. 
Groups may have multiple sources of multicast messages.
</t>
          <figure>
            <artwork align="left"><![CDATA[
CREATE : groups.add(msg.GroupId)
]]></artwork>
          </figure>

          <t>GroupId: the overlay address of the root of the tree</t>
        </section>



        <section title="Join">

<t>
To join a multicast tree, a node sends a JOIN request with the GroupId as the key. The overlay
routes this message towards the rendezvous point of the tree. If an intermediate node is already
a forwarder for this tree, it adds the joining node as a child. Otherwise, the intermediate node creates
a child table for the group and adds the joining node to it. It then sends its own JOIN request towards the 
rendezvous point, terminating the JOIN message from the child.
</t>
<t>
To adapt the Scribe algorithm into the ALM Usage proposed here, after a JOIN request is accepted, a JOINAccept
message is returned to the joining node.
</t>

          <figure>
            <artwork align="left"><![CDATA[
JOIN : if (checkAccept(msg)) {
           recvJoins.add(msg.source, msg.GroupId)
           SEND(JOINAccept(nodeID, msg.source, msg.GroupId))
       }
]]></artwork>
          </figure>
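
          <t>As a non-normative illustration, the JOIN handling above could be sketched in Python as follows.
          The class ScribeNode, the check_accept policy, the max_children limit, and the outbox list that stands in
          for SEND are assumptions made for this sketch; they are not defined by this draft or by RELOAD.</t>

          <figure>
            <artwork align="left"><![CDATA[
```python
class ScribeNode:
    """Hypothetical model of the per-node JOIN state."""

    def __init__(self, node_id, max_children=4):
        self.node_id = node_id
        self.max_children = max_children  # assumed admission limit
        self.recv_joins = set()           # pending (source, group_id)
        self.outbox = []                  # stands in for SEND()

    def check_accept(self, source, group_id):
        # Assumed policy: accept while fewer than max_children
        # joins are pending at this node.
        return len(self.recv_joins) < self.max_children

    def on_join(self, source, group_id):
        # Remember the request and answer with a JoinAccept.
        if self.check_accept(source, group_id):
            self.recv_joins.add((source, group_id))
            self.outbox.append(
                ("JoinAccept", self.node_id, source, group_id))


node = ScribeNode("P1")
node.on_join("P2", "G")
```
]]></artwork>
          </figure>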

        </section>


        <section title="Leave">

<t>
When leaving a multicast group, a node changes its local state to indicate that it has left the group.
If the node has no children in its table, it sends a LEAVE request to its parent. The request travels
up the multicast tree and stops at the first node that still has children after removing
the leaving node.
</t>

          <figure>
            <artwork align="left"><![CDATA[
LEAVE : groups[msg.GroupId].children.remove(msg.source)
        if (groups[msg.GroupId].children.size == 0)
           SEND(msg, groups[msg.GroupId].parent)
]]></artwork>
          </figure>
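
          <t>The upward propagation of LEAVE could be sketched in Python as follows. This is a minimal,
          non-normative sketch. The Group class, the send callback, the guard for a root without a parent, and the
          choice to name the forwarding node as the source of the forwarded LEAVE are assumptions of the sketch
          and are not specified by this draft.</t>

          <figure>
            <artwork align="left"><![CDATA[
```python
class Group:
    """Hypothetical per-group tree state kept by one node."""

    def __init__(self, parent=None):
        self.parent = parent   # NodeID of the parent, None at the root
        self.children = set()  # NodeIDs of the children


def on_leave(node_id, groups, source, group_id, send):
    """Remove the leaving child; forward LEAVE if none remain."""
    group = groups[group_id]
    group.children.discard(source)
    if not group.children and group.parent is not None:
        # Assumption: the forwarded LEAVE names this node as source.
        send(("Leave", node_id, group_id), group.parent)


sent = []
groups = {"G": Group(parent="P1")}
groups["G"].children.update({"P3", "P4"})


def record(msg, dest):
    sent.append((msg, dest))


on_leave("P2", groups, "P3", "G", record)  # P4 remains, no forward
on_leave("P2", groups, "P4", "G", record)  # table empty, forward
```
]]></artwork>
          </figure>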

        </section>


        <section title="JoinConfirm">

<t>
This message is not part of the Scribe protocol, but it is required by the basic 
protocol proposed in this draft. The usage therefore sends this message so that a
joining node can confirm that it accepts its parent node.
</t>
          <figure>
            <artwork align="left"><![CDATA[
JOINConfirm: if (recvJoins.contains(msg.source, msg.GroupId)) {
                 if (!groups.contains(msg.GroupId)) {
                     groups.add(msg.GroupId)
                     SEND(msg, msg.GroupId)
                 }
                 groups[msg.GroupId].children.add(msg.source)
                 recvJoins.del(msg.source, msg.GroupId)
             }
]]></artwork>
          </figure>
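
          <t>A minimal Python sketch of this handler follows, for illustration only. The function name, the
          representation of recvJoins as a set of pairs and of groups as a dictionary of child sets, and the
          send callback are assumptions of the sketch.</t>

          <figure>
            <artwork align="left"><![CDATA[
```python
def on_join_confirm(recv_joins, groups, source, group_id, send):
    """Move a confirmed child from the pending set to the table."""
    if (source, group_id) not in recv_joins:
        return  # no pending JOIN from this source; ignore
    if group_id not in groups:
        groups[group_id] = set()  # new child table for the group
        # Mirrors SEND(msg, msg.GroupId): route towards the GroupId.
        send(("JoinConfirm", source, group_id), group_id)
    groups[group_id].add(source)
    recv_joins.discard((source, group_id))


sent = []
recv_joins = {("P2", "G")}
groups = {}
on_join_confirm(recv_joins, groups, "P2", "G",
                lambda msg, key: sent.append((msg, key)))
```
]]></artwork>
          </figure>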

        </section>

        <section title="JoinDecline">

          <figure>
            <artwork align="left"><![CDATA[
JOINDecline: if (recvJoins.contains(msg.source, msg.GroupId))
                 recvJoins.del(msg.source, msg.GroupId)
]]></artwork>
          </figure>

        </section>

        <section title="Multicast">

<t>
A message to be multicast to a group is sent to the rendezvous node, from where it is 
forwarded down the tree. If a node is a member of the tree rather than just a forwarder,
it passes the multicast data up to the application.
</t>

          <figure>
            <artwork align="left"><![CDATA[
MULTICAST : foreach(groups[msg.GroupId].children as NodeId)
                   SEND(msg,NodeId)
            if memberOf(msg.GroupId)
                   invokeMessageHandler(msg.GroupId, msg)
]]></artwork>
          </figure>
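
          <t>The forwarding step could be sketched in Python as follows. This is a non-normative sketch: the
          function signature, the children and members containers, and the send and deliver callbacks are
          assumptions. Children are visited in sorted order only to make the example deterministic.</t>

          <figure>
            <artwork align="left"><![CDATA[
```python
def on_multicast(children, members, group_id, payload,
                 send, deliver):
    """Forward to every child, then deliver locally if a member."""
    for child in sorted(children.get(group_id, ())):
        send(payload, child)
    if group_id in members:
        deliver(group_id, payload)


sent, delivered = [], []
on_multicast({"G": {"P3", "P2"}}, {"G"}, "G", b"data",
             lambda msg, dest: sent.append(dest),
             lambda gid, msg: delivered.append((gid, msg)))
```
]]></artwork>
          </figure>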

        </section>


   </section>


    <section title="P2PCast Algorithm">

    <section title="Overview">

<t>
P2PCast <xref target="P2PCAST"></xref> creates a forest of related trees to improve load balancing. 
P2PCast is independent of the underlying P2P substrate. Its goals and approach 
are similar to SplitStream <xref target="SPLITSTREAM"></xref> (which assumes Pastry as the P2P overlay).
In P2PCast the content provider splits the stream of data into f stripes. 
Each tree in the forest of multicast trees is an (almost) full tree of arity
f. These trees are conceptually separate: every node of the system appears 
once in each tree, with the content provider being the source in all
of them. To ensure that each peer contributes as much bandwidth as it 
receives, every node is a leaf in all the trees except for one, in which the
node serves as an internal node (the proper tree of this node). The remainder of this
section assumes f=2 to keep the description simple. However, the algorithm
works for any number of stripes f.
</t>

<t>P2PCast distinguishes the following types of nodes:</t>

<t><list style="symbols">
<t>Incomplete Nodes: A node with fewer than f children in its proper stripe</t>
<t>Only-Child Nodes: A node whose parent (in any multicast tree) is an incomplete
node</t>
<t>Complete Nodes: A node with exactly f children in its proper stripe</t>
<t>Special Node: A single node which is a leaf in all multicast trees of the forest</t>
</list>
</t>
</section>

<section title="Create">

<t>
This message will create a group with GroupId. This message will be delivered 
to the node whose NodeId is closest to the GroupId. This node becomes the 
rendezvous point and root for the new multicast tree. The rendezvous point maintains f subtrees.
</t>

</section>

<section title="Join">
<t>
To join a multicast tree, a joining node N sends a JOIN request to a random node A that is already part of the tree.
Depending on the type of A, the joining algorithm continues as follows:</t>

<t><list style="symbols">
<t>Incomplete Nodes: A will arbitrarily select for which tree it wants to serve as an internal node, 
and adopt N in that tree. In the other tree N will adopt A as a child (taking A's place in the tree) 
thus becoming an internal node in the stripe that A didn't choose.</t>

<t>Only-Child Nodes: As this node has a parent which is an incomplete node, the joining node is 
redirected to the parent node, which handles the request as detailed above.</t>

<t>Complete Nodes: The contacted node A must be a leaf in the other tree. If A is a leaf node in Stripe 1,
N becomes an internal node in Stripe 1, taking the place of A and adopting it at the
same time. To find a place for itself in the other stripe, N starts a random walk
down the subtree rooted at the sibling of A (if A is the root and thus does not have siblings, 
N is sent directly to a leaf in that tree). The walk ends as soon as N finds an incomplete
node or a leaf. N is then adopted by that node.</t>

<t>Special Node: As this node is a leaf in all subtrees, the joining node can adopt the node 
in one tree and become a child in the other.</t>
</list>
</t>
<t>
P2PCast defines its own messages for communication between nodes during reorganisation. Here, these messages 
are encapsulated in the REFORM message type: the P2PCast message is carried in the Options 
parameter of REFORM. The following messages are defined by P2PCast:
</t>

<t><list>
<t>TAKEON: To take another peer as a child</t>
<t>SUBSTITUTE: To take the place of a child of some peer</t>
<t>SEARCH: To obtain the child of a node in a particular stripe</t>
<t>REPLACE: Different from SUBSTITUTE in that the node which makes us its child sheds off a random child</t>
<t>DIRECT: To direct a node to its would-be parent</t>
<t>UPDATE: A node sends its updated state to its children</t>
</list></t>

<t>To adapt the P2PCast algorithm into the ALM Usage proposed here, after a JOIN request is accepted, a JOINAccept
message is returned to the joining node (one for every subtree).</t>

</section>

<section title="Leave">

<t>
When leaving a multicast group, a node changes its local state to indicate that it has left the group.
Disregarding the case where the leaving node is the root of the tree, the leaving node must be 
complete or incomplete in its proper tree. In the other trees the node is a leaf and can simply 
disappear after notifying its parent.

For the proper tree, if the node is incomplete, it is replaced by its child. However, if the node is 
complete, a gap (bubble) is created, which is filled by a random child. If this child is incomplete, it can 
simply fill the gap. However, if it is complete, it needs to shed a random child. This child is directed to 
its sibling, which in turn sheds a random child. This process ripples down the tree until the next-to-last level 
is reached. The shed node is then taken as a child by the parent of the deleted node in the other stripe.
</t>

<t>Again, for the reorganisation of the tree, the REFORM message type is used as defined in the previous section.</t>

</section>


<section title="JoinConfirm">

<t>
This message is not part of the P2PCast protocol, but it is required by the basic 
protocol proposed in this draft. The usage therefore sends this message so that a
joining node can confirm that it accepts its parent node. As with Join and JoinAccept, this is 
carried out for every subtree.
</t>
        </section>

<section title="JoinDecline">

          <figure>
            <artwork align="left"><![CDATA[
JOINDecline: if (recvJoins.contains(msg.source, msg.GroupId))
                 recvJoins.del(msg.source, msg.GroupId)
]]></artwork>
          </figure>

        </section>

        <section title="Multicast">

<t>
A message to be multicast to a group is sent to the rendezvous node, where it is 
split into f stripes. Each stripe is then forwarded down its own subtree. 
If a receiving node is a member of the tree rather than just a forwarder,
it passes the multicast data up to the application.
</t>
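
<t>As an illustration, splitting a stream into f stripes could be sketched in Python as follows. The draft
does not specify how packets are assigned to stripes. The round-robin assignment and the name
split_into_stripes are assumptions made only for this sketch.</t>

          <figure>
            <artwork align="left"><![CDATA[
```python
def split_into_stripes(packets, f=2):
    """Assign packet i to stripe i mod f (assumed round-robin)."""
    stripes = [[] for _ in range(f)]
    for index, packet in enumerate(packets):
        stripes[index % f].append(packet)
    return stripes


# With f=2, even-numbered packets travel down the first subtree
# and odd-numbered packets down the second.
stripes = split_into_stripes(["p0", "p1", "p2", "p3", "p4"], f=2)
```
]]></artwork>
          </figure>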

</section>



</section>



    <section title="Examples">
      <t>All peers in the examples are assumed to have completed bootstrapping.  "Pn" refers to peer N.  
        "GroupID" refers to a peer responsible for storing the ALMTree instance with GroupID.
      </t>
      <section title="Create Tree">

        <figure align="center" anchor="CreateTreeExample">
          <artwork align="left"><![CDATA[
     P1      P2      P3       P4      GroupID
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     | CreateTree    |        |       |
     |------------------------------->|
     |       |       |        |       |
     |       |       |        |       |
     |       |    CreateTreeResponse  |
     |<-------------------------------|
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
            ]]></artwork>
        </figure>
      </section>

      <section title="Join Tree">
        <figure align="center" anchor="JoinTreeExample">
          <artwork align="left"><![CDATA[
     P1      P2      P3       P4      GroupID
     |       |       |        |       |
     |       |       |        |       |
     | Join                           |
     |------------------------------->|
     |       |       |        |       |
     | JoinAccept                     |
     |<-------------------------------|
     |       |       |        |       |
     |       |       |        |       |
     |       |Join                    |
     |       |----------------------->|
     |       |       |        |       |
     |                            Join|
     |<-------------------------------|
     |       |       |        |       |
     |JoinAccept     |        |       |
     |------>|       |        |       |
     |       |       |        |       |
     |JoinConfirm    |        |       |
     |<------|       |        |       |
     |       |       |        |       |
     |       |       |        |Join   |
     |       |       |        |------>|
     |       |       |        |  Join |
     |<-------------------------------|
     |       |       |        |       |
     | Join  |       |        |       |
     |------>|       |        |       |
     |       |       |        |       |
     | JoinAccept    |        |       |
     |----------------------->|       |
     |       |       |        |       |
     |       | JoinAccept     |       |
     |       |--------------->|       |
     |       |       |        |       |
     |       |       |        |       |
     |       |   Join Confirm |       |
     |<-----------------------|       |
     |       |       |        |       |
     |       |   Join Decline |       |
     |       |<---------------|       |
     |       |       |        |       |
     |       |       |        |       |
            ]]></artwork>
        </figure>
      </section>

      <section title="Leave Tree">
        <figure align="center" anchor="LeaveTreeExample">
          <artwork align="left"><![CDATA[
     P1      P2      P3       P4      GroupID
     |       |       |        |       |
     |       |       |        |       |
     |       |       |  Leave |       |
     |<-----------------------|       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
     |       |       |        |       |
            ]]></artwork>
        </figure>
      </section>

      <section title="Add Direct Application Edge">
      </section>

      <section title="Adjust Tree to Churn">
      </section>

      <section title="Push Data">
      </section>

    </section>

    <section title="Kind Definitions">
       <section title="ALMTree Kind Definition">
        <t>This section defines the ALMTree kind.</t>
        <t>Kind IDs: The Resource Name for the ALMTree Kind-ID is the SessionKey used to identify the ALM tree.</t>
        <t>Data Model: The data model is the ALMTree structure.</t>
        <t>Access Control: NODE-MATCH</t>
      </section>
    </section>

    <section title="Configuration File Extensions">
      <t>In RELOAD, peers receive a configuration document at bootstrap time.
      ALM parameter definitions for the configuration file will be defined in a later version.
      </t>
    </section>

    <section title="Change History">
        <t><list style="symbols">
            <t>Version 02: Remove Hybrid ALM material.  Define ALMTree kind.  Define new RELOAD messages.  Define RELOAD architecture extensions. Add Scribe as base algorithm for ALM usage. Define code points. Define preliminary ALM-specific security issues.</t>
           <t>Version 03: Add P2Pcast Algorithm.</t>
          </list></t>
    </section>

   <section title="Open Issues">
      <t> <list style="symbols">
         <t>The specific capabilities of clients in terms of tree creation and being parents of
        other nodes will be described in subsequent versions.</t>
         <t>ALM parameter definitions for the RELOAD configuration file will be defined in a later version. </t>
         <t>Should any other ALM algorithms be mapped?</t>
        </list>
      </t>
    </section>
    <!-- Possibly a 'Contributors' section ... -->

    <section anchor="IANA" title="IANA Considerations">
      <t>This memo includes no request to IANA.</t> 
      <t>Message codes</t>

        <figure align="center" anchor="MessageCodes">
          <artwork align="left"><![CDATA[
      +-------------------------+------------------+------------------+
      | Message                 |RELOAD Code Point | ALM Message Code |
      +-------------------------+------------------+------------------+
      | CreateALMTree           | 35               | 00               |
      +-------------------------+------------------+------------------+
      | CreateALMTreeResponse   | 36               | 01               |
      +-------------------------+------------------+------------------+
      | Join                    | 36               | 02               |
      +-------------------------+------------------+------------------+
      | JoinAccept              | 36               | 03               |
      +-------------------------+------------------+------------------+
      | JoinReject              | 36               | 04               |
      +-------------------------+------------------+------------------+
      | JoinConfirm             | 36               | 05               |
      +-------------------------+------------------+------------------+
      | JoinDecline             | 36               | 06               |
      +-------------------------+------------------+------------------+
      | Leave                   | 36               | 07               |
      +-------------------------+------------------+------------------+
      | LeaveResponse           | 36               | 08               |
      +-------------------------+------------------+------------------+
      | Reform                  | 36               | 09               |
      +-------------------------+------------------+------------------+
      | ReformResponse          | 36               | x0A              |
      +-------------------------+------------------+------------------+
      | Heartbeat               | 36               | x0B              |
      +-------------------------+------------------+------------------+
      | Push                    | 36               | x0C              |
      +-------------------------+------------------+------------------+
      | PushResponse            | 36               | x0D              |
      +-------------------------+------------------+------------------+
            ]]></artwork>
        </figure>
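
      <t>For illustration only, an implementation might represent the ALM message codes as an enumeration.
      The following Python sketch mirrors the ALM Message Code column of the table above, read as hexadecimal
      values. The class and member names are assumptions of the sketch and are not defined by this draft.</t>

        <figure>
          <artwork align="left"><![CDATA[
```python
from enum import IntEnum


class AlmMessageCode(IntEnum):
    """ALM sub-type codes; names are hypothetical."""
    CREATE_ALM_TREE = 0x00
    CREATE_ALM_TREE_RESPONSE = 0x01
    JOIN = 0x02
    JOIN_ACCEPT = 0x03
    JOIN_REJECT = 0x04
    JOIN_CONFIRM = 0x05
    JOIN_DECLINE = 0x06
    LEAVE = 0x07
    LEAVE_RESPONSE = 0x08
    REFORM = 0x09
    REFORM_RESPONSE = 0x0A
    HEARTBEAT = 0x0B
    PUSH = 0x0C
    PUSH_RESPONSE = 0x0D


# Look up a received sub-type code.
code = AlmMessageCode(0x0B)
```
]]></artwork>
        </figure>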



      <t>Code points for the kinds defined in this document MUST NOT conflict with any defined code points for RELOAD. RELOAD defines exp_a_req and exp_a_ans for experimental purposes.  This specification uses only these message types for all ALM messages, with a sub-type to distinguish the specific ALM message.
 For Data Kind-IDs, the RELOAD specification states: "Code points in the range 0xf0000001 to 0xfffffffe are reserved for private use".  ALM Usage Kind-IDs will be defined in the private use range.</t>
     
    <t>
     All ALM Usage messages support the RELOAD Message Extension mechanism.
    </t>
    <t>No new Error Codes are defined. RELOAD defines Error_Exp_A and Error_Exp_B.  These will be used if new error codes are needed.</t>
    <t>Application-ID: The ALM Usage Application-IDs must not conflict with other applications of
RELOAD.  Additionally, if AppAttach is used, the port number must be selected to avoid conflicts. </t>
    <t>Access Control Policies: No new policies.</t>
    <t>ALM Algorithm Types:  There are currently two types: SCRIBE-RELOAD, P2PCAST-RELOAD.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>Overlays are vulnerable to DoS and collusion attacks. This document does not
      attempt to solve overlay security issues. We assume the node authentication model as defined in <xref
      target="I-D.ietf-p2psip-base"></xref>.</t>
      <t>ALM Usage specific security issues: <list style="symbols">
          <t>Right to create GroupID at some NodeId </t>
          <t>Right to store Tree info at some Location in the DHT </t>
          <t>Limit on the number of messages per second and on bandwidth use </t>
          <t>Right to join an ALM tree </t>
         </list></t>
    </section>
    <section title="Acknowledgement">
        <t>Marc Petit-Huguenin provided important comments on earlier versions of this draft.</t>
   </section>
  </middle>

  <back>
    <references title="Normative References">
      <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?-->

      <?rfc include="reference.RFC.2119"?>
      <?rfc include="reference.RFC.0792"?>
      <?rfc include="reference.RFC.3376"?>
      <?rfc include="reference.RFC.3810"?>
      <?rfc include="reference.RFC.4605"?>
      <?rfc include="reference.RFC.4607"?>
      <?rfc include="reference.RFC.5058"?>

    </references>

    <references title="Informative References">
      <!-- Here we use entities that we defined at the beginning. -->

      <?rfc include="reference.RFC.1930"?>
      <?rfc include="reference.RFC.3552"?>
      <?rfc include="reference.RFC.4286"?>
      <?rfc include="reference.RFC.1112"?>

      <?rfc include="reference.I-D.ietf-mboned-auto-multicast"?>
      <?rfc include="reference.I-D.ietf-p2psip-base"?>
      <?rfc include="reference.I-D.ietf-p2psip-sip"?>
      <?rfc include="reference.I-D.matuszewski-p2psip-security-overview"?>
      <?rfc include="reference.I-D.irtf-p2prg-rtc-security"?>
      <?rfc include="reference.I-D.irtf-samrg-common-api"?>
      <?rfc include="reference.I-D.irtf-sam-hybrid-overlay-framework"?>


      <reference anchor="AGU1984" target="http://dl.acm.org/citation.cfm?id=802060">
        <front>
          <title>Datagram Routing for Internet Multicasting</title>
          <author initials="L." surname="Aguilar"></author>
          <date month="March" year="1984" />
        </front>

        <seriesInfo name="ACM Sigcomm 84" value="1984" />
      </reference>

      <reference anchor="CASTRO2002"
                 target="http://research.microsoft.com/en-us/um/people/antr/past/jsac.pdf">
        <front>
          <title>Scribe: A large-scale and decentralized application-level multicast infrastructure</title>
          <author initials="M." surname="Castro"></author>
          <author initials="P." surname="Druschel"></author>
          <author initials="A.-M." surname="Kermarrec"></author>
          <author initials="A." surname="Rowstron"></author>
          <date month="October" year="2002" />
        </front>

        <seriesInfo name="IEEE Journal on Selected Areas in Communications"
                    value="vol.20, No.8" />
      </reference>

      <reference anchor="CASTRO2003"
                 target="http://research.microsoft.com/en-us/um/people/mcastro/publications/infocom-compare.pdf">
        <front>
          <title>An Evaluation of Scalable Application-level Multicast Built Using Peer-to-peer overlays</title>
          <author initials="M." surname="Castro"></author>
          <author initials="M." surname="Jones"></author>
          <author initials="A.-M." surname="Kermarrec"></author>
          <author initials="A." surname="Rowstron"></author>
          <author initials="M." surname="Theimer"></author>
          <author initials="H." surname="Wang"></author>
          <author initials="A." surname="Wolman"></author>
          <date month="April" year="2003" />
        </front>

        <seriesInfo name="Proceedings of IEEE INFOCOM" value="2003" />
      </reference>

      <reference anchor="HE2005" target="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1284204&amp;abstractAccess=no&amp;userType=inst">
        <front>
          <title>Dynamic Host-Group/Multi-Destination Routing for Multicast Sessions</title>
          <author initials="Q." surname="He" />
          <author initials="M." surname="Ammar" />
          <date day="" month="" year="2005"/>
        </front>

        <seriesInfo name="J. Telecommunication Systems" value="vol. 28, pp. 409-433" />
      </reference>

      <reference anchor="SPLITSTREAM"
                 target="http://research.microsoft.com/en-us/um/people/antr/PAST/SplitStream-sosp.pdf">
        <front>
          <title>SplitStream: High-bandwidth multicast in a cooperative environment</title>
          <author initials="M." surname="Castro"></author>
          <author initials="P." surname="Druschel"></author>
          <author initials="A." surname="Nandi"></author>
          <author initials="A.-M." surname="Kermarrec"></author>
          <author initials="A." surname="Rowstron"></author>
          <author initials="A." surname="Singh"></author>
          <date month="October" year="2003" />
        </front>

        <seriesInfo name="SOSP'03, Bolton Landing, New York" value="2003"/>
      </reference>

      <reference anchor="P2PCAST" target="http://www.scs.stanford.edu/~reddy/research/p2pcast/report.pdf">
        <front>
          <title>P2PCast: A Peer-to-Peer Multicast Scheme for Streaming Data</title>

          <author initials="A." surname="Nicolosi" />
          <author initials="S." surname="Annapureddy" />
          <date month="May" year="2003" />
        </front>

        <seriesInfo name="Stanford Secure Computer Systems Group Report" value="2003" />
      </reference>
    </references>

    <section anchor="app-additional" title="Additional Stuff">
      <t>This becomes an Appendix.</t>
    </section>
  </back>
</rfc>
