One document matched: draft-ietf-rmt-bb-tree-config-03.txt
Differences from draft-ietf-rmt-bb-tree-config-02.txt
RMT Working Group Brian Whetten/Consultant
Internet Engineering Task Force Dah Ming Chiu/Sun Microsystems
Internet Draft Miriam Kadansky/Sun Microsystems
draft-ietf-rmt-bb-tree-config-03.txt Seok Joo Koh/ETRI/KOREA
18 November, 2002 Gursel Taskale/Tibco
Expires 18 May, 2003 Brian Neil Levine/Umass
Reliable Multicast Transport Building Block: Tree Auto-Configuration
<draft-ietf-rmt-bb-tree-config-03.txt>
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time. It
is inappropriate to use Internet-Drafts as reference material or to
cite them other than as a "work in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
The Reliable Multicast Transport Working Group has been chartered
to standardize multicast transport services. This working group
expects to initially standardize three protocol instantiations.
This draft is concerned with the requirements of the tree-based
ACK protocol. In particular, it is concerned with defining the
building block for auto-configuration of the logical ACK-tree.
According to the charter, a building block is "a coarse-grained
modular component that is common to multiple protocols along with
abstract APIs that define a building block's access methods and
their arguments." For more information, see the Reliable
Multicast Transport Building Blocks and Reliable Multicast Design
Space documents [WLKHFL01][HWKFV00].
Table of Contents
1. Introduction
1.1 Terminology
1.2 Assumptions
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt 1
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
1.3 Requirements
1.4 Applicability Statement
2. Overview
3. Session Announcement
4. Service Node Discovery and Selection
4.1 Service Node Discovery Algorithms
4.1.1 Static
4.1.2 ERS
4.1.3 Mesh
4.2 Distance Metrics
4.2.1 TTL Hop-Count
4.2.2 Number of Levels
4.2.3 Delay
4.2.4 Address
4.2.5 Static
4.2.6 GRA
4.3 Service Node Selection
5. Tree Formation
6. Tree Maintenance
6.1 Monitoring Parents and Children
6.2 Optimizing the Tree
7. Messages
8. External Interfaces
8.1 Interfaces to the BB
8.1.1 start
8.1.2 end
8.1.3 incomingMessage
8.1.4 getStatistics
8.1.5 getSNs
8.1.6 setSN
8.1.7 acceptChild
8.1.8 removeChild
8.1.9 treeLevelUpdate
8.1.10 lostSN
8.1.11 setOptimization
8.1.12 recordSNs
8.2 Interfaces from the BB
8.2.1 outgoingMessage
8.2.2 SNlist
9. References
Authors' Addresses
1. Introduction
The Reliable Multicast Transport (RMT) working group has been
chartered to standardize IP multicast transport services. This
draft is concerned with the requirements of the tree-based ACK
protocol [WCPKT00]. In particular, this draft defines a building
block for auto-configuration of a tree comprised of a single
Sender, Service Nodes, and Receivers into a tree (called a Session
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 2
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
Tree in this document). The design goals of this draft are
motivated by the needs of TRACK-based protocols, however the trees
it constructs are useful for other services.
This building block can be interfaced to any other BB or PI
wishing to use a tree structure. To actually use this BB's
features, the PI needs to includes the messages described in this
BB in its packets. An example of how to use this BB can be found
in the TRACK PI[WCPKT01].
The process of session tree construction is difficult for IP
multicast. The best session trees match the underlying multicast
routing tree topology [LPG98], however the multicast service model
[DEE89] does not provide explicit support for discovering routing
tree topology.
Furthermore, deployed multicast architectures can vary; for
example, hosts may be restricted to multicast reception and not
transmission with Source-Specific multicast routing [HC00]; and
routers may provide special extended routing services with Generic
Router Assist [CST00]. The RMT charter does not restrict the use
of any particular network service in constructing the tree. It
only suggests preferred scenarios. Accordingly, there are several
viable solutions for constructing a tree, depending on network
conditions.
The optimality of a tree may also depend on other factors, such as
the need for load balancing, and the need to minimize the depth
when used for collecting feedback information. The goal of this
building block is to specify a distributed procedure for
automatically constructing a tree that is loop-free and as
efficient as possible given the information available.
This draft describes a unified solution for tree construction in
the presence of different multicast service models and routing
protocols. In particular, it specifies a single procedure which
may be used with various techniques for service node discovery and
distance measurements, several of which are specified within this
document. The difference in these techniques primarily affects the
optimality of the tree. The unified algorithm ensures that
different implementations can interoperate and construct a loop-
free tree.
In order to accommodate various multicast deployments, this
document divides the tree building process into the following
major components:
1. Several techniques for discovering neighboring service nodes.
In particular: Static, Expanding Ring Search, and Mesh.
Discovering neighboring service nodes is a necessary
condition for getting connected, so each node in the session
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 3
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
must implement at least one of the above techniques.
2. Several algorithms for selecting neighboring service nodes.
In particular, the measurement and use of neighbor distance
and sender distance are described. These service node
selection algorithms help produce a good tree.
3. A single distributed procedure for construction and
maintenance of loop-free Session Trees.
1.1 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC 2119.
Session
A session is used to distribute data over a multicast address. A
Session Tree is used to provide reliability and feedback services
for a session.
Sender
The single sender of data on a multicast session. The Sender is the
root of the Session Tree.
Receiver
A receiver receives data from the sender via the multicast session.
Session Identifier
A fixed-size number, chosen either by the application that creates
the session or by the transport. Senders and Receivers use the
Session Identifier to distinguish sessions. The size of this
number is specified by the Protocol Instantiation (PI).
Service Node (SN)
A node within the tree which receives and retransmits data, and
aggregates and forwards control information toward the Sender. The
Sender operates as the root Service Node in any session tree.
Service Nodes MAY be dedicated servers within a network designed to
participate in multiple Sessions and support multiple trees, or
they MAY be Receivers participating in an individual session. SNs
MAY limit the number of Children they choose to service, and MAY
also make other restrictions on the characteristics of each Child
(distance, location, etc.). An SN that has accepted Children for a
session is called a Parent.
In other documents, Service Node is sometimes referred to as Repair
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 4
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
Head (RH).
Session Tree (ST)
The Session Tree is a tree spanning all receivers of a multicast
session. It is rooted at the Sender, consisting of zero of more
Service Nodes as interior nodes, and zero or more receivers as leaf
nodes. An ST is constructed for the forwarding of control
information back to the Sender as well as for the resending of
missed data to the Receivers. The ST for a particular session may
change over the course of the session.
Parent
A Parent is an SN or Receiver's predecessor in the ST on the path
toward the Sender. Every SN or Receiver on the tree except the
Sender itself has a parent. Each Parent communicates with its
children using either an assigned multicast address or through
unicast. If a multicast address is used, this may be the same
address used by the session, or one specifically assigned to the
Parent.
Children
The set of Receivers and SNs for which an SN or the Sender is
providing repair and feedback services.
Tree Level
A number indicating the number of "generations" a node is from the
root. The sender is at TL=0. Those that use the sender as their
parent are at TL=1 and so on. When a receiver is not connected to
the tree yet, it has a tree level value greater or equal to 128.
The reason for reserving part of the space (of tree levels) for
indicating "off-tree" is so that special measures can be used to
prevent forming loops. The largest value is 255, so the range of
off-tree levels are in the range 128 - 255. Initially, all
receivers have a TL value of 128. Once a Node joins the tree, its
Tree Level is updated to be one more than its Parent's level.
Distance Metric
There are several techniques to quantify distances between nodes
(Receivers, SNs, and the Sender) in a session. Each type of
quantification is called a distance metric. Several Distance
Metrics are described in this draft.
Sender Distance
The distance from a node to the Sender.
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 5
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
Neighbor Distance
The distance from a node to a Neighbor.
Neighbor
A node's (Receiver or SN) Neighbors are SNs that are close to the
node, according to the Distance Metric(s) used by the node.
1.2 Assumptions
This document describes how to build trees under the following
conditions:
a. The multicast group has only a single sender.
b. A single SN can serve multiple sessions.
c. Sessions can take advantage of a pre-deployed infrastructure
of SNs (ones that are not necessarily aware of a session
before the receivers), or recruit Receivers to be SNs.
d. Generic Router Assist[CST00] and Expanding Ring Search[YGS95]
are not required of the network infrastructure, but if
available they should be able to be utilized.
1.3 Requirements
The following are specifically required:
a. While tree-building may take advantage of information from the
routing layer, the mechanisms described are independent of the
routing protocol(s) used by the underlying multicast tree.
b. All trees constructed must be loop-free
c. These mechanisms must support late joiners and tree
optimization
1.4 Applicability Statement
The authors recognize that automatic tree construction is a very
difficult problem. Nonetheless, complete reliance on manual
configuration is very user unfriendly and error prone as well.
This building block describes a procedure for constructing loop-
free trees when there is minimal manual configured information
available.
This is analogous to providing a system with default configurations
that allow the system to work correctly, but not necessarily
optimally.
There are many possible criteria for tree optimality. This BB does
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 6
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
not attempt to define a single optimality criterion, nor does it
try to produce an optimal tree. It is, however, the goal of the
BB to construct better trees as more configuration and measurement
data are introduced to the procedure.
This BB describes only a subset of the possible parameters for
constructing optimal trees, in particular sender distance and
neighbor distance. There are many techniques for measuring these
distances. Some of the techniques may not be applicable globally.
Expanding ring search (ERS) is an effective technique in a local
subnet or intranet (especially when the IP multicast routing
protocol is dense-mode based). On the other hand, it is not
practical in a multi-domain network; it is not effective when the
routing protocol is sparse-mode based; and it can add significant
control traffic overhead.
Generic Router Assist (GRA) can provide measurement hooks to
determine SNs that are located along the path for multicast data
distribution. However, such facilities may not be available in all
networks.
The tree construction procedure does allow manual configuration and
various distance measurement techniques to be selectively and
independently applied for different subgroups of receivers and SNs,
to achieve incremental improvement to the quality of the tree.
There are many other criteria for tree-building than what is
described in this document, for instance, methods based on load
balancing and minimizing feedback latency.
2. Overview
The tree building process described within this document builds
logical trees which consist of:
1. A root node (the Sender)
2. Intermediate nodes (Service Nodes or SNs) which may be
either Receivers or nodes specifically allocated to the
task of repair and aggregation
3. Leaf nodes which are Receivers only
Session trees are spanning trees rooted at the Sender. Session
trees can be used for the forwarding of control information (i.e.
ACKs) towards the root, or for forwarding of data (i.e. repairs)
towards the leaf nodes.
Session trees are constructed per Sender; each node wishing to join
the tree discovers its neighboring SNs and then selects its best
parent based on locally available information, such as the relative
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 7
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
sender distances and neighbor distances. This document specifies
several techniques for measuring distances.
It is important to note that SNs may be actual Receivers (e.g.
Receivers willing and able to also function as SNs) or pre-deployed
"specialized" servers that are signaled to join the tree by
Receivers. We use the term Service Node to refer to either a
Receiver or "server" which is participating as part of the logical
tree formation.
Tree construction, regardless of SN discovery and selection
algorithm, proceeds generically as follows.
1. Session Announcement
Receivers of a session use standard out-of-band mechanisms
for discovery of a session's existence (e.g. Session
Advertisement ([HPW00], URL, etc). In this way, a Receiver
discovers the multicast group address, the Sender's address,
and other information necessary for logical tree construction.
Sessions may be announced in two parts, the first part
containing generic information about the session, such as the
multicast address, and the second part, announced on the
multicast address, containing additional information.
2. Measurements to the Sender (optional)
All SNs and Receivers that know about the session optionally
determine their distance to the Sender.
3. Neighbor Discovery
Meanwhile, each Receiver discovers nearby SNs (candidate
parents) for the Session using the neighbor discovery
algorithm(s).
4. Service Node Selection
Once a Receiver (or SN needing to join the tree) discovers a
nearby SN, it obtains the SN's distance to the Sender as well
as the SN's distance to the Receiver, tree level, and other
suitability values, if available. After discovery is complete,
the best SN is selected.
5. Binding to Service Node
The Receiver or SN then binds to the chosen SN. If a bind is
unsuccessful, the Receiver or SN retries with another nearby
SN, or starts the discovery process all over again. Once an
SN receives a bind from a child, that SN must then also join
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 8
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
the tree if it has not already, discovering an SN of its own,
possibly using a different method than leaf Receivers.
6. Optimization (optional) and Fault Recovery
During a session, a Receiver or SN may change to a different
SN for a number of reasons described below, including fault
tolerance. The Session Tree is maintained and optimized over
time.
This building block provides mechanisms for maintaining and
optimizing the tree, as well as tearing it down when the Sender is
done with it. In the rest of this document, the term 'Node'
denotes a Receiver or SN.
+--------------------+
| 1. Session |
| Advertisement |
+--------------------+
|
| Node receives tree-building parameters
V
+--------------------+
| 2. Measurements |<-------------------------|
| to the Sender | |
| (optional) | |
+--------------------+ |
| |
| |
V |
+--------------------+ |
| 3. Neighbor | |
| Discovery | |
+--------------------+ |
| |
| |
V |
+--------------------+ |
| 4. Service Node | |
| Selection | |
+--------------------+ |
| |
| Node picks best Neighbor |
V |
+--------------------+ |
| 5. Binding to |---------------------------
| Service Node | 6. Optimization (optional)
| | and Fault Recovery
+--------------------+
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 9
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
3. Session Announcement
The first step in the tree-building process is for a node to be
informed of the session's existence. This can be done using some
out-of-band method, such as Session Advertisement [HPW00], URL, e-
mail, etc.
SNs do not necessarily receive these advertisements. If an SN is
not a Receiver, it obtains the advertisement information once it
is contacted by a Receiver.
The advertisement includes the multicast address being used, the
Sender's address, the Session Identifier, any specific port numbers
to be used, and any global information useful for tree construction.
The advertisement may also contain information about one or more
Service Node Discovery options (such as Static, ERS, and Mesh)
that can possibly be used by Receivers in the session.
4. Service Node Discovery and Selection
Discovery is the process by which a node determines a suitable
Service Node. During the discovery process, suitable neighbors are
found, Sender distances are optionally exchanged, and the best SN
is selected.
4.1 Service Node Discovery Algorithms
This draft describes three algorithms for discovering SNs: Static,
Expanding Ring Search (ERS), and Mesh.
Multiple algorithms may be used within a single session. For
example, SNs may use the Mesh algorithm, while the receivers use
static configuration to discover the SNs; alternatively, some
Receivers may use static configuration while other Receivers depend
on ERS (in an intranet where ERS is available). Each Receiver may
pre-configure which algorithm to use before it starts.
The transport protocols request the following information from this
BB using the getSNs interface.
Service Nodes:
1. ParentAddress: the address and port of the parent node to
which the node should connect
2. UDPListenPort: the number of the port on which the node will
listen for its children's control messages
3. RepairAddr: the multicast address, UDP port, and TTL on
which this node sends control messages to its
children.
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 10
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
Receivers:
1. ParentAddress.
Senders:
1. UDPListenPort
2. RepairAddr
After the above information is obtained from auto-tree-config, the
transport protocol may perform the necessary Bind operations for
participating in the Session Tree.
4.1.1 Static
Static algotirhm relies on a functional entity, named Tree
Configurator (TC), which is pre-configured by Sender for tree
configuration in a static manner. TC may be simply implemented by
a program and thus installed at Sender or any other host. It is
not necessarily specialized infrastructure.
Static scheme is a typical top-down tree configuration approach. TC
is used to govern the tree building based on its own (session-
specific) tree configuration and SN(s) selection rules for the new
joiners.
If a TC is used for tree building, its address and port MUST be
included with the session advertisement. Receivers and SNs will
realize there is a TC for the session via Session Announcement,
and they can contact with the TC to get a list of candidate SNs by
sending a unicast Query message.
In response to a Query message, the TC replies with a Advertise
message that contains a list of candidate SNs available to the new
joiner. The rule of determining such candidate SNs may depend on
the pre-configured mechanism taken by TC. For example, TC may
determine the candidate SN list for a Node among the possible SNs
(it contains at that time) by considering which SNs are in the
same network domain with the Node (i.e., via comparing their
network prefix), or by considering the load balancing for the tree
topology it has configured until then. For this purpose, TC
maintains a pool of active SNs for the session. The list of
candidate SNs carried by Advertise message is ordered in
decreasing levels of preference, in which a lower number
represents a higher preference.
When a Receiver Node receives the responding Advertise message from
TC, the Node MAY proceed to try to bind to a candidate SN
following the given order by sending a BindRequestmessage, and
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 11
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
then waits for the responding message such as BindConfirm or
BindReject from TC. These binding steps will be done according to
the TRACK mechanism, as described in the TRACK PI [WCMKT02]
In the Static algorithm, SN discovery with a TC proceeds as
follows:
1. The node joins the multicast session and learns of the
location of TC (from the session announcement).
2. The node optionally discovers its distance from the Sender.
Any metric described in Section 4.2 may be used.
3. The node sends a Query message to the TC for a parent. The
request includes (optionally) the node's distance to the
sender and whether the node functions as an SN or not.
4. The TC chooses one or more candidate parents (SNs) for the
node from the active SNs by its own tree configuration rule.
The selection of candidate parents may be done by comparing
the network prefix or by referring to any other information
such as the number of currently attached children, etc.
5. The TC MUST responds to the Query message with an Advertise
message, which include the candidate parent list. In the
list, each entry contains the corresponding IP address and
port of an SN.
All the entries in the list SHOULD be arranged in the
decreasing order of preference levels.
In the rejection case, the Advertise message does not
include any candidate parent. In this case, the Node may
resort to the other mechanism such as ERS and Mesh.
In the success case, the node will be enrolled as an active
SN by the TC, if it functions as an SN in the session. Each
active SN (functioning as a parent for the Session Tree)
SHOULD send the TC the periodic Query messages with a flag
indicating that it is active over a specific time interval.
Based on the Query messages, the TC updates a pool of active
SNs in the session. In response to the Query message, the TC
sends a Advertise with a flag simply indicating that the
Query message is received.
6. After receiving a successful Advertise message from the TC,
the node will try to connect to its parent by sending
BindRequest messages based on the candidate parent list, as
described in Section 5.
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 12
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
4.1.2 ERS
ERS is a typical bottom-up tree configuration approach, which can
be used only in the network environments where IP multicast
transmissions are allowed by Receivers and SNs.
In ERS algorithm, SN discovery with a TC proceeds as follows:
1. The Nodes first look for Neighbors using a multicast Query
message. The initial TTL value in the Query message,
TTLNeighborInit, is specified in the session announcement
and may be as large as the session TTL (TTLSession).
An SN that is able to handle additional Children MUST
respond to a Query message by multicasting an Advertise
message.
If the SN is not yet a Parent, the TTL used in this response
is the same TTL used in the Query message. If the SN is a
Parent, the TTL used is the greater of the Query TTL and the
Parent's current Advertise TTL.
2. The Node listens for Advertise messages after sending the
Query message. If one or more Advertise messages are
received during a SolicitPeriod, the best SN among them is
selected as described in section 4.3.
3. If no Advertise messages are received, the Node sends
another multicast Query message with a TTL that is
incremented by TTLIncrement. The process of sending the
multicast Query message with an increasing TTL value
continues until a response is received.
The TTL value is limited by a value, TTLMax, also specified
in the session announcement. TTLMax defaults to the value
of TTLSession.
If the TTL value required to reach the soliciting Node is
greater than the TTL used to reach the SN, an Advertise
message may not reach the Node. However, if future Query
messages have increased TTL values, the TTL may eventually
be large enough for the Advertise message to reach the Node.
However, it is possible that the Node will not locate any
SNs using Expanding Ring Search. It is advisable that a
backup method, such as static, be available.
4. SNs MUST suppress sending Advertise messages in response to
Query messages if one was sent with at least the Query's TTL
within the last SolicitPeriod.
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 13
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
After multicasting a Query message, a node MUST wait for an
interval, BetweenQuery, before sending another Query message.
Nodes SHOULD suppress sending Query messages for
BetweenQuery seconds when they first start in order to
collect information from Advertise messages already
solicited by other nodes.
5. Getting a successful Advertise message via ERS, the Node
will try to connect to the parent by sending BindRequest
messages, which are described in Section 5.
The following variables are used in the Expanding Ring Search
algorithm.
- TTLNeighborInit: This is the initial TTL value to be used by the
ERS if no other TTL value is specified by the algorithm.
- TTLIncrement: This is the periodic increment for the TTL used in
ERS.
- TTLMax: This is a configured maximum TTL value to be used by
either Query or Advertise messages.
- TTLSession: This is the session TTL value for the multicast
session.
- SolicitPeriod: Each receiver MUST not send more than one QUERY
message per SolicitPeriod. When SN's responds to QUERY messages,
it also suppresses its ADVERTISE message if one has been sent
less than SolicitPeriod ago. This parameter is used to control
the amount of control traffic during tree construction if ERS is
used.
- BetweenQuery: This is the time interval a node must wait before
sending successive Query messages.
- MaxBindAttempts: This variable is an integer used to control how
many times a receiver tries to bind to a SN before giving up and
try another one.
- BindPeriod: In order to prevent loops, sometimes a SN must reject
a BindRequest (for example, when the SN is not on the tree yet
and has a BindRequest outstanding itself) from a receiver. In
this case, if the receiver needs to retry binding to the same SN
again (perhaps because the receiver does not discover any
alternative SN's), then it must wait for BindPeriod seconds.
4.1.3 Mesh
The mesh approach relies on a set of SN's already deployed as
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 14
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
infrastructure servers. These SN's are on-line, but are not
necessarily aware of any particular session unless informed by the
following mechanisms.
SN's in the mesh are configured to know who their neighbor SN's are,
and exchange reachability information with their neighbors in a way
analogous to routers in a network. The actually protocol used by
SN's to exchange such reachability information is outside the scope
of this BB. (In principle, a routing protocol such as shortest-
path-first, or a link-state-protocol can be adapted for this
purpose). Instead, this BB specifies the following properties that
the mesh of SN's MUST satisfy:
a) Each SN knows a subset of SN's in the mesh as its immediate
neighbors.
b) Each SN has a "forwarding table", such that given any other
(destination) SN in the mesh, the forwarding table gives a
"next-hop" SN that can be used to reach the destination SN,
plus the distance to the destination SN from the local SN.
c) A given SN in the mesh can "broadcast" information to all
other SN's in the mesh (in the sense of having a means of
sending the same information to all other SN's, but not
necessarily simultaneously).
d) All potential sender and receivers of a multicast session
can discover a "neighboring" SN in the mesh, using the
neighbor discovery mechanisms described in Section 4.1.
The reason for running a routing-like algorithm to maintain the
forwarding tables in each SN is to provide fault tolerance when
some SN's in the mesh fail. When that happens, the remaining SN's
exchange information with each other to update the forwarding
tables. In steady state, the mesh of SN's must still satisfy the
above 4 properties.
In the simplest form, each SN in the mesh has a forwarding table
that contains all other SN's in the mesh. This is called a fully-
connected mesh.
As mentioned earlier, the Mesh scheme assumes that there is a pre-
deployed infrastructures of SN servers. That is, the SN-SN bindings
and Sender-SN (Sender's SNs) will be performed internally by the
provider's own policy. The only thing done by Sender is to inform
its SN's that a session starts. The relationships between Sender
and its SN's (i.e., Sender-SN bindings) MUST also be pre-
configured.
For example, the internal bindings between Sender and SN's MAY be
done as follows:
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 15
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
1. The sender locates a neighbor SN in the mesh by a pre-
configured mechanism. This SN is referred to as the sender's
SN.
2. The sender sends the multicast session id, address and port
(all these can be set as a abbreviated session announcement
message) to the sender's SN.
3. The sender's SN in turn "broadcasts" the session information
to all SN's in the mesh; since SN's can support multiple
sessions simultaneously, they keep the information about
each session in an entry in a session table.
4. After the Sender-SN bindings, Sender will multicasts its
session announcement to the multicast receivers. Then, the
sender's SN binds to the sender by sending a BindRequest
message.
During the internal processings of Sender-SN binding described
until now, any Query and Advertise messages are not used.
When a session starts, the bindings between SN and Receivers will
be done. The tree binding of SN-receiver is done with the Tree
Configurator (TC), which was used in the Static algorithm. The
main difference between Static algorithm with TC and Mesh
algorithm with TC is that the active SNs are considered as
candidate SNs in the Static scheme, while the pre-deployed
SNs are considered as candidates in the Mesh scheme. That is, in
the Mesh scheme, the TC is required to already get the information
on the locations of the pre-deployed SNs.
Then the tree buildings between receivers and SNs are done as
follows:
1. When a session starts, a receiver sends a Query message to
the TC. The receiver may optionally discovers its distance
from the Sender. Any metric described in Section 4.2 may be
used.
2. The TC chooses one or more candidate parents SNs for the
receiver from the pre-deployed SNs by its own tree
configuration rule, as described in Section 4.1.1.
3. TC MUST respond to the Query message with an Advertise
message, which include the candidate parent list. In the list,
each entry contains the corresponding IP address and port of
the parent.
4. Receiving a successful Advertise message from the TC, the
Node will try to connect to its parent by sending BindRequest
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 16
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
messages based on the candidate parent list, as described in
Section 5.
Once a receiver is connected to a parent SN, the SN-SN bindings
will also be done internally by the pre-configured provider's
policy. For example, each SN in the mesh tries to bind to its
"next-hop" SN. If the "next-hop" SN is not reachable for some
reason, an SN may also try to bind to any neighbor SN as a back-up
alternative. These procedures will be devised to ensure that the
loop freedom is guaranteed in section 5.
4.2. Distance Measurement
Different techniques can be used to determine distances between
nodes in a session. The distances of interest are:
Sender distance - the distance from a SN to sender
Neighbor distance - the distance from a receiver to a
neighboring SN
These distances can be used in selecting a Service Node ifseveral
are discovered.
These techniques quantify distance differently. Each specific way
of quantifying distance is called a metric. Different Metrics are
not necessarily comparable. For example, if distance between A
and B is X using metric m1, and distance between A and C is Y
using metric m2, then X > Y does not necessarily imply B is
farther from A than C.
Only distances of the same metric should be compared and ranked.
Therefore, a receiver SHOULD only rank two SN's based on their
respective sender distance if those distances are based on the same
metric.
On the other hand, it is not necessary for all receivers to use the
same metric to select theirneighboring SN to connect to. Suppose
receivers use neighbor distance as a selection criterion. One
receiver may determine neighbor distances to SN's based on hop
count, whereas another receiver may determine neighbor distances
to its neighboring SN's based on delay.
4.2.1 TTL Hop-Count
If this metric is used, a node periodically sends a Beacon message
on the Session's multicast address. The Beacon message includes
the original time-to-live value set by the node. The distance to
the node is then calculated as
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 17
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
Beacon's original TTL - Beacon's current TTL
One node is closer than another if its distance is a lower number.
Note that the TTL value may not be available in some implementation
environments.
4.2.2 Number of Levels
One metric a receiver can use for SN selection is the number of
levels the SN is from the sender. For example, given two SN's in
close proximity to a receiver, if one SN is n levels from the
sender and the other is m levels, where m<n, the receiver SHOULD
select the SN with m levels. This is because a shallower tree
allows faster propagation of feedback information to the sender.
(Note, we assumed the choice is between two SN's equally close to
the receiver. The receiver to SN distance is another
consideration).
The number of levels metric is not generally available, as the tree
may be constructed bottom up. If the mesh approach is used,
however, the distance in the SN's forwarding tables can be
implemented as an estimate of the number of levels from the sender.
4.2.3 Delay
Another metric is the delay from an SN to the sender. If the SN is
directly connected to the sender, then the delay would simply be
the time to send feedback from the SN to the sender. If the SN is
several levels down from the sender, then the delay would be the
sum of the delays for each level (with some jitter time added in
each level). For example, given two SN's in close proximity to a
receiver, the receiver SHOULD select the SN with a smaller delay to
the sender. This is again for the purpose of minimizing the
feedback time.
If the tree is built bottom up, this metric cannot be used. If the
mesh approach is used, this metric can be implemented, although it
requires the SN's in the mesh to exchange distance information
based on the delay metric.
4.2.4 Address
For IPv6 addresses[HOD98], distance can be approximately determined
by the number of aggregation levels one address has in common with
another. For this metric, one node is closer than another if its
address has more aggregation levels in common with the querying
node's address.
4.2.5 Static
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 18
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
The node's distance to other nodes may be made available in some
well-known location. One node's is closer than another if its
distance is a lower number.
4.2.6 GRA
The node's distance to the sender may be determined with help of a
GRA message that lists the set of GRA routers on the path from the
source.
4.3 Service Node Selection
Once Neighbors have been discovered, a node selects the best one
using whatever distance information is available.
If there is no sender distance information to compare, the best SN
is simply the one that is closest to the node, without a loop
being formed by binding the node to the SN.
If sender distances are available, there are two cases:
Leaf nodes: For leaf nodes, the goal is to use the closest SN
possible.
SNs: For SNs joining the tree, it is important to pick an SN
that is closer to the sender; neighbor distance is a
secondary factor.
Once an SN has been selected, the node tries to bind to it as
described in Section 5. Loop prevention is done during the bind
process using only Tree Level information.
This algorithm is recommended because it assigns each node the
closest SN, and does not require all nodes to measure their sender
distance at the start of the session. Depending on the selected
metric, multiple nodes measuring sender distance could cause
message implosion, and delay tree construction. On the other hand,
the SNs selected may actually be further from the Sender than
their children are. However, it may be necessary to assign nodes
to non-optimal SNs in order to get them on the tree, since it is
possible that no SN closer to the Sender can accept any more
children.
Alternatively, nodes may be required to measure sender distance
before selecting an SN in order to ensure that each parent is
closer to the Sender than its children. Presumably, this results
in a tree in which parents detect message loss before their
children, minimizing repair requests.
5. Tree Formation
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 19
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
The following is a detailed description of the tree formation
process. All tree construction follows this pattern. The ONLY
differences between instantiations of this building block lie in
how nodes discover and select neighbors.
Once an SN has been selected, the Node sends a BindRequest message
to the SN. If the SN has an outstanding request to bind to another
SN, it must refuse the incoming bind request in case it would form
a loop.
Otherwise, it MAY accept the Node as a child as long as selecting
it would not cause a loop in the tree. Loop freedom is guaranteed
by these two rules:
1. If the requesting node does not have children, the SN can
accept it as a child as long as the SN has no outstanding
bind requests. If it does have an outstanding bind request,
the SN can accept the node as a child if its address is less
than the child's address.
2. If the requesting node has children, the SN can accept it as
a child if
a. the SN's level is 128, i.e., it is the top of a sub-tree not
yet connected to the Sender, or
b. the SN's level is less than 128, i.e., it is connected to the
Sender.
The second rule prevents a node from selecting one of its own
children as its parent. Two nodes at level 128 are prevented from
selecting each other using tie-breaking criteria described step 1
above.
If the SN accepts the Node as a Child, it returns a BindConfirm
message. If it does not accept the Node, it sends a BindReject
message. If the Node does not receive a response after
MaxBindAttempts tries every BindPeriod seconds, it MAY select the
next best Neighbor from its cached list, or else run the Service
Node Discovery process again to determine an alternate SN to try.
The BindReject message contains a reason code. If the code
indicates that the node was rejected because the SN was not yet on
the tree, the node MAY choose to retry that SN after BindPeriod
seconds, or select a different available SN.
The BindReject message may also include a list of alternative SNs
for the node to try.
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 20
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
The BindConfirm message MUST include the Parent's current Tree
Level. The Node MUST set its Tree Level to one more than the
Parent's level.
The BindConfirm message also MUST also indicate the starting
sequence number of the message from which data reliability is
assured. This information is included in the BindConfirm message
to enable receivers to meet the PI's late join requirements. If
nodes join the tree after the Sender has started to send data, it
is possible that some of the data is no longer available within
the tree. Nodes may need to have specific information about repair
availability before selecting a Parent.
Service Nodes MAY limit the number of children they support
depending on their capacity. Once an SN has accepted its maximum
number of children, it stops accepting new children until a change
in membership causes its count of children to go below this limit.
If an SN limits the number of children it supports, it MUST reserve
at least one child slot for other SNs. This guarantees the growth
of the repair tree.
6. Tree Maintenance
Tree maintenance is an ongoing process active in every Node.
Because the tree is based on the operation of SNs, as well as the
various underlying metrics that may change over time, it is
important that these dependencies be monitored for changes. Nodes
MUST monitor Parents for liveness and changes in tree level, and
SHOULD continue to run the Neighbor Discovery and Selection
process in order to optimize their choice of SN. Parents must
also monitor Children for liveness.
6.1 Monitoring Parents and Children
The upper Building Block or Protocol Instantiation is responsible
for monitoring parents and children. Monitoring messages from
parents to children MUST contain the Parent's current Tree Level.
Children MUST set their Tree Level to one more than their Parent's
level.
If a Child loses contact with its Parent for a period of time, it
MUST report it using the lostSN interface, and attempt to bind with
an alternate SN.
A Child that is leaving a session MUST send a unicast UnbindRequest
message to its Parent. The Parent MUST respond with an
UnbindConfirm message.
A Parent that is leaving the session MUST send an EjectRequest
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 21
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
message to its Children indicating that they need to bind with an
alternate SN. If possible the EjectRequest message is multicasted,
but the EjectRequest message can also be sent via unicast to each
child individually. Upon receiving an EjectRequest message from
its parent, a receiver sets its Tree Level to 128 again. Using
the heartbeat mechanism, the Tree Level for all receivers in the
affected subtree will be updated (to a value higher than 128).
If a Parent does not hear from a Child for a period of time, or it
receives a UnbindRequest message from a Child, it removes that
Child from its list of Children, and reports the loss using the
removeChild interface.
6.2 Optimizing the Tree
Implementations of this building block SHOULD continue to run the
Neighbor Discovery and Selection process in order to optimize the
choice of SN. This continuous process also keeps the distance
information for the current Parent up-to-date. Whenever the
process returns a better SN than the current one, the Node MAY
bind to the new SN. Once the new SN is bound to, the Node MUST
send a UnbindRequest message to the original Parent. A Parent
with no Children MAY leave the session.
7. Messages
These messages are required for implementations of this building
block. The list below indicates which message contents are
required
by implementations. Implementations may also include other
protocol-specific information in these messages. Note that these
messages are parts of packets specified in PI's that use this BB.
+----------------+-----------------------------+---------------------+
| Message Name | Description | |
| m- or u-cast | | Contents |
+----------------+-----------------------------+---------------------+
| Query | A message used to discover | Sender distance |
| both | neighbors. The unicast | (optional), |
| | is sent to TC; the multicast| TTL (m-cast only) |
| | is used in ERS. | |
+----------------+-----------------------------+---------------------+
| Advertise | A message used to advertise | IP address, port, |
| both | an SN. The unicast is sent | Sender distance |
| | from the TC; the multicast | (optional) |
| | is sent by SNs themselves | |
+----------------+-----------------------------+---------------------+
| BindRequest | Request to SN to join tree | Current Tree Level |
| unicast | | |
+----------------+-----------------------------+---------------------+
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 22
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
| BindConfirm | SN accepts BindRequest | Current Tree Level |
| unicast | | |
+----------------+-----------------------------+---------------------+
| BindReject | SN rejects BindRequest | Reject reason, |
| unicast | | alternate SN list |
+----------------+-----------------------------+---------------------+
| UnbindRequest | Child leaving Parent(u-cast)| Unbind reason |
| unicast | Parent leaving tree (m-cast)| |
+----------------+-----------------------------+---------------------+
| UnbindConfirm | Acknowledgement of | |
| unicast | UnbindRequest message | |
+----------------+-----------------------------+---------------------+
| EjectRequest | Parent refusing or leaving | Eject reason |
| both | service to Children | |
+----------------+-----------------------------+---------------------+
| EjectConfirm | Acknowledgement of | |
| unicast | EjectRequest message | |
+----------------+-----------------------------+---------------------+
8. External Interfaces
This section describes external interfaces for the building block.
8.1 Interfaces to this BB.
These may be used by a PI, or by a higher-level BB.
8.1.1 start(boolean SN, advertisement)
start notifies the BB to begin operation. If the SN parameter is
set to TRUE, the BB also starts SN operation.
8.1.2 end()
end notifies the BB to end operation.
8.1.3 incomingMessage(message)
This interface is used to pass an incoming message down from the PI.
8.1.4 getStatistics
getStatistics returns current BB statistics to the upper BB or PI.
8.1.5 getSNs
getSNs instructs the BB to start the process of finding SN
candidates for this node. getSNs may return immediately with a
list of
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 23
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
candidate SN, or may use the SNlist interface (see section 8.2.1)
to return the list at a later time.
8.1.6 setSN(SN)
setSN informs the BB that the PI or upper BB has successfully bound
to an SN.
8.1.7 acceptChild(node)
acceptChild asks the BB to accept or reject the node as a member.
The BB returns a boolean value in response.
8.1.8 removeChild(node)
removeChild is called to inform the BB that the child is no longer
a member.
8.1.9 treeLevelUpdate(newLevel)
This interface is used to pass down any update to the node's tree
level that the upper BB or the PI has learned. newLevel replaces
the BB's current tree level.
8.1.10 lostSN
lostSN notifies the BB that the connection to the SN was lost.
8.1.11 setOptimization(boolean)
setOptimization tells the BB to start or stop the tree optimization
process. The upper BB or PI may want to control when tree
optimization takes place.
8.1.12 recordSNs(destination)
recordSNs tells the BB to record the current SN, plus the list of
alternates, to the indicated destination.
8.2 Interfaces from this BB to the PI or a higher-level BB.
8.2.1 outgoingMessage(message)
outgoingMessage instructs the PI to send the message.
8.2.2 SNlist(list)
SNlist returns to the upper PI or BB a list of SN candidates. The
list may contain additional information for each candidate, such as
details about the data packets that it has available for repair.
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 24
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
9. References
[DEE89] Deering, S., "Host Extensions for IP Multicasting", RFC
1112.
[ECTP02] ITU-T Q.8/17 and ISO/IEC JTC1/SC6/WG7, "Enhanced
Communication Transport Protocol: Specification of Simplex
Multicast Transport," ITU-T Recommendation X.606 and ISO/IEC
14476-1, Feb., 2002.
[HC00] H. Holbrook, B. Cain, "Source-Specific Multicast for IP,"
Internet Draft, Internet Engineering Task Force, November 2000.
[HOD98] R. Hinden, M. O'Dell, S. Deering, "An IPv6 Aggregatable
Global Unicast Address Format," Request For Comments 2374, July
1998.
[HPW00] M. Handley, C. Perkins, E. Whelan, "Session Announcement
Protocol", Internet Draft, Internet Engineering Task Force, March
2000.
[HWKFV00] M. Handley, B. Whetten, R. Kermode, S. Floyd, L. Vicisano,
and M. Luby, "The Reliable Multicast Design Space for Bulk Data
Transfer," RFC 2887, August 2000.
[K01] S. Koh, et. al., "Configuration of ACK Tress for Multicast
Transport Protocols," ETRI Journal, Vol. 23, No. 3, September 2001.
[KV02] R. Kermode, L. Vicisano, "Author Guidelines for RMT Building
Blocks and Protocol Instantiation Documents," RFC 3269, April 2002.
[LPG98] B.N. Levine, S. Paul, and J.J. Garcia-Luna-Aceves,
"Organizing Multicast Receivers Deterministically According to
Packet-Loss Correlation," Proc. Sixth ACM International Multimedia
Conference (ACM Multimedia 98), Bristol, UK, September 1998.
[PTM93] C. Partridge, T. Mendez, and W. Milliken. "Host anycasting
service." Request For Comments 1546, November 1993.
[WCMKT02] B. Whetten, D. Chiu, M. Kadansky, S. Koh, G. Taskale,
"TRACK Protocol Instantiation over UDP: Tree Acknowledgement based
Reliable Multicast," Internet Draft, November 2002.
[WLKHFL01] B. Whetten, L. Vicisano, R. Kermode, M. Handley, S.
Floyd, and M. Luby, "Reliable Multicast Transport Building Blocks
for One-to-Many Bulk-Data Transfer," RFC 3048, January 2001.
[YGS95] R. Yavatkar, J. Griffioen and M. Sudan, "A Reliable
Dissemination Protocol for Interactive Collaborative Applications",
University of Kentucky, 1995.
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 25
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
[SV02] T. Speakman, and L. Vicisano, "Reliable Multicast Transport
Building Block: Generic Router Assist - Signalling Protocol
Specification," Internet Draft, Internet Engineering Task Force,
July 2002.
Authors' Addresses
Brian Whetten
brian@whetten.net
890 Sea Island Lane
Foster City, CA 94404
Dah Ming Chiu
dahming.chiu@sun.com
Miriam Kadansky
miriam.kadansky@sun.com
Sun Microsystems Laboratories
1 Network Drive
Burlington, MA 01803
Seok Joo Koh
sjkoh@etri.re.kr
ETRI/Korea
161 Kajong-dong
Yusong-Gu, TAEJON, 305-350, KOREA
Gursel Taskale
gursel@tibco.com
Brian Neil Levine
brian@cs.umass.edu
Full Copyright Statement
Copyright (C) The Internet Society (2001). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet languages other than English.
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 26
INTERNET DRAFT draft-ietf-rmt-bb-tree-config-03.txt November 2002
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."
INTERNET DRAFT draft-ietf-rmt-bb-track-01.txt 27
| PAFTECH AB 2003-2026 | 2026-04-22 23:21:45 |