<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC0792 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.0792.xml">
<!ENTITY RFC3688 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3688.xml">
<!ENTITY RFC5226 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5226.xml">
<!ENTITY RFC5905 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5905.xml">
<!ENTITY RFC6940 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6940.xml">
<!ENTITY RFC7263 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7263.xml">
<!ENTITY I-D.ietf-p2psip-concepts PUBLIC "" "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-p2psip-concepts.xml">
<!-- This isn't referenced so removed it -->
<!-- <!ENTITY I-D.ietf-p2psip-self-tuning PUBLIC "" -->
<!-- "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-p2psip-self-tuning.xml"> -->
]>
<rfc category="std" docName="draft-ietf-p2psip-diagnostics-22"
ipr="trust200902" submissionType="IETF" updates="" xml:lang="">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="no"?>
<?rfc iprnotified="yes" ?>
<?rfc strict="no" ?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<front>
<title abbrev="P2P Overlay Diagnostics">P2P Overlay Diagnostics</title>
<author fullname="Haibin Song" initials="H." surname="Song">
<organization>Huawei</organization>
<address>
<email>haibin.song@huawei.com</email>
</address>
</author>
<author fullname="Jiang Xingfeng" initials="X." surname="Jiang">
<organization>Huawei</organization>
<address>
<email>jiangxingfeng@huawei.com</email>
</address>
</author>
<author fullname="Roni Even" initials="R" surname="Even">
<organization>Huawei</organization>
<address>
<postal>
<street>14 David Hamelech</street>
<city>Tel Aviv 64953</city>
<country>Israel</country>
</postal>
<email>ron.even.tlv@gmail.com</email>
</address>
</author>
<author fullname="David A. Bryan" initials="D. A." surname="Bryan">
<organization>ethernot.org</organization>
<address>
<postal>
<street>Cedar Park, Texas</street>
<country>United States of America</country>
</postal>
<email>dbryan@ethernot.org</email>
</address>
</author>
<author fullname="Yi Sun" initials="Y" surname="Sun">
<organization>ICT</organization>
<address>
<email>sunyi@ict.ac.cn</email>
</address>
</author>
<date day="24" month="March" year="2016" />
<area>Real-time Applications and Infrastructure</area>
<workgroup>P2PSIP Working Group</workgroup>
<keyword>Diagnostics</keyword>
<keyword>P2P</keyword>
<keyword>P2PSIP</keyword>
<abstract>
<t>This document describes mechanisms for P2P overlay diagnostics. It
defines extensions to the RELOAD base protocol to collect diagnostic
information, and details the protocol specifications for these
extensions. Useful diagnostic information for connection and node status
monitoring is also defined. The document also describes the usage
scenarios and provides examples of how these methods are used to perform
diagnostics.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>In the last few years, overlay networks have rapidly evolved and
emerged as a promising platform for deployment of new applications and
services in the Internet. One of the reasons overlay networks are seen
as an excellent platform for large scale distributed systems is their
resilience in the presence of failures. This resilience has three
aspects: data replication, routing recovery, and static resilience.
Routing recovery algorithms are used to repopulate the routing table
with live nodes when failures are detected. Static resilience measures
the extent to which an overlay can route around failures even before the
recovery algorithm repairs the routing table. Both routing recovery and
static resilience rely on accurate and timely detection of failures.</t>
<t>There are a number of situations in which some nodes in a
Peer-to-Peer (P2P) overlay may malfunction or behave badly. For example, these
nodes may be disabled, congested, or may be misrouting messages. The
impact of these malfunctions on the overlay network may be a degradation
of quality of service provided collectively by the peers in the overlay
network or an interruption of the overlay services. It is desirable to
identify malfunctioning or badly behaving peers through diagnostic
tools, and exclude or reject them from the P2P system. Node failures may
also be caused by failures of underlying layers. For example, recovery
from an incorrect overlay topology may be slow when the speed at which
IP routing recovers after link failures is very slow. Moreover, if a
backbone link fails and the failover is slow, the network may be
partitioned, leading to partitions of overlay topologies and
inconsistent routing results between different partitioned
components.</t>
<t>Some keep-alive algorithms based on periodic probe and acknowledge
mechanisms enable accurate and timely detection of failures of one
node's neighbors <xref target="Overlay-Failure-Detection"></xref>, but
these algorithms by themselves can only detect the disabled neighbors
using the periodic method. This may not be sufficient for the service
provider operating the overlay network.</t>
<t>A P2P overlay diagnostic framework supporting periodic and on-demand
methods for detecting node failures and network failures is desirable.
This document describes a general P2P overlay diagnostic extension to
the base protocol RELOAD <xref target="RFC6940"> </xref> and is intended
as a complement to keep-alive algorithms in the P2P overlay itself.
Readers are advised to consult <xref
target="I-D.ietf-p2psip-concepts"></xref> for further background on the
problem domain.</t>
</section>
<section title="Terminology" toc="default">
<t>This document uses the concepts defined in <xref target="RFC6940">
RELOAD</xref>.</t>
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119"></xref>.</t>
</section>
<section title="Diagnostic Scenarios">
<t>P2P systems are self-organizing and ideally setup and configuration
of individual P2P nodes requires no network management in the
traditional sense. However, users of an overlay, as well as P2P service
providers may contemplate usage scenarios where some monitoring and
diagnostics are required. We present a simple connectivity test and some
useful diagnostic information that may be used in such diagnostics.</t>
<t>The common usage scenarios for P2P diagnostics can be broadly
categorized in three classes:<list style="letters">
<t>Automatic diagnostics built into the P2P overlay routing
protocol. Nodes perform periodic checks of known neighbors and
remove those nodes from the routing tables that fail to respond to
connectivity checks <xref target="Handling_Churn_in_a_DHT"></xref>.
Unresponsive nodes may only be temporarily disabled, for example due
to a local cryptographic processing overload, disk processing
overload or link overload. It is therefore useful to repeat the
connectivity checks to see whether nodes have recovered and can again
be placed in the routing tables. This process is known as 'failed node
recovery' and can be optimized as described in the paper <xref
target="Handling_Churn_in_a_DHT">"Handling Churn in a
DHT"</xref>.</t>
<t>Diagnostics used by a particular node to follow up on an
individual user complaint or failure. For example, a technical
support staff member may use a desktop sharing application (with the
permission of the user) to remotely determine the health of, and
possible problems with, the malfunctioning node. Part of the remote
diagnostics may consist of simple connectivity tests with other
nodes in the P2P overlay and retrieval of statistics from nodes in
the overlay. The simple connectivity tests are not dependent on the
type of P2P overlay. Note that other tests may be required as well,
including checking the health and performance of the user's computer
or mobile device and checking the bandwidth of the link connecting
the user to the Internet.</t>
<t>P2P system-wide diagnostics used to check the overall health of
the P2P overlay network. These include checking the consumption of
network bandwidth, checking for the presence of problem links and
checking for abusive or malicious nodes. This is not a trivial
problem and has been studied in detail for content and streaming P2P
overlays <xref target="Diagnostic_Framework"></xref>, and has not
been addressed in earlier documents <xref
target="Diagnostics_and_NAT_traversal_in_P2PP"></xref>. While this
is a difficult problem, a great deal of information that can help in
diagnosing these problems can be obtained by obtaining basic
diagnostic information for peers and the network. This document
provides a framework for obtaining this information.</t>
</list></t>
</section>
<section title="Data Collection Mechanisms">
<section title="Overview of Operations" toc="default">
<t>The diagnostic mechanisms described in this document are primarily
intended to detect and locate failures or monitor performance in P2P
overlay networks. They provide mechanisms to detect and locate
malfunctioning or badly behaving nodes, including disabled nodes,
congested nodes, and misrouting peers. They also provide a mechanism to
detect direct connectivity or connectivity to a specified node, a
mechanism to detect the availability of specified resource records, and
a mechanism to discover P2P overlay topology and the underlay topology
failures.</t>
<t>The RELOAD diagnostics extensions define two mechanisms to collect
data. The first is an extension to the RELOAD Ping mechanism, allowing
diagnostic data to be queried from a node, as well as to diagnose the
path to that node. The second is a new method, PathTrack, for
collecting diagnostic information iteratively. Payloads that allow
these mechanisms to collect and represent diagnostic data are
presented, and additional error codes are introduced. Essentially,
this document reuses the RELOAD <xref target="RFC6940"></xref>
specification and extends it to introduce the new diagnostics
methods. The extensions strictly follow how RELOAD specifies message
routing, transport, NAT traversal, and other RELOAD protocol
features.</t>
<t>This document primarily describes how to detect and locate failures
including disabled nodes, congested nodes, misrouting behaviors and
underlying network faults in P2P overlay networks through a simple and
efficient mechanism. This mechanism is modeled after the
ping/traceroute paradigm: ping <xref target="RFC0792"> </xref> is used
for connectivity checks, and traceroute is used for hop-by-hop fault
localization as well as path tracing. This document specifies a
"ping-like" mode (by extending the RELOAD Ping method to gather
diagnostics) and a "traceroute-like" mode (by defining the new
PathTrack method) for diagnosing P2P overlay networks.</t>
<t>One way these tools can be used is to detect the connectivity to
the specified node or the availability of the specified
resource-record through the extended Ping operation. Once the overlay
network receives some alarms about overlay service degradation or
interruption, a Ping is sent. If the Ping fails, one can then send a
PathTrack to determine where the fault lies.</t>
<t>The diagnostic information can only be provided to authorized
nodes. Some diagnostic information can be provided to all the
participants in the P2P overlay, and some other diagnostic information
can only be provided to the nodes authorized by the local or overlay
policy. The authorization depends on the type of the diagnostic
information and the administrative considerations, and is application
specific.</t>
<!--Please review this paragraph which was added by me.-->
<t>This document considers the general administrative scenario based
on diagnostic Kind, where a whole overlay can authorize a certain kind
of diagnostic information to a small list of particular nodes (e.g.
administrative nodes). That means, if a node gets the authorization to
access a diagnostic Kind, it can access that information from all
nodes in the overlay network. It leaves the scenario where a
particular node authorizes its diagnostic information to a particular
list of nodes out of scope. This could be achieved by extension of
this document if there is a requirement in the near future. The default
policy or access rule for a type of diagnostic information is "deny"
unless specified in the diagnostics extension document. As the RELOAD
protocol already requires that each message carries the message
signature of the sender, the receiver of the diagnostics requests can
use the signature to identify the sender. It can then use the overlay
configuration file with this signature to determine which types of
diagnostic information that node is authorized for.</t>
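<t>The default-deny rule described above can be illustrated with a
short, non-normative sketch. It assumes a hypothetical in-memory policy
table that maps a diagnostic Kind ID to the set of Node-IDs authorized
for it, a hypothetical "*" marker for Kinds that are open to all
participants, and a sender Node-ID that has already been authenticated
from the message signature. None of these names are defined by this
document or by RELOAD.</t>
<figure align="left">
<artwork><![CDATA[
```python
from typing import Dict, Iterable, List, Set

# Hypothetical policy table: diagnostic Kind ID -> authorized Node-IDs.
# A real overlay would derive this from its configuration file.
Policy = Dict[int, Set[str]]

PUBLIC = "*"  # hypothetical marker: any overlay participant may read


def is_authorized(policy: Policy, kind: int, sender_node_id: str) -> bool:
    """Return True if the (already authenticated) sender may read `kind`.

    A Kind that is absent from the policy falls back to "deny".
    """
    allowed = policy.get(kind)
    if allowed is None:
        return False  # default policy is "deny"
    return PUBLIC in allowed or sender_node_id in allowed


def filter_requested_kinds(policy: Policy,
                           requested: Iterable[int],
                           sender_node_id: str) -> List[int]:
    """Keep only the requested Kinds the sender is authorized for."""
    return [k for k in requested
            if is_authorized(policy, k, sender_node_id)]
```
]]></artwork>
</figure>
<t>In this sketch, a receiver would apply filter_requested_kinds() to
the Kinds named in a request before assembling its response, so that
unlisted Kinds are never disclosed.</t>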
<t>In the remainder of this section we define mechanisms for
collecting data, as well as the specific protocol extensions (message
extensions, new methods, and error codes) required to collect this
information. In Section 5 we discuss the format of the data collected,
and in Section 6 we discuss detailed message processing.</t>
<t>It is important to note that the mechanisms described in this
document do not guarantee that the information collected is in fact
related to the previous failures. However, using the information from
previous traversed nodes, the user (or management system) may be able
to infer the problem. Symmetric routing can be achieved by using the
Via List <xref target="RFC6940"></xref> (or an alternate DHT routing
algorithm), but the response path is not guaranteed to be the
same.</t>
</section>
<section title="&quot;Ping-like&quot; Behavior: Extending Ping">
<t>To provide "ping-like" behavior, the RELOAD Ping method is extended
to collect diagnostic data along the path. The request message is
forwarded by the intermediate peers along the path and then terminated
by the responsible peer. After optional local diagnostics, the
responsible peer returns a response message. If an error is found when
routing, an Error response is sent to the initiator node by the
intermediate peer.</t>
<t>The message flow of a Ping message (with diagnostic extensions) is
as follows:</t>
<figure align="center" title="Figure 1: Ping Diagnostic Message Flow">
<artwork align="center" name="Figure 1">
 Peer A               Peer B               Peer C               Peer D
   |                    |                    |                    |
   |(1). PingReq        |                    |                    |
   |------------------->|(2). PingReq        |                    |
   |                    |------------------->|(3). PingReq        |
   |                    |                    |------------------->|
   |                    |                    |                    |
   |                    |                    |&lt;-------------------|
   |                    |&lt;-------------------|(4). PingAns        |
   |&lt;-------------------|(5). PingAns        |                    |
   |(6). PingAns        |                    |                    |
   |                    |                    |                    |
</artwork>
</figure>
<section anchor="ping_ext" title="RELOAD Request Extension: Ping">
<t>To extend the ping request for use in diagnostics, a new
extension of RELOAD is defined. The structure for a MessageExtension
in RELOAD is defined as:</t>
<figure align="left">
<artwork>
struct {
    MessageExtensionType  type;
    Boolean               critical;
    opaque                extension_contents&lt;0..2^32-1&gt;;
} MessageExtension;
</artwork>
</figure>
<t>For the Ping request extension, we define a new
MessageExtensionType, extension 0x0002 named Diagnostic_Ping, as
specified in <xref target="table_extcodes"></xref>. The extension
contents consists of a DiagnosticsRequest structure, defined later
in this document in <xref target="DiagReqDataStruc"></xref>. This
extension MAY be used for new requests of the Ping method and MUST
NOT be included in requests using any other method.</t>
<t>This extension is not critical. If a peer does not support the
extension, it will simply ignore the diagnostic portion of the
message and will treat the message as if it were a normal ping.
Senders MUST accept a response that lacks diagnostic information and
SHOULD NOT resend the message expecting a reply. A receiver of a
request for any method other than Ping that includes this extension
MUST ignore the extension.</t>
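<t>As a non-normative illustration, the MessageExtension wrapper
carrying a Diagnostic_Ping extension could be serialized as sketched
below. The sketch assumes the usual RELOAD wire conventions (network
byte order, a 16-bit extension type, a one-byte Boolean, and a 32-bit
length prefix for the opaque contents). The function names are
hypothetical, and the contents are treated as an already encoded
DiagnosticsRequest.</t>
<figure align="left">
<artwork><![CDATA[
```python
import struct
from typing import Tuple

DIAGNOSTIC_PING = 0x0002  # extension type proposed in this document

_HEADER = "!HBI"  # uint16 type, uint8 critical, uint32 contents length
_HEADER_LEN = struct.calcsize(_HEADER)  # 7 bytes, no padding with "!"


def encode_message_extension(ext_type: int, critical: bool,
                             contents: bytes) -> bytes:
    """Serialize one MessageExtension (assumed wire layout)."""
    flag = 1 if critical else 0
    return struct.pack(_HEADER, ext_type, flag, len(contents)) + contents


def decode_message_extension(data: bytes) -> Tuple[int, bool, bytes]:
    """Parse one MessageExtension from the start of `data`."""
    ext_type, flag, length = struct.unpack_from(_HEADER, data, 0)
    contents = data[_HEADER_LEN:_HEADER_LEN + length]
    if len(contents) != length:
        raise ValueError("truncated extension_contents")
    return ext_type, bool(flag), contents
```
]]></artwork>
</figure>
<t>Because the extension is not critical, a sender following this
sketch would always pass critical=False for Diagnostic_Ping.</t>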
</section>
</section>
<section title="&quot;Traceroute-like&quot; Behavior: The PathTrack Method">
<t>We define a simple PathTrack method for retrieving diagnostic
information iteratively.</t>
<t>The operation of this request is shown below in Figure 2. The
initiator node A asks its neighbor B which is the next hop peer to the
destination ID, and B returns a message with the next hop peer C
information, along with optional diagnostic information for B to the
initiator node. Then the initiator node A asks the next hop peer C
(using direct response routing <xref target="RFC7263"></xref> or
symmetric routing) to return next hop peer D information and
diagnostic information of C. Unless a failure prevents the message
from being forwarded, this step can be iteratively repeated until the
request reaches responsible peer D for the destination ID, and
retrieves diagnostic information of peer D.</t>
<t>The message flow of a PathTrack message (with diagnostic
extensions) is as follows:</t>
<figure align="center"
title="Figure 2: PathTrack Diagnostic Message Flow">
<artwork>
 Peer-A               Peer-B               Peer-C               Peer-D
   |                    |                    |                    |
   |(1).PathTrackReq    |                    |                    |
   |------------------->|                    |                    |
   |(2).PathTrackAns    |                    |                    |
   |&lt;-------------------|                    |                    |
   |                    |(3).PathTrackReq    |                    |
   |--------------------|------------------->|                    |
   |                    |(4).PathTrackAns    |                    |
   |&lt;-------------------|--------------------|                    |
   |                    |                    |(5).PathTrackReq    |
   |--------------------|--------------------|------------------->|
   |                    |                    |(6).PathTrackAns    |
   |&lt;-------------------|--------------------|--------------------|
   |                    |                    |                    |
</artwork>
</figure>
<t>There have been proposals that RouteQuery and a series of Fetch
requests can be used to replace the PathTrack mechanism, but in the
presence of high rates of churn, such an operation would not, strictly
speaking, provide identical results, as the path may change between
RouteQuery and Fetch operations. While obviously the path could change
between steps of PathTrack as well, with a single message rather than
two messages for query and fetch, less inconsistency is likely, and
thus the use of a single message is preferred.</t>
<t>Given that in a typical diagnostic scenario the peer sending the
PathTrack request desires to obtain information about the current path
to the destination, in the event that successive calls to PathTrack
return different paths, the results should be discarded and the
request resent, ensuring that the second request traverses the
appropriate path.</t>
<!-- The implication of route change is, the
result of the diagnostics cannot be trusted with one hundered percent,
which is only used to infer the cause with high
probability.</t>
-->
<section anchor="Path_Track" title="New RELOAD Request: PathTrack">
<t>This document defines a new RELOAD method, PathTrack, to retrieve
the diagnostic information from the intermediate peers along the
routing path. At each step of the PathTrack request, the responsible
peer responds to the initiator node with requested status
information. Status information can include a peer's congestion
state, processing power, available bandwidth, the number of entries
in its neighbor table, uptime, identity, network address
information, and next hop peer information.</t>
<t>A PathTrack request specifies which diagnostic information is
requested using a DiagnosticsRequest data structure, defined and
discussed in detail later in this document in <xref
target="DiagReqDataStruc"></xref>. Base information is requested by
setting the appropriate flags in the data structure in the request.
If all flags are clear (no bits are set), then the PathTrack request
is only used for requesting the next hop information. In this case
the iterative mode of PathTrack is degraded to a RouteQuery method
which is only used for checking the liveness of the peers along the
routing path. The PathTrack request can be routed using direct
response routing or other routing methods chosen by the initiator
node.</t>
<t>A response to a successful PathTrackReq is a PathTrackAns
message. The PathTrackAns contains general diagnostic information in
the payload, returned using a DiagnosticsResponse data structure.
This data structure is defined and discussed in detail later in this
document in <xref target="DRDS"></xref>. The information returned is
determined based on the information requested in the flags in the
corresponding request.</t>
<section title="PathTrack Request">
<t>The structure of the PathTrack request is as follows:<figure>
<artwork align="center" xml:space="preserve">
struct {
    Destination         destination;
    DiagnosticsRequest  request;
} PathTrackReq;</artwork>
</figure></t>
<t>The fields of the PathTrackReq are as follows:<list
style="hanging">
<!--Haibin added one example here for interpretation purpose.-->
<t>destination : The destination which the initiator node is
interested in. This may be any valid destination object,
including a NodeID, opaque IDs, or ResourceID. For example, for
debugging purposes, the initiator will use the same destination
ID that was in use when the failure happened.</t>
<t>request : A DiagnosticsRequest, as discussed in <xref
target="DiagReqDataStruc"></xref>.</t>
</list></t>
</section>
<section anchor="Path_track_response" title="PathTrack Response">
<t>The structure of the PathTrack response is as follows:<figure>
<artwork align="center" xml:space="preserve">
struct {
    Destination          next_hop;
    DiagnosticsResponse  response;
} PathTrackAns;</artwork>
</figure></t>
<t>The fields of the PathTrackAns are as follows:<list
style="hanging">
<!--Haibin added some consideration here.-->
<t>next_hop : The information of the next hop node from the
responding intermediate peer to the destination. If the
responding peer is the responsible peer for the destination
ID, then the next_hop node ID equals the responding node ID,
and after receiving a PathTrackAns where the next_hop node ID
equals the responding node ID the initiator MUST stop the
iterative process.</t>
<t>response : A DiagnosticsResponse, as discussed in <xref
target="DRDS"></xref>.</t>
</list></t>
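<t>The iterative use of PathTrackReq and PathTrackAns described above
can be sketched, non-normatively, as the loop below. The send callback
is a hypothetical transport hook that delivers a PathTrackReq to one
peer and returns the responding Node-ID together with the next_hop
Node-ID from the PathTrackAns. The max_steps bound and the local loop
check are assumptions added as safety guards and are not required by
this document.</t>
<figure align="left">
<artwork><![CDATA[
```python
from typing import Callable, List, Tuple

# Hypothetical transport hook: send a PathTrackReq for `destination`
# to `peer`; return (responding_node_id, next_hop_node_id).
SendFn = Callable[[str, str], Tuple[str, str]]


def path_track(first_hop: str, destination: str, send: SendFn,
               max_steps: int = 32) -> List[str]:
    """Collect the Node-IDs on the path toward `destination`."""
    path: List[str] = []
    peer = first_hop
    for _ in range(max_steps):
        responder, next_hop = send(peer, destination)
        path.append(responder)
        if next_hop == responder:
            # The responder is the responsible peer: stop iterating.
            return path
        if next_hop in path:
            raise RuntimeError("loop detected at " + next_hop)
        peer = next_hop
    raise RuntimeError("no responsible peer within max_steps")
```
]]></artwork>
</figure>
<t>For instance, with a simulated overlay in which B forwards to C, C
forwards to D, and D is responsible for the destination, the loop
visits B, C, and D in turn and then stops.</t>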
</section>
</section>
</section>
<section anchor="sec_err_codes" title="Error Code Extensions">
<t>This document extends the Error response method defined in the
RELOAD specification to support error cases resulting from diagnostic
queries. When an error is encountered in RELOAD, the Message Code
0xFFFF is returned. The ErrorResponse structure includes an error
code. We define new error codes to report possible error conditions
detected while performing diagnostics:</t>
<figure align="left">
<artwork>
Code Value Error Code Name
TBD1 Underlay Destination Unreachable
TBD2 Underlay Time exceeded
TBD3 Message Expired
TBD4 Upstream Misrouting
TBD5 Loop detected
TBD6 TTL hops exceeded</artwork>
</figure>
<!--Haibin added some interpretation text into the paragraph below.-->
<t>The final error codes will be assigned by IANA as specified in
<xref target="RFC6940">RELOAD protocol </xref>. The error code is
returned by the upstream node immediately before the failed node. The
upstream node uses a normal ping to detect the failure type and
returns it to the initiator node. This helps the user (initiator
node) understand where the failure happened and what kind of error
occurred, because the failure may happen at the same location and for
the same reason when sending the normal message and the diagnostics
message.</t>
<!-- DB: I believe Allisa's comment here was that we can't specify the
codes, they are application dependednt as specified in
RELOAD. I changed text to reflect that and commented out the
other part. -->
<t>As defined in RELOAD, additional information may be stored (in an
implementation-specific way) in the optional error_info byte string.
While the specifics are left to the implementation, as an example, in
the case of TBD1, the error_info field could be used to provide
additional information as to why the underlay destination is
unreachable (net unreachable, host unreachable, fragmentation needed,
etc.).</t>
<!--
Here are some examples of errors that might be expressed using the
error_info field in the case of Code [TBD1]:</t>
<figure align="left">
<artwork>
error_info:
net unreachable
host unreachable
protocol unreachable
port unreachable
fragmentation needed
source route failed
</artwork>
</figure>
<t>The error_info field values of the Code [TBD2] to [TBD6] are to be
application specific and defined by the particular overlay.</t>
-->
</section>
</section>
<section anchor="DiagDataStruc" title="Diagnostic Data Structures">
<t>Both the extended Ping method and PathTrack method use the following
common diagnostics data structures to collect data. Two common
structures are defined: DiagnosticsRequest for requesting data, and
DiagnosticsResponse for returning the information.</t>
<section anchor="DiagReqDataStruc"
title="DiagnosticsRequest Data Structure">
<t>The DiagnosticsRequest data structure is used to request diagnostic
information and has the following form:</t>
<figure align="left">
<artwork xml:space="preserve">
enum { (2^16-1) } DiagnosticKindId;

struct {
    DiagnosticKindId  kind;
    opaque            diagnostic_extension_contents&lt;0..2^32-1&gt;;
} DiagnosticExtension;

struct {
    uint64               expiration;
    uint64               timestamp_initiated;
    uint64               dMFlags;
    uint32               ext_length;
    DiagnosticExtension  diagnostic_extensions_list&lt;0..2^32-1&gt;;
} DiagnosticsRequest;
</artwork>
</figure>
<t>The fields in the DiagnosticsRequest are as follows:<list
style="hanging">
<!--Here Haibin added a sentence to explain that expiration field is mainly used for preventing replay attacks.-->
<t>expiration : The time when the request will expire represented
as the number of milliseconds elapsed since midnight Jan 1, 1970
UTC not counting leap seconds. This will have the same values for
seconds as standard UNIX time or POSIX time. More information can
be found at <xref target="UnixTime">UnixTime</xref>. This value
MUST be between 1 and 600 seconds in the future. This
value is used to prevent replay attacks.</t>
<t>timestamp_initiated : The time when the diagnostics request was
initiated represented as the number of milliseconds elapsed since
midnight Jan 1, 1970 UTC not counting leap seconds. This will have
the same values for seconds as standard UNIX time or POSIX
time.</t>
<t>dMFlags : A mandatory field which is an unsigned 64-bit integer
indicating which base diagnostic information the request initiator
node is interested in. The initiator sets different bits to
retrieve different kinds of diagnostic information. If dMFlags is
set to zero, then no base diagnostic information is conveyed in
the PathTrack response. If dMFlags is set to all '1's, then all
base diagnostic information values are requested. A request may
set any number of the flags to request the corresponding
diagnostic information.</t>
<t anchor="FIX">Note that this memo specifies the initial set of
flags; the flags can be extended. The dMFlags indicate general
diagnostic information. The mapping between the bits in the dMFlags
and the diagnostic information kind presented is as described in <xref
target="IANADFLG"></xref>.</t>
<t>ext_length : the length of the extended diagnostic request
information in bytes. If the value is greater than or equal to 1,
then some extended diagnostic information is being requested, on
the assumption this information will be included in the response
if the recipient understands the extended request and is willing
to provide it. The specific diagnostic information requested is
defined in the diagnostic_extensions_list below. A value of zero
indicates no extended diagnostic information is being requested.
The value of ext_length MUST NOT be negative. Note that it is not
the length of the entire DiagnosticsRequest data structure, but of
the data making up the diagnostic_extensions_list.</t>
<t>diagnostic_extensions_list : consists of one or more
DiagnosticExtension structures (see below) documenting additional
diagnostic information being requested. Each DiagnosticExtension
consists of the following fields: <list>
<t>kind : a numerical code indicating the type of extension
diagnostic information (see <xref target="IANADKT"></xref>).
Note that kinds 0xF000 - 0xFFFE are reserved for overlay
specific diagnostics and may be used without IANA registration
for local diagnostic information. Kinds from 0x0000 to 0x003F
MUST NOT be indicated in the diagnostic_extensions_list in the
message request, as they may be represented using the dMFlags
in a much simpler (and more space efficient) way.</t>
<t>diagnostic_extension_contents : the opaque data containing
the request for this particular extension. This data is
extension dependent.</t>
</list></t>
</list></t>
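<t>The following non-normative sketch shows one way an initiator might
build a DiagnosticsRequest. It assumes network byte order and assumes
that the variable-length diagnostic_extensions_list carries its own
32-bit length prefix in addition to ext_length, as is usual for
variable-length vectors in the RELOAD syntax. The function names and
the lifetime_s and now_ms parameters are hypothetical.</t>
<figure align="left">
<artwork><![CDATA[
```python
import struct
import time
from typing import Iterable, Optional, Tuple

BASE_KIND_MAX = 0x003F  # Kinds 0x0000-0x003F are requested via dMFlags


def encode_extension(kind: int, contents: bytes) -> bytes:
    """Serialize one DiagnosticExtension (uint16 kind, opaque<2^32-1>)."""
    if kind <= BASE_KIND_MAX:
        raise ValueError("base Kinds must be requested using dMFlags")
    return struct.pack("!HI", kind, len(contents)) + contents


def encode_diagnostics_request(dm_flags: int,
                               extensions: Iterable[Tuple[int, bytes]],
                               lifetime_s: int = 60,
                               now_ms: Optional[int] = None) -> bytes:
    """Serialize a DiagnosticsRequest under the stated assumptions."""
    if not 1 <= lifetime_s <= 600:
        raise ValueError("expiration must be 1..600 seconds ahead")
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # milliseconds since the epoch
    ext_bytes = b"".join(encode_extension(k, c) for k, c in extensions)
    header = struct.pack("!QQQI",
                         now_ms + lifetime_s * 1000,  # expiration
                         now_ms,                      # timestamp_initiated
                         dm_flags,                    # dMFlags
                         len(ext_bytes))              # ext_length
    # Assumption: the vector is preceded by its own 32-bit length.
    return header + struct.pack("!I", len(ext_bytes)) + ext_bytes
```
]]></artwork>
</figure>
<t>A request with dm_flags set to zero and an empty extension list
asks for no diagnostic information at all; in the sketch, ext_length is
then zero.</t>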
</section>
<section anchor="DRDS" title="DiagnosticsResponse Data Structure">
<figure>
<artwork xml:space="preserve">
enum { (2^16-1) } DiagnosticKindId;

struct {
    DiagnosticKindId  kind;
    opaque            diagnostic_info_contents&lt;0..2^16-1&gt;;
} DiagnosticInfo;

struct {
    uint64          expiration;
    uint64          timestamp_initiated;
    uint64          timestamp_received;
    uint8           hop_counter;
    uint32          ext_length;
    DiagnosticInfo  diagnostic_info_list&lt;0..2^32-1&gt;;
} DiagnosticsResponse;
</artwork>
</figure>
<t>The fields in the DiagnosticsResponse are as follows:<list
style="hanging">
<t>expiration : The time when the response will expire represented
as the number of milliseconds elapsed since midnight Jan 1, 1970
UTC not counting leap seconds. This will have the same values for
seconds as standard UNIX time or POSIX time. This value MUST be
between 1 and 600 seconds in the future.</t>
<!--Haibin added this timestamp_initiated value to response so that the initiator does not have to keep the state.-->
<t>timestamp_initiated: This value is copied from the diagnostics
request message. The benefit of containing such a value in the
response message is that the initiator node does not have to
maintain the state.</t>
<t>timestamp_received : The time when the diagnostic request was
received represented as the number of milliseconds elapsed since
midnight Jan 1, 1970 UTC not counting leap seconds. This will have
the same values for seconds as standard UNIX time or POSIX
time.</t>
<t>hop_counter : This field only appears in diagnostic responses.
It MUST be exactly copied from the TTL field of the forwarding
header in the received request. This information is sent back to
the request initiator, allowing it to compute the number of hops
that the message traversed in the overlay.</t>
<!-- added a bit more to clarify in response to Alissa's
questions -->
<t>ext_length : the length of the returned DiagnosticInfo
information in bytes. If the value is greater than or equal to 1,
then some extended diagnostic information (as specified in the
DiagnosticsRequest) was available and is being returned. In that
case, this value indicates the length of the returned information.
A value of zero indicates no extended diagnostic information is
included, either because none was requested or the request could
not be accommodated. The value of ext_length MUST NOT be negative.
Note that it is not the length of the entire DiagnosticsResponse
data structure, but of the data making up the
diagnostic_info_list.</t>
<t>diagnostic_info_list : consists of one or more DiagnosticInfo
structures containing the requested diagnostic_info_contents. The
fields in the DiagnosticInfo structure are as follows:<list
style="hanging">
<t>kind : A numeric code indicating the type of information
being returned. For base data requested using the dMFlags,
this code corresponds to the dMFlag set, and is described in
<xref target="DiagReqDataStruc"></xref>. For diagnostic
extensions, this code will be identical to the value of the
DiagnosticKindId set in the "kind" field of the
DiagnosticExtension of the request. See <xref
target="IANADKT"></xref>.</t>
<t>diagnostic_info_contents : Data containing the value for
the diagnostic information being reported. Various kinds of
diagnostic information can be retrieved. Please refer to <xref
target="diag_information"></xref> for details of the
diagnostic Kind ID for the base diagnostic information that
may be reported.</t>
</list></t>
</list></t>
</section>
<section anchor="diag_information"
title="dMFlags and Diagnostic Kind ID Types">
<t>The dMFlags field described above is a 64-bit field that allows
initiator nodes to identify up to 62 items of base information to
request in a request message (the first and last flags being
reserved). Two values of dMFlags are also reserved: all "0"s means
that nothing is requested, and all "1"s means that everything is
requested. The first and last bits cannot be used for other purposes,
and they MUST be set to 0 when particular diagnostic information
kinds are requested. When the requested base information is returned
in the response, the value of the diagnostic Kind ID will correspond
to the numeric field marked in the dMFlags in the request. The values
for the dMFlags are defined in <xref target="IANADFLG"></xref> and the
diagnostic Kind IDs are defined in <xref target="IANADKT"></xref>. The
information contained for each value is described in this section.
Access to each kind of diagnostic information MUST NOT be allowed
unless compliant with the rules defined in <xref
target="authorization"></xref>.<list>
<t>STATUS_INFO (8 bits): A single value element containing an
unsigned byte representing whether or not the node is in
congestion status. An example usage of STATUS_INFO is for
congestion-aware routing. In this scenario, each peer has to
update its congestion status periodically. An intermediate peer in
the distributed hash table (DHT) network will choose its next hop
according to both the DHT routing algorithm and the status
information. This is done to avoid increasing load on congested
peers. The rightmost 4 bits are used, and the other bits MUST be
cleared to "0"s for future use. There are 16 levels of congestion
status, with "0x00" representing zero load and "0x0F" representing
congested. This document does not provide a specific method for
measuring congestion, leaving this decision to each overlay
implementation. One possible option for an overlay implementation
would be to take the node's CPU/memory/bandwidth usage percentage
over the past 600 seconds and normalize the highest value to the
range from 0x00 to 0x0F. An overlay implementation can also decide
not to use all 16 values from 0x00 to 0x0F. A future document may
define an objective measure or specific algorithm for this.</t>
<t>ROUTING_TABLE_SIZE (32 bits): A single value element containing
an unsigned 32-bit integer representing the number of peers in the
peer's routing table. The administrator of the overlay may be
interested in statistics of this value for reasons such as routing
efficiency.</t>
<t>PROCESS_POWER (64 bits): A single value element containing an
unsigned 64-bit integer specifying the processing power of the
node in units of MIPS. Fractional values are rounded up.</t>
<t>UPSTREAM_BANDWIDTH (64 bits): A single value element containing
an unsigned 64-bit integer specifying the upstream network
bandwidth of the node (the provisioned or maximum bandwidth, not
the currently available bandwidth) in units of Kbps. Fractional
values are rounded up. For multihomed hosts, this should be the
link used to send the response.</t>
<t>DOWNSTREAM_BANDWIDTH (64 bits): A single value element
containing an unsigned 64-bit integer specifying the downstream
network bandwidth of the node (the provisioned or maximum
bandwidth, not the currently available bandwidth) in units of
Kbps. Fractional values are rounded up. For multihomed hosts, this
should be the link the request was received from.</t>
<t>SOFTWARE_VERSION: A single value element containing a US-ASCII
string that identifies the manufacturer, model, operating system
information and the version of the software. Given that there may
be a very large number of peers in some networks, and no peer is
likely to know all other peers' software, this information may be
very useful to help determine if the cause of certain groups of
misbehaving peers is related to specific software versions. While
the format is peer-defined, a suggested format is as follows:
"ApplicationProductToken (Platform; OS-or-CPU) VendorProductToken
(VendorComment)". For example: "MyReloadApp/1.0 (Unix; Linux
x86_64) libreload-java/0.7.0 (Stonyfish Inc.)". The string is a
C-style string and MUST be terminated by "\0". "\0" MUST NOT be
included in the string itself to prevent confusion with the
delimiter.</t>
<t>MACHINE_UPTIME (64 bits): A single value element containing an
unsigned 64-bit integer specifying the time the node's underlying
system has been up in seconds.</t>
<t>APP_UPTIME (64 bits): A single value element containing an
unsigned 64-bit integer specifying the time the P2P application
has been up in seconds.</t>
<t>MEMORY_FOOTPRINT (64 bits): A single value element containing
an unsigned 64-bit integer representing the memory footprint of
the peer program in kilobytes (1024 bytes). Fractional values are
rounded up.</t>
<t>DATASIZE_STORED (64 bits): An unsigned 64-bit integer
representing the number of bytes of data being stored by this
node.</t>
<t>INSTANCES_STORED: An array element containing the number of
instances of each kind stored. The array is indexed by Kind-ID.
Each entry is an unsigned 64-bit integer.</t>
<t>MESSAGES_SENT_RCVD: An array element containing the number of
messages sent and received. The array is indexed by method code.
Each entry in the array is a pair of unsigned 64-bit integers
(packed end to end) representing sent and received.</t>
<t>EWMA_BYTES_SENT (32 bits): A single value element containing an
unsigned 32-bit integer representing an exponentially weighted
moving average (EWMA) of bytes sent per second by this peer: sent
= alpha x sent_present + (1 - alpha) x sent_last, where
sent_present represents the bytes sent per second since the last
calculation and sent_last represents the last calculation of bytes
sent per second. A suitable value for alpha is 0.8 (the
implementation can choose another suitable value). This value is
calculated every five seconds (the implementation can also choose
a different time period). The value for the very first time period
should simply be the average of bytes sent in that time
period.</t>
<t>EWMA_BYTES_RCVD (32 bits): A single value element containing an
unsigned 32-bit integer representing an exponentially weighted
moving average (EWMA) of bytes received per second by this peer:
rcvd = alpha x rcvd_present + (1 - alpha) x rcvd_last, where
rcvd_present represents the bytes received per second since the
last calculation and rcvd_last represents the last calculation of
bytes received per second. A suitable value for alpha is 0.8 (the
implementation can choose another suitable value). This value is
calculated every five seconds (the implementation can also choose
a different time period). The value for the very first time period
should simply be the average of bytes received in that time
period.</t>
<t>UNDERLAY_HOP (8 bits): Indicates the number of IP-layer hops
from the intermediate peer that receives the diagnostics message
to the next-hop peer for this message. (Note: RELOAD does not
require the intermediate peers to look into the message body, so
PathTrack is used here to gather underlay hops for diagnostic
purposes.)</t>
<t>BATTERY_STATUS (8 bits): The left-most bit is used to indicate
whether this peer is using a battery or not. If this bit is clear
(set to '0'), then the peer is using a battery for power. The
other 7 bits are to be determined by specific applications.</t>
</list></t>
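<t>As a non-normative illustration of the EWMA_BYTES_SENT and
EWMA_BYTES_RCVD calculation described above, the following Python
sketch applies the update rule with the suggested alpha of 0.8 and a
five-second period. The class name EwmaRate and its methods are
hypothetical and not defined by this document; rounding and clamping
the result to the unsigned 32-bit range are likewise assumptions about
how an implementation might fit the value into the field.<figure>
<artwork><![CDATA[
```python
class EwmaRate:
    """Exponentially weighted average of bytes per second.

    Hypothetical helper, not defined by this document.
    """

    def __init__(self, alpha=0.8, period_seconds=5.0):
        self.alpha = alpha            # suggested value from the text
        self.period = period_seconds  # suggested calculation period
        self.value = None             # no estimate before first period

    def update(self, bytes_in_period):
        """Fold the byte count of one period into the average."""
        present = bytes_in_period / self.period  # bytes per second
        if self.value is None:
            # Very first period: simply the average for that period.
            self.value = present
        else:
            self.value = (self.alpha * present
                          + (1 - self.alpha) * self.value)
        return self.value

    def as_uint32(self):
        """Value to report, rounded and clamped (an assumption)."""
        if self.value is None:
            return 0
        return min(int(round(self.value)), 0xFFFFFFFF)


sent = EwmaRate()
sent.update(5000)    # first period: 1000 bytes per second
sent.update(10000)   # 0.8 x 2000 + 0.2 x 1000 bytes per second
print(sent.as_uint32())
```
]]></artwork>
</figure></t>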
</section>
</section>
<section title="Message Processing">
<section title="Message Creation and Transmission">
<t>When constructing either a Ping message with diagnostic extensions
or a PathTrack message, the sender first creates and populates a
DiagnosticsRequest data structure. The timestamp_initiated field is
set to the current time, and the expiration field is constructed based
on this time. The sender includes the dMFlags field in the structure,
setting any number (including all) of the flags to request particular
diagnostic information. The sender MAY leave all the bits unset,
requesting no particular diagnostic information.</t>
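<t>The flag handling described above can be sketched as follows. This
is a non-normative Python illustration: the flag values are taken from
the "RELOAD Diagnostics Flag" registry requested in this document,
while the helper names build_dmflags and requested_kind_ids are
hypothetical. The sketch relies on the observation that, in the
initial registry contents, the bit position of each flag equals the
corresponding diagnostic Kind ID; treating the all-"1"s value as a
request for every currently defined kind is an assumption.<figure>
<artwork><![CDATA[
```python
# Flag values from the "RELOAD Diagnostics Flag" registry.
FLAGS = {
    "STATUS_INFO": 0x00002,
    "ROUTING_TABLE_SIZE": 0x00004,
    "PROCESS_POWER": 0x00008,
    "UPSTREAM_BANDWIDTH": 0x00010,
    "DOWNSTREAM_BANDWIDTH": 0x00020,
    "SOFTWARE_VERSION": 0x00040,
    "MACHINE_UPTIME": 0x00080,
    "APP_UPTIME": 0x00100,
    "MEMORY_FOOTPRINT": 0x00200,
    "DATASIZE_STORED": 0x00400,
    "INSTANCES_STORED": 0x00800,
    "MESSAGES_SENT_RCVD": 0x01000,
    "EWMA_BYTES_SENT": 0x02000,
    "EWMA_BYTES_RCVD": 0x04000,
    "UNDERLAY_HOP": 0x08000,
    "BATTERY_STATUS": 0x10000,
}
RESERVED_BITS = 0x8000000000000001       # first and last bit
REQUEST_EVERYTHING = 0xFFFFFFFFFFFFFFFF  # reserved all-1s value


def build_dmflags(names):
    """OR together the flags for the named items (hypothetical)."""
    flags = 0
    for name in names:
        flags |= FLAGS[name]
    return flags


def requested_kind_ids(dmflags):
    """Map a dMFlags value to the Kind IDs it requests."""
    if dmflags == REQUEST_EVERYTHING:
        # Assumption: "everything" means all kinds defined so far.
        return sorted(f.bit_length() - 1 for f in FLAGS.values())
    if dmflags & RESERVED_BITS:
        raise ValueError("first and last bits must be 0")
    # Bit position n corresponds to diagnostic Kind ID n.
    return [bit for bit in range(1, 63) if dmflags & (1 << bit)]


flags = build_dmflags(["STATUS_INFO", "EWMA_BYTES_SENT"])
print(hex(flags), requested_kind_ids(flags))
```
]]></artwork>
</figure></t>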
<t>The sender MAY also include diagnostic extensions in the
DiagnosticsRequest data structure to request additional information.
If the sender includes any extensions, it MUST calculate the length of
these extensions and set the ext_length field to this value. If no
extensions are included, the sender MUST set ext_length to zero.</t>
<t>The format of the DiagnosticsRequest data structure and its fields
MUST follow the restrictions defined in <xref
target="DiagReqDataStruc"></xref>.</t>
<t>When constructing a Ping message with diagnostic extensions, the
sender MUST create a MessageExtension structure as defined in RELOAD
<xref target="RFC6940"></xref>, setting the value of type to 0x0002
and the value of critical to FALSE. The value of extension_contents
MUST be a DiagnosticsRequest structure as defined above. The message
MAY be directed to a particular NodeID or ResourceID, but MUST NOT be
sent to the broadcast NodeID.</t>
<!-- <t>Editors note: RELOAD appears to be broken right now. To allow
for multiple extensions and allow peers that don't understand the
extension to process it properly, there needs to be a length in
the MessageContents structure. Right now, the message appears
like it couldn't be parsed without knowing the extension.</t> -->
<t>When constructing a PathTrack message, the sender MUST set the
message_code for the RELOAD MessageContents structure to
path_track_req TBD7. The request field of the PathTrackReq MUST be set
to the DiagnosticsRequest data structure defined above. The
destination field MUST be set to the desired destination, which MAY be
either a NodeID or ResourceID but SHOULD NOT be the broadcast
NodeID.</t>
</section>
<section anchor="Message_Processing_Intermediate_Peers"
title="Message Processing: Intermediate Peers">
<t>When a request arrives at a peer, if the peer's responsible ID
space does not cover the destination ID of the request, then the peer
MUST continue processing this request according to the
overlay-specified routing mode of the RELOAD protocol.</t>
<t>In a P2P overlay, error responses to a message can be generated by
either an intermediate peer or the responsible peer. When a request is
received at a peer, the peer may find connectivity failures or
malfunctioning peers through the pre-defined rules of the overlay
network, e.g., by analyzing the via list or underlay error messages.
In this case, the intermediate peer returns an error response to the
initiator node, reporting any available information about the
malfunctioning node in the error message payload. All error responses
generated MUST contain the appropriate error code.</t>
<t>Each intermediate peer receiving a Ping message with extensions
(and which understands the extension) or receiving a PathTrack
request/response MUST check the expiration value (Unix time format)
to determine if the message is expired. If the message has expired,
the intermediate peer MUST generate a response with Error Code TBD3
"Message Expired", return the response to the initiator node, and
discard the message.</t>
<t>The intermediate peer MUST return an error response with the Error
Code TBD1 "Underlay Destination Unreachable" when it receives an ICMP
message with "Destination Unreachable" information after forwarding
the received request to the destination peer.</t>
<t>The intermediate peer MUST return an error response with the Error
Code TBD2 "Underlay Time Exceeded" when it receives an ICMP message
with "Time Exceeded" information after forwarding the received
request.</t>
<t>The peer MUST return an Error response with Error Code TBD4
"Upstream Misrouting" when it finds its upstream peer disobeys the
routing rules defined in the overlay. The immediate upstream peer
information MUST also be conveyed to the initiator node.</t>
<t>The peer MUST return an Error response with Error Code TBD5 "Loop
detected" when it finds a loop through the analysis of the via
list.</t>
<t>The peer MUST return an Error response with Error Code TBD6 "TTL
hops exceeded" when it finds that the TTL field value is no more than
0 when forwarding.</t>
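<t>Some of the checks an intermediate peer applies before forwarding
can be summarized in the following non-normative Python sketch. The
error code names follow the registry requested in this document, whose
numeric values are still to be assigned. The function name, its
parameters, the order of the checks, the use of milliseconds for the
expiration value, and the simplified loop test (this peer already
appearing in the via list) are all assumptions made for
illustration.<figure>
<artwork><![CDATA[
```python
import time


def check_before_forwarding(expiration_ms, ttl, via_list,
                            own_node_id, now_ms=None):
    """Return an error code name, or None if forwarding may proceed.

    Hypothetical helper; all parameter names are assumptions.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Expiration is assumed to be milliseconds since the Unix epoch.
    if now_ms > expiration_ms:
        return "Error_Message_Expired"
    # Simplified loop test (assumption): this peer already appears
    # in the via list of the request.
    if own_node_id in via_list:
        return "Error_Loop_Detected"
    # The TTL field value is no more than 0 when forwarding.
    if ttl <= 0:
        return "Error_TTL_Hops_Exceeded"
    return None


print(check_before_forwarding(2000, 5, ["a", "b"], "c", now_ms=1000))
```
]]></artwork>
</figure></t>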
<!--
<t>With PathTrack, if a former PathTrack message does not arrive
at the destination, then the following PathTrack request must copy
the next_hop field in the former response into the forwarding header
and keep the destination_ID unchanged.</t>
<t>Ping is also used to detect possible failures in the specified
path of P2P overlay network. If disabled peers, misrouting
behavior and underlying network faults are detected during the
routing process, the Error responses with Error codes and
descriptions, must be sent to the initiator node immediately.</t>
-->
</section>
<section anchor="Message_response" title="Message Response Creation">
<t>When a diagnostic request message arrives at a peer that is
responsible for the destination ID specified in the forwarding header,
and assuming the peer understands the extension (in the case of Ping)
or the new request type PathTrack, it MUST follow the specifications
defined in RELOAD to form the response header and perform the
following operations:</t>
<t>When constructing a PathTrack response, the sender MUST set the
message_code for the RELOAD MessageContents structure to
path_track_ans TBD8.</t>
<t>The receiver MUST check the expiration value (Unix time format) in
the DiagnosticsRequest to determine if the message is expired. If the
message is expired, the peer MUST generate a response with the Error
Code TBD3 "Message Expired", return the response to the initiator
node, and discard the message.</t>
<t>If the message is not expired, the receiver MUST construct a
DiagnosticsResponse structure, as follows: The TTL value from the
forwarding header is copied to the hop_counter field of the
DiagnosticsResponse structure. Note that the default value for TTL at
the beginning represents 100 hops unless the overlay configuration has
overridden the value. The receiver generates a Unix time format
timestamp for the current time of day and places it in the
timestamp_received field, and constructs a new expiration time and
places it in the expiration field of the DiagnosticsResponse.</t>
<t>The destination peer MUST check if the initiator node has the
authority to request specific types of diagnostic information, and if
appropriate, append the diagnostic information requested in the
dMFlags and diagnostic_extensions (if any) using the
diagnostic_info_list field to the DiagnosticsResponse structure. If
any information is returned, the receiver MUST calculate the length of
the returned diagnostic information and set ext_length appropriately.
If no diagnostic information is returned, ext_length MUST be set to
zero.</t>
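<t>The response construction steps above can be sketched in Python as
follows. This is a non-normative illustration, not a wire encoding:
the function name, the dictionary representation, and the lifetime_ms
parameter used to derive the new expiration are hypothetical, and each
DiagnosticInfo entry is treated as an already-encoded opaque byte
string.<figure>
<artwork><![CDATA[
```python
import time


def build_diagnostics_response(ttl_from_header, info_items,
                               lifetime_ms=30000, now_ms=None):
    """Assemble the fields of a DiagnosticsResponse (sketch only).

    info_items holds already-encoded DiagnosticInfo entries as
    bytes; pass an empty list if nothing is returned.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # ms since the Unix epoch
    items = list(info_items)
    return {
        "expiration": now_ms + lifetime_ms,  # lifetime is assumed
        "timestamp_received": now_ms,
        "hop_counter": ttl_from_header,      # copied unchanged
        # Length of the returned diagnostic information only,
        # not of the whole structure; zero if nothing is returned.
        "ext_length": sum(len(item) for item in items),
        "diagnostic_info_list": items,
    }


resp = build_diagnostics_response(97, [b"\x00\x01\x00\x01\x03"],
                                  now_ms=1000)
print(resp["hop_counter"], resp["ext_length"])
```
]]></artwork>
</figure></t>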
<t>The format of the DiagnosticsResponse data structure and its fields
MUST follow the restrictions defined in <xref
target="DRDS"></xref>.</t>
<t>In the event of an error, an error response containing the error
code followed by the description (if one exists) MUST be created and
sent to the sender. If the initiator node asks for diagnostic
information that it is not authorized to query, the receiving peer
MUST return an Error response with the Error Code 2
"Error_Forbidden".</t>
</section>
<section title="Interpreting Results">
<t>The initiator node, as well as the responding peer, may compute the
overlay One-Way-Delay time through the value in timestamp_received and
the timestamp_initiated field. However, for a single hop measurement,
the traditional measurement methods (IP layer ping) MUST be used
instead of the overlay layer diagnostics methods.</t>
<t>The P2P overlay network using the diagnostics methods specified in
this document MUST enforce time synchronization with a central time
server. Network Time Protocol <xref target="RFC5905"></xref> can
usually maintain time to within tens of milliseconds over the public
Internet, and can achieve better than one millisecond accuracy in
local area networks under ideal conditions. However, this document
does not specify the choice for time resolution and synchronization,
leaving it to the implementation.</t>
<t>The initiator node receiving the Ping response may check the
hop_counter field and compute the number of overlay hops to the
destination peer, which can serve as a statistic for connectivity
quality in terms of overlay hops.</t>
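<t>A minimal, non-normative Python sketch of these two computations
follows. It assumes that the TTL is decremented once per overlay hop,
that the request started with the default TTL of 100 unless the
overlay configuration overrides it, and that the clocks of both peers
are synchronized closely enough for the timestamp difference to be
meaningful. The function names are hypothetical.<figure>
<artwork><![CDATA[
```python
DEFAULT_INITIAL_TTL = 100  # default unless the overlay overrides it


def overlay_hops(hop_counter, initial_ttl=DEFAULT_INITIAL_TTL):
    """Hops traversed, assuming one TTL decrement per hop."""
    return initial_ttl - hop_counter


def one_way_delay_ms(timestamp_initiated, timestamp_received):
    """Overlay one-way delay in milliseconds.

    Both timestamps are milliseconds since the Unix epoch and are
    only comparable if the two clocks are synchronized.
    """
    return timestamp_received - timestamp_initiated


print(overlay_hops(97))
print(one_way_delay_ms(1700000000000, 1700000000042))
```
]]></artwork>
</figure></t>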
</section>
</section>
<!--Please review the following two sections I added for namespace and authorization.-->
<section anchor="authorization"
title="Authorization through Overlay Configuration">
<t>Different levels of access control can be applied to different
users/nodes. For example, diagnostic information A can be accessed by
nodes 1 and 2, but diagnostic information B can only be accessed by
node 2.</t>
<t>The overlay configuration file MUST contain the following XML
elements for authorizing a node to access the relevant diagnostic
Kinds.</t>
<t>diagnostic-kind: This element has the attribute "kind", a
hexadecimal number indicating the diagnostic Kind ID, which takes the
values defined in <xref target="IANADKT"></xref>, and at least one
sub-element "access-node".</t>
<t>access-node: This element contains one hexadecimal number indicating
a NodeID, and the node with this NodeID is allowed to access the
diagnostic "kind" under the same diagnostic-kind element.</t>
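<t>To illustrate how a peer might evaluate these elements, the
following non-normative Python sketch parses a configuration fragment
and answers whether a given node may access a given diagnostic Kind.
The enclosing "configuration" element, the "0x" prefix on the kind
attribute, the NodeID values, and the function names load_acl and
is_authorized are all assumptions made for this example; only the
element names, the "kind" attribute, and the namespace URI come from
this document.<figure>
<artwork><![CDATA[
```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:p2p:config-diagnostics"
NODE_1 = "47112162e84c69ba47112162e84c69ba"  # made-up NodeID
NODE_2 = "6eba45d31a900c066eba45d31a900c06"  # made-up NodeID

# Hypothetical fragment; the enclosing element is an assumption.
SAMPLE_CONFIG = """
<configuration xmlns:diag="%s">
  <diag:diagnostic-kind kind="0x0001">
    <diag:access-node>%s</diag:access-node>
    <diag:access-node>%s</diag:access-node>
  </diag:diagnostic-kind>
  <diag:diagnostic-kind kind="0x0002">
    <diag:access-node>%s</diag:access-node>
  </diag:diagnostic-kind>
</configuration>
""" % (NS, NODE_1, NODE_2, NODE_2)


def load_acl(config_xml):
    """Map each diagnostic Kind ID to the set of allowed NodeIDs."""
    root = ET.fromstring(config_xml.strip())
    acl = {}
    for kind_el in root.iter("{%s}diagnostic-kind" % NS):
        kind_id = int(kind_el.get("kind"), 16)
        nodes = kind_el.findall("{%s}access-node" % NS)
        allowed = acl.setdefault(kind_id, set())
        allowed.update(n.text.strip().lower() for n in nodes)
    return acl


def is_authorized(acl, node_id, kind_id):
    """True if node_id is listed under the given Kind ID."""
    return node_id.lower() in acl.get(kind_id, set())


acl = load_acl(SAMPLE_CONFIG)
print(is_authorized(acl, NODE_1, 0x0002))
```
]]></artwork>
</figure></t>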
</section>
<section title="Security Considerations">
<t>The authorization for diagnostic information must be designed with
care to prevent it becoming a method to retrieve information for bot
attacks. It should also be noted that attackers can use diagnostics to
analyze overlay information to attack certain key peers. For example,
diagnostic information might be used to fingerprint a peer, in which
case the peer will lose its anonymity characteristics. Anonymity might
be very important for some P2P overlay networks, and defenses against
such fingerprinting are probably very hard. As such, networks where
anonymity is of very high importance may find implementation of
diagnostics problematic or even undesirable, despite the many
advantages it offers. As this document is a RELOAD extension, it
follows the RELOAD message header and routing specifications, and the
common security considerations described in the base document <xref
target="RFC6940"></xref> are also applicable to this document.
Overlays may define their own requirements on who can collect/share
diagnostic information.</t>
</section>
<section anchor="IANA" title="IANA Considerations">
<section anchor="IANADFLG" title="Diagnostics Flag">
<t>IANA is asked to create a "RELOAD Diagnostics Flag" Registry under
protocol RELOAD. Entries in this registry are 1-bit flags contained in
a 64-bit integer dMFlags denoting diagnostic information to be
retrieved as described in <xref target="Path_Track"></xref>. New
entries SHALL be defined via <xref target="RFC5226"></xref> Standards
Action. The initial contents of this registry are:<figure>
<artwork>
+-------------------------+----------------------------+----------+
| diagnostic information  | diagnostic flag in dMFlags |   RFC    |
|-------------------------+----------------------------+----------|
|Reserved All 0s value    |     0x 0000 0000 0000 0000 |RFC-[TBDX]|
|Reserved First Bit       |     0x 0000 0000 0000 0001 |RFC-[TBDX]|
|STATUS_INFO              |     0x 0000 0000 0000 0002 |RFC-[TBDX]|
|ROUTING_TABLE_SIZE       |     0x 0000 0000 0000 0004 |RFC-[TBDX]|
|PROCESS_POWER            |     0x 0000 0000 0000 0008 |RFC-[TBDX]|
|UPSTREAM_BANDWIDTH       |     0x 0000 0000 0000 0010 |RFC-[TBDX]|
|DOWNSTREAM_BANDWIDTH     |     0x 0000 0000 0000 0020 |RFC-[TBDX]|
|SOFTWARE_VERSION         |     0x 0000 0000 0000 0040 |RFC-[TBDX]|
|MACHINE_UPTIME           |     0x 0000 0000 0000 0080 |RFC-[TBDX]|
|APP_UPTIME               |     0x 0000 0000 0000 0100 |RFC-[TBDX]|
|MEMORY_FOOTPRINT         |     0x 0000 0000 0000 0200 |RFC-[TBDX]|
|DATASIZE_STORED          |     0x 0000 0000 0000 0400 |RFC-[TBDX]|
|INSTANCES_STORED         |     0x 0000 0000 0000 0800 |RFC-[TBDX]|
|MESSAGES_SENT_RCVD       |     0x 0000 0000 0000 1000 |RFC-[TBDX]|
|EWMA_BYTES_SENT          |     0x 0000 0000 0000 2000 |RFC-[TBDX]|
|EWMA_BYTES_RCVD          |     0x 0000 0000 0000 4000 |RFC-[TBDX]|
|UNDERLAY_HOP             |     0x 0000 0000 0000 8000 |RFC-[TBDX]|
|BATTERY_STATUS           |     0x 0000 0000 0001 0000 |RFC-[TBDX]|
|Reserved Last Bit        |     0x 8000 0000 0000 0000 |RFC-[TBDX]|
|Reserved All 1s value    |     0x FFFF FFFF FFFF FFFF |RFC-[TBDX]|
+-------------------------+----------------------------+----------+
</artwork>
</figure></t>
<t>[To RFC editor: Please replace all RFC-[TBDX] in this document with
the RFC number of this document.]</t>
</section>
<section anchor="IANADKT" title="Diagnostic Kind ID">
<t>IANA is asked to create a "RELOAD Diagnostic Kind ID" Registry
under protocol RELOAD. Entries in this registry are 16-bit integers
denoting diagnostics extension data kinds carried in the diagnostic
request and response messages, as described in <xref
target="DRDS"></xref>. IANA is asked to assign code points from 0x0001
to 0x003E together with flags within the "RELOAD Diagnostics Flag"
registry via RFC 5226 <xref target="RFC5226"></xref> Standards Action.
Code points in the range 0x003F to 0xEFFF SHALL be registered via RFC
5226 Standards Action.</t>
<texttable anchor="table_diagkindcodes" title="Diagnostic Kind">
<ttcol align="center">Diagnostic Kind</ttcol>
<ttcol align="center">Code</ttcol>
<ttcol align="center">Specification</ttcol>
<c>reserved</c>
<c>0x0000</c>
<c>RFC-[TBDX]</c>
<c>STATUS_INFO</c>
<c>0x0001</c>
<c>RFC-[TBDX]</c>
<c>ROUTING_TABLE_SIZE</c>
<c>0x0002</c>
<c>RFC-[TBDX]</c>
<c>PROCESS_POWER</c>
<c>0x0003</c>
<c>RFC-[TBDX]</c>
<c>UPSTREAM_BANDWIDTH</c>
<c>0x0004</c>
<c>RFC-[TBDX]</c>
<c>DOWNSTREAM_BANDWIDTH</c>
<c>0x0005</c>
<c>RFC-[TBDX]</c>
<c>SOFTWARE_VERSION</c>
<c>0x0006</c>
<c>RFC-[TBDX]</c>
<c>MACHINE_UPTIME</c>
<c>0x0007</c>
<c>RFC-[TBDX]</c>
<c>APP_UPTIME</c>
<c>0x0008</c>
<c>RFC-[TBDX]</c>
<c>MEMORY_FOOTPRINT</c>
<c>0x0009</c>
<c>RFC-[TBDX]</c>
<c>DATASIZE_STORED</c>
<c>0x000A</c>
<c>RFC-[TBDX]</c>
<c>INSTANCES_STORED</c>
<c>0x000B</c>
<c>RFC-[TBDX]</c>
<c>MESSAGES_SENT_RCVD</c>
<c>0x000C</c>
<c>RFC-[TBDX]</c>
<c>EWMA_BYTES_SENT</c>
<c>0x000D</c>
<c>RFC-[TBDX]</c>
<c>EWMA_BYTES_RCVD</c>
<c>0x000E</c>
<c>RFC-[TBDX]</c>
<c>UNDERLAY_HOP</c>
<c>0x000F</c>
<c>RFC-[TBDX]</c>
<c>BATTERY_STATUS</c>
<c>0x0010</c>
<c>RFC-[TBDX]</c>
<c>reserved for future flags</c>
<c>0x0011-0x003E</c>
<c>RFC-[TBDX]</c>
<c>local use (reserved)</c>
<c>0xF000-0xFFFE</c>
<c>RFC-[TBDX]</c>
<c>reserved</c>
<c>0xFFFF</c>
<c>RFC-[TBDX]</c>
</texttable>
</section>
<section anchor="IANARMC" title="Message Codes">
<t>This document introduces two new types of messages and their
responses, requiring the following additions to the "RELOAD Message
Code" Registry defined in <xref target="RFC6940">RELOAD</xref>. These
additions are:</t>
<texttable anchor="table_msgcodes"
title="Extensions to RELOAD Message Codes">
<ttcol align="center">Message Code Name</ttcol>
<ttcol align="center">Code Value</ttcol>
<ttcol align="center">RFC</ttcol>
<c>path_track_req</c>
<c>[TBD7]</c>
<c>RFC-AAAA</c>
<c>path_track_ans</c>
<c>[TBD8]</c>
<c>RFC-AAAA</c>
</texttable>
<t>[To RFC editor: Values starting at TBD1 were used to prevent
collisions with RELOAD base values and other extensions. Please
replace with the next highest available values. The final message
codes will be assigned by IANA. All RFC-AAAA should be replaced
with the RFC number of RELOAD upon publication.]</t>
</section>
<section title="Error Code">
<t>This document introduces the following new error codes, extending
the "RELOAD Error Code" registry as described below:</t>
<texttable anchor="table_errcodes"
title="Extensions to RELOAD Error Codes">
<ttcol align="center">Error Code Name</ttcol>
<ttcol align="center">Code Value</ttcol>
<ttcol align="center">RFC</ttcol>
<c>Error_Underlay_Destination_Unreachable</c>
<c>[TBD1]</c>
<c>RFC-AAAA</c>
<c>Error_Underlay_Time_Exceeded</c>
<c>[TBD2]</c>
<c>RFC-AAAA</c>
<c>Error_Message_Expired</c>
<c>[TBD3]</c>
<c>RFC-AAAA</c>
<c>Error_Upstream_Misrouting</c>
<c>[TBD4]</c>
<c>RFC-AAAA</c>
<c>Error_Loop_Detected</c>
<c>[TBD5]</c>
<c>RFC-AAAA</c>
<c>Error_TTL_Hops_Exceeded</c>
<c>[TBD6]</c>
<c>RFC-AAAA</c>
</texttable>
<t>[To RFC editor: Values starting at TBD1 were used to prevent
collisions with RELOAD base values and other extensions. Please
replace with the next highest available values. The final error
codes will be assigned by IANA. All RFC-AAAA should be replaced
with the RFC number of RELOAD upon publication.]</t>
</section>
<section title="Message Extension">
<t>This document introduces the following new RELOAD extension
code:</t>
<texttable anchor="table_extcodes" title="New RELOAD Extension Code">
<ttcol align="center">Extension Name</ttcol>
<ttcol align="center">Code Value</ttcol>
<ttcol align="center">RFC</ttcol>
<c>Diagnostic_Ping</c>
<c>0x0002</c>
<c>RFC-AAAA</c>
</texttable>
<t>[To RFC editor: The value 0x0002 was used to prevent collisions
with other extensions. Please replace with the next highest available
value. The final codes will be assigned by IANA. All RFC-AAAA
should be replaced with the RFC number of RELOAD upon
publication.]</t>
</section>
<section title="XML Name Space Registration">
<t>This document registers a URI for the config-diagnostics XML
namespace in the IETF XML registry defined in <xref
target="RFC3688"></xref>. All the elements defined in this document
belong to this namespace.<figure align="center">
<artwork>
URI: urn:ietf:params:xml:ns:p2p:config-diagnostics
Registrant Contact: The IESG.
XML: N/A, the requested URIs are XML namespaces</artwork>
</figure></t>
<t>The overlay configuration file MUST also contain the following XML
element, declaring P2P diagnostics as a mandatory extension to
RELOAD.<figure align="center">
<artwork>
&lt;mandatory-extension&gt;
urn:ietf:params:xml:ns:p2p:config-diagnostics
&lt;/mandatory-extension&gt;</artwork>
</figure></t>
</section>
</section>
<section title="Acknowledgments">
<t>We would like to thank Zheng Hewen for the contribution of the
initial version of this document. We would also like to thank Bruce
Lowekamp, Salman Baset, Henning Schulzrinne, Jiang Haifeng and Marc
Petit-Huguenin for the email discussion and their valued comments, and
special thanks to Henry Sinnreich for contributing to the usage
scenarios text. We would like to thank the authors of the RELOAD
protocol for transferring text about diagnostics to this document.</t>
</section>
<!--
<section title="Appendix: Changes Required to use Ping instead of Ping">
<t><list>
<t>1. Addition of a hop_counter mechanism to replicate the behavior
of the current Ping.</t>
</list></t>
</section> -->
</middle>
<back>
<references title="Normative References">
&RFC0792;
&RFC2119;
&RFC3688;
&RFC5226;
&RFC5905;
&RFC6940;
&RFC7263;
</references>
<references title="Informative References">
<reference anchor="UnixTime"
target="http://wikipedia.org/wiki/Unix_time">
<front>
<title>Unix Time</title>
<author>
<organization>Wikipedia</organization>
</author>
<date />
</front>
</reference>
<!-- &I-D.ietf-p2psip-self-tuning; -->
&I-D.ietf-p2psip-concepts;
<reference anchor="Overlay-Failure-Detection">
<front>
<title>On failure detection algorithms in overlay networks</title>
<author initials="S" surname="Zhuang">
<organization></organization>
</author>
<date day="13-17" month="Mar" year="2005" />
</front>
<seriesInfo name="" value="Proc. IEEE INFOCOM" />
</reference>
<reference anchor="Handling_Churn_in_a_DHT">
<front>
<title>Handling Churn in a DHT</title>
<author initials="S" surname="Rhea">
<organization></organization>
</author>
<date day="" month="June" year="2004" />
</front>
<seriesInfo name="USENIX" value="Annual Conference" />
</reference>
<reference anchor="Diagnostic_Framework">
<front>
<title>A Diagnostic Framework for Peer-to-Peer Streaming</title>
<author initials="X" surname="Jin">
<organization>Hong Kong University and Microsoft</organization>
</author>
<date year="2005" />
</front>
</reference>
<reference anchor="Diagnostics_and_NAT_traversal_in_P2PP" target="">
<front>
<title>Diagnostics and NAT Traversal in P2PP - Design and
Implementation</title>
<author initials="G" surname="Gupta">
<organization></organization>
</author>
<date month="June" year="2008" />
</front>
<seriesInfo name="Columbia University Report" value="" />
</reference>
</references>
<section title="Examples">
<t>Below, we sketch how these metrics can be used.</t>
<section title="Example 1">
<t>A peer may set the EWMA_BYTES_SENT and EWMA_BYTES_RCVD flags in a
PathTrackReq sent to its direct neighbors. A peer can use the
EWMA_BYTES_SENT and EWMA_BYTES_RCVD values of another peer to infer
whether it is acting as a media relay. It may then choose not to
forward any requests for media relay to this peer. Similarly, among
the various candidates for filling its routing table, a peer may
prefer a peer with a large UPTIME value, small RTT, and small
LAST_CONTACT value.</t>
</section>
<section title="Example 2">
<t>A peer may set the STATUS_INFO flag in a PathTrackReq sent to a
remote destination peer. The overlay has its own threshold definition
for congestion. The peer can obtain the status information of all the
intermediate peers along the path. It can then choose other paths to
that node for subsequent requests.</t>
</section>
<section title="Example 3">
<t>A peer may use Ping to evaluate the average overlay hops to other
peers by sending PingReq to a set of random resource or node IDs in
the overlay. A peer may adjust its timeout value according to the
change of average overlay hops.</t>
</section>
</section>
<section title="Problems with Generating Multiple Responses on Path">
<t>An earlier version of this document considered an approach where a
response was generated by each intermediate peer as the message
traversed the overlay. This approach was discarded. One reason was
that it could provide a DoS mechanism, whereby an attacker could send
an arbitrary message claiming to be from a spoofed "sender" that the
real sender wished to attack. As a result of sending this one message,
many messages would be generated and sent back to the spoofed
"sender", one from each intermediate peer on the message path. While
authentication mechanisms could reduce some risk of this attack, the
approach still resulted in a fundamental break from the
request-response nature of the RELOAD protocol, as multiple responses
are generated to a single request. Although one request with responses
from all the peers in the route would be more efficient, it was
determined to be too great a security risk and deviation from the
RELOAD architecture.</t>
</section>
<section title="Changes to the Draft">
<t>To RFC editor: This section is to track the changes. Please remove
this section before publication.</t>
<section title="Changes since -00 version">
<t><list style="numbers">
<t>Changed title from "Diagnose P2PSIP Overlay Network" to "P2PSIP
Overlay Diagnostics".</t>
<t>Changed the table of contents. Add a section about message
processing and a section of examples.</t>
<t>Merge diagnostics text from the p2psip base draft -01.</t>
<t>Removed ECHO method for security reasons.</t>
</list></t>
</section>
<section title="Changes since -01 version">
<t><list>
<t>Added BATTERY_STATUS as diagnostic information.</t>
<t>Removed UnderlayTTL test from the Ping method, instead adding
an UNDERLAY_HOP diagnostic information for PathTrack method.</t>
<t>Give some examples for diagnostic information, and give some
editor's notes for further work.</t>
</list></t>
</section>
<section title="Changes since -02 version">
<t>Provided further explanation as to why the base draft Ping in the
current form cannot be used to replace Ping, and why some combination
of methods cannot replace PathTrack.</t>
</section>
<section title="Changes since -03 version">
<t>Modified structure used to share information collected. Both
mechanisms now use a common data structure to convey information.</t>
</section>
<section title="Changes since -04 version">
<t>Updated the authors' addresses and modified the last sentence in
<xref target="Path_track_response"/>.</t>
</section>
<section title="Changes since -05 version">
<t>Resolved Marc's comments from the mailing list and defined the
details of STATUS_INFO.</t>
</section>
<section title="Changes in version -10">
<t>Resolved the authorization issue and other comments from WGLC
(e.g., defining diagnostics as a mandatory extension), and checked
the language.</t>
</section>
<section title="Changes in version -15">
<t>Changed several diagnostic Kind return values to be 64 bits rather
than 32 bits to provide headroom. Split bandwidth into upstream and
downstream. Renamed length in the diagnostic request object to
ext_length, added ext_length to the response object, and clarified that
ext_length is the length of the diagnostic info/extensions being
returned, not the length of the object.</t>
<t>Aligned many flags/values with RELOAD by using hexadecimal rather
than decimal values.</t>
<t>Significant reorganization and editing for readability.</t>
</section>
<section title="Changes in version -20">
<t>Addressed the IESG comments:<list>
<t>(1) this document does not update RFC 6940, but is an
extension</t>
<t>(2) remove "p2psip" from the document, according to Ben and
Benoit's comments</t>
<t>(3) update Roni's email address</t>
<t>(4) re-check the document to make sure that access control
policy is the same</t>
<t>(5) change Trust policy from "pre-5378" to "200902"</t>
<t>(6) address the EWMA_BYTES_RCVD and EWMA_BYTES_SENT equation
problem raised by Alisa</t>
<t>(7) replace "IANA SHALL" with "IANA is asked to" according to
Spencer and Barry's concern</t>
<t>(8) replace "SHOULD"s with "MUST"s in Section 6.2, change "MAY"
to "may" in Section 6.4 according to Ben's comments</t>
<t>(9) add a paragraph in Section 4.3 to explain that this document
does not guarantee the same path for Path_Track, but only provides
information for analysis, according to the list discussion with
Alvaro</t>
<t>(10) change "directly or via symmetric routing" in Section 4.3
to "direct response routing or via symmetric routing", and give a
reference to direct response routing RFC, according to the list
discussion with Alvaro</t>
<t>(11) change Section 5.3 and 9.1 about the reserved dMFlags bits
issue according to Jari and Alexey's comment</t>
<t>(12) replace "diagnostic kind type" with "diagnostic Kind"</t>
<t>(13) correct other minor editorial issues</t>
</list></t>
</section>
<section title="Changes in version -22">
<t>(1) fix the bugs in the IANA section</t>
</section>
</section>
</back>
</rfc>