<?xml version='1.0' ?>
<!--
Created:       Mon Nov 18 17:55:22 2013 mstenber

split from draft-ietf-homenet-hncp-03-pre - generic parts

TODO:
- caption + title to figures? (gen-art review)

-->

<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>

<?rfc autobreaks="yes"?>
<?rfc compact="yes"?>
<?rfc strict='yes'?>
<?rfc subcompact="no"?>
<?rfc symrefs="yes"?>
<?rfc toc="yes"?>
<?rfc tocindent="yes"?>

<rfc
    ipr='trust200902'
    docName='draft-ietf-homenet-dncp-12'
    category='std'
    >
  <front>
    <title abbrev="Distributed Node Consensus Protocol">
      Distributed Node Consensus Protocol
    </title>
    <author initials="M" surname="Stenberg" fullname="Markus Stenberg">
      <organization>Independent</organization>
      <address>
        <postal>
          <street/>
          <city>Helsinki</city>
          <code>00930</code>
          <country>Finland</country>
        </postal>
        <email>markus.stenberg@iki.fi</email>
      </address>
    </author>
    <author initials="S" surname="Barth" fullname="Steven Barth">
      <organization>Independent</organization>
      <address>
        <postal>
          <street/>
          <city>Halle</city>
          <code>06114</code>
          <country>Germany</country>
        </postal>
        <email>cyrus@openwrt.org</email>
      </address>
    </author>
    <date month="November" year="2015" />

    <area>Internet</area>
    <workgroup>Homenet Working Group</workgroup>
    <keyword>Homenet</keyword>
    <abstract>

    <t>This document describes the Distributed Node Consensus Protocol
    (DNCP), a generic state synchronization protocol that uses the Trickle
    algorithm and hash trees. DNCP is an abstract protocol and must be
    combined with a specific profile to make a complete implementable
    protocol.</t>
    </abstract>
  </front>
  <middle>
    <section title="Introduction">

      <t>DNCP is designed to provide a way for each participating node to
      publish a small set of TLV (Type-Length-Value) tuples (at most 64
      KB), and to provide a shared and common view about the data published
      by every currently bidirectionally reachable DNCP node in a network.</t>

      <t>For state synchronization, a hash tree is used. It is formed by
      first calculating a hash for the dataset published by each node,
      called node data, and then calculating another hash over those node
      data hashes.  The single resulting hash, called network state hash,
      is transmitted using the <xref target="RFC6206">Trickle
      algorithm</xref> to ensure that all nodes share the same view of the
      current state of the published data within the network. The use of
      Trickle with only short network state hashes sent infrequently (in
      steady state, once the maximum Trickle interval per link or unicast
      connection has been reached) makes DNCP very thrifty when updates
      happen rarely.</t>

      <t>For maintaining liveliness of the topology and the data within it,
      a combination of Trickled network state, keep-alives, and "other"
      means of ensuring reachability is used. The core idea is that if
      every node ensures its peers are present, transitively, the whole
      network state also stays up-to-date.</t>

      <section title="Applicability">
	<t>DNCP is useful for cases like autonomous bootstrapping, discovery,
	and negotiation of embedded network devices like routers.
	Furthermore, it can be used as a basis to run distributed algorithms
	like <xref target="I-D.ietf-homenet-prefix-assignment" /> or
	use cases as described in <xref target="profile-example"/>. DNCP is
	abstract, which allows it to be tuned to a variety of applications by
	defining profiles. These profiles include choices of:
	<list style="hanging">
		<t hangText="- unicast transport:">datagram or stream oriented protocol
		(e.g., TCP, UDP, SCTP) for generic protocol operation</t>

		<t hangText="- optional transport security:">whether and when to use
		security based on (D)TLS, if supported over the chosen transport</t>

		<t hangText="- optional multicast transport:">multicast-capable
		protocol like UDP allowing autonomous peer discovery or more efficient
		use of multiple access links</t>

		<t hangText="- communication scopes:">either hop-by-hop only, relying
		on link-local addressing (e.g., for LANs), or using addresses with
		broader scopes (e.g., over WANs or the Internet) relying on an existing
		routing infrastructure, or a combination of both (e.g., to exchange
		state between multiple LANs over a WAN or the Internet)</t>

		<t hangText="- payloads:">additional specific payloads (e.g., IANA
		standardized, enterprise-specific or private use)</t>

		<t hangText="- extensions:">possible protocol extensions, either as
		predefined in this document or specific for a particular use case</t>
	</list>

	However, there are certain cases where the protocol as defined in this
	document is a less suitable choice. This list provides an overview while
	the following paragraphs provide more detailed guidance on the individual
	matters.

	<list style="hanging">
		<t hangText="- large amounts of data:">nodes are limited to 64 KB of
		published data</t>

		<t hangText="- very dense unicast-only networks:">nodes include
		information about all immediate neighbors as part of their published
		data.</t>

		<t hangText="- predominantly minimal data changes:">Node data is
		always transported as-is, leading to a relatively large transmission
		overhead for changes affecting only a small part of it.</t>

		<t hangText="- frequently changing data:">DNCP with its use of Trickle
		is optimized for the steady state and less efficient otherwise.</t>

		<t hangText="- large numbers of very constrained nodes:">DNCP requires
		each node to store the entirety of the data published by all nodes.</t>

	</list>
	</t>

	<t>The topology of the devices is not limited and is discovered
	automatically.
	When relying on link-local communication exclusively, all links having
	DNCP nodes need to be at least transitively connected by routers running
	the protocol on multiple endpoints in order to form a connected network.
	However, there is no requirement for every device in a physical network to
	run the protocol. Especially if globally scoped addresses are used, DNCP
	peers do not need to be on the same or even neighboring physical links.
	Autonomous discovery features are usually used in local network
	scenarios; however, with security enabled, DNCP can also be used
	over unsecured public networks. Network size is restricted merely by
	the capabilities of the devices, i.e., each DNCP node needs to be
	able to store the entirety of the data published by all nodes.
	The data associated with each individual node identifier is limited to
	about 64 KB in this document; however, protocol extensions could be defined
	to mitigate this or other protocol limitations if the need arises.</t>

        <t>DNCP is most suitable for data that changes only infrequently to
        gain the maximum benefit from using Trickle. As the network of
        nodes grows, or the frequency of data changes per node increases,
        Trickle is eventually used less and less and the benefit of using
        DNCP diminishes. In these cases Trickle just provides extra
        complexity within the specification and little added value.</t>

        <t>The suitability of DNCP for a particular application can roughly
        be evaluated by considering the expected average network-wide state
        change interval A_NC_I; it is computed by dividing the mean
        interval at which a node originates a new TLV set by the number
        of participating nodes. If keep-alives are used, A_NC_I is the
        minimum of the computed A_NC_I and the keep-alive interval.

        If A_NC_I is less than the (application-specific) Trickle minimum
        interval, DNCP is most likely unsuitable for the application as
        Trickle will not be utilized most of the time. </t>
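        <t>As an illustration only, the following Python sketch computes
        A_NC_I exactly as defined above and compares it with the Trickle
        minimum interval. The function names are hypothetical, all intervals
        are assumed to be in seconds, and the check encodes the rough rule of
        thumb given above rather than a normative test.</t>
        <figure><artwork><![CDATA[
```python
def a_nc_i(mean_origination_interval, node_count, keepalive_interval=None):
    """Expected average network-wide state change interval A_NC_I."""
    # Mean per-node origination interval divided by the number of nodes
    interval = mean_origination_interval / node_count
    # With keep-alives, A_NC_I is capped by the keep-alive interval
    if keepalive_interval is not None:
        interval = min(interval, keepalive_interval)
    return interval


def dncp_likely_suitable(mean_origination_interval, node_count,
                         trickle_imin, keepalive_interval=None):
    """Rule of thumb: DNCP is most likely unsuitable if A_NC_I < Imin."""
    return a_nc_i(mean_origination_interval, node_count,
                  keepalive_interval) >= trickle_imin
```
]]></artwork></figure>
        <t>For example, with hypothetical values of 50 nodes that each
        originate a new TLV set once per hour, A_NC_I is 3600 s / 50 = 72 s,
        well above an Imin of 200 ms; 10000 nodes each changing their data
        every 10 s would yield 1 ms, and Trickle would rarely be utilized.</t>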

        <t>If constant rapid state changes are needed, the preferable
        choice is to use an additional point-to-point channel whose address
        or locator is published using DNCP. Nevertheless, if doing so does
        not raise A_NC_I above the (sensibly chosen) Trickle interval
        parameters for a particular application, using DNCP is probably
        not suitable for the application.</t>

        <t>Another consideration is the size of the published TLV set by a
        node compared to the size of deltas in the TLV set. If the TLV set
        published by a node is very large and has frequent small changes,
        DNCP as currently specified may be unsuitable, as it lacks a delta
        synchronization scheme in order to keep implementations simple.</t>

        <t>DNCP can be used in networks where only unicast transport is
        available. While DNCP uses the least amount of bandwidth when
        multicast is utilized, even in pure unicast mode, the use
        of Trickle (ideally with k &lt; 2) results in a protocol with an
        exponential backoff timer and fewer transmissions than a simpler
        protocol not using Trickle.</t>

      </section>

    </section>

    <section title="Terminology">

      <texttable suppress-title="true" style="none" align="left">
	<ttcol width="25%" /><ttcol width="75%" />

	<c>DNCP profile</c>

	<c>the values for the set of parameters, given in <xref
	target="profile-bits"/>. They are prefixed with DNCP_ in this
	document. The profile also specifies the set of optional DNCP
	extensions to be used. For a simple example DNCP profile, see <xref
	target="profile-example" />.
        </c>

        <c /><c />

      <c>DNCP-based protocol</c>

      <c>a protocol which provides a DNCP profile, according to <xref
      target="profile-bits"/>, and zero or more TLV assignments from the
      per-DNCP profile TLV registry as well as their processing rules.</c>

      <c /><c />

      <c>DNCP node</c>
      <c>a single node which runs a DNCP-based protocol.</c>

      <c /><c />

      <c>Link</c>
      <c>a link-layer medium over which directly connected nodes can
      communicate.</c>

      <c /><c />

      <c>DNCP network</c>

      <c>a set of DNCP nodes running DNCP-based protocol(s) with
      matching DNCP profile(s).

      The set consists of nodes that have discovered each other using the
      transport method defined in the DNCP profile, via multicast
      on local links and/or by using unicast communication.
      </c>

      <c /><c />

      <c>Node identifier</c>
      <c>an opaque fixed-length identifier consisting of
      DNCP_NODE_IDENTIFIER_LENGTH bytes which uniquely identifies a DNCP
      node within a DNCP network.</c>

      <c /><c />

      <c>Interface</c>
      <c>a node's attachment to a particular link.</c>

      <c /><c />

      <c>Address</c>

      <c>an identifier used as source or destination of a DNCP message flow,
      e.g., a tuple (IPv6 address, UDP port) for an IPv6 UDP transport.</c>

      <c /><c />

      <c>Endpoint</c>

	  <c>a locally configured termination point for (potential or established)
	  DNCP message flows. An endpoint is the source and destination for separate
	  unicast message flows to individual nodes and optionally for multicast
	  messages to all thereby reachable nodes (e.g., for node discovery).

	  Endpoints are usually in one of the transport modes specified in <xref
      target="dt" />.
      </c>

      <c /><c />

      <c>Endpoint identifier</c>

      <c>a 32-bit opaque and locally unique value, which identifies a
      particular endpoint of a particular DNCP node. The value 0 is reserved
      for DNCP and DNCP-based protocol purposes and not used to identify an
      actual endpoint. This definition is in sync with the interface index
      definition in <xref target="RFC3493"/>, as the non-zero small
      positive integers should comfortably fit within 32 bits.</c>

      <c /><c />

      <c>Peer</c>
      <c>another DNCP node with which a DNCP node communicates using at least
      one particular local and remote endpoint pair.</c>

      <c /><c />

      <c>Node data</c>
      <c>a set of TLVs published and owned by a node in the DNCP
      network. Other nodes pass it along as-is, even if they cannot
      fully interpret it.</c>

      <c /><c />

      <c>Origination Time</c>
      <c>the (estimated) time when the node data set with the
      current sequence number was published.</c>

      <c /><c />

      <c>Node state</c>
      <c>a set of metadata attributes for node data. It includes a sequence
      number for versioning, a hash value for comparing equality of stored
      node data, and a timestamp indicating the time passed since its last
      publication (i.e., since the origination time). The hash function and
      the length of the hash value are defined in the DNCP profile.</c>

      <c /><c />

      <c>Network state hash</c>
      <c>a hash value which represents the current state of the network.
      The hash function and the length of the hash value are defined in
      the DNCP profile.

      Whenever a node is added, removed or updates its published node data
      this hash value changes as well.

      For calculation, please see <xref target="hash-tree" />.

      </c>

      <c /><c />

      <c>Trust verdict</c>
      <c>a statement about the trustworthiness of a
      certificate announced by a node participating in the certificate-based
      trust consensus mechanism.</c>

      <c /><c />

      <c>Effective trust verdict</c>

      <c>the trust verdict with the highest priority within the set of
      trust verdicts announced for the certificate in the DNCP network.</c>

      <c /><c />

      <c>Topology graph</c>
      <c>the undirected graph of DNCP nodes produced by
      retaining only bidirectional peer relationships between nodes.</c>

      <c /><c />

      <c>Bidirectionally reachable</c>

      <c>a peer is locally unidirectionally reachable if a
      consistent multicast or any unicast DNCP message
      has been received by the local node (see <xref target="peers" />).

      If said peer in return also considers the local node unidirectionally
      reachable, then bidirectional reachability is established.

      As this process is based on publishing peer relationships and
      evaluating the resulting topology graph as described in <xref
      target="liveliness" />, this information is available to the
      whole DNCP network.</c>

      <c /><c />

      <c>Trickle Instance</c>

      <c>a distinct <xref target="RFC6206">Trickle</xref> algorithm state
      kept by a <xref target="dm">node</xref> and related to an endpoint
      or a particular (peer, endpoint) tuple with Trickle variables I, t
      and c. See <xref target="trickle-updates" />.</c>

      </texttable>


    <section anchor="kwd" title='Requirements Language'>

      <t>
       The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
       NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
       RECOMMENDED",  "MAY", and "OPTIONAL" in this document are to
       be interpreted as described in <xref target='RFC2119'>RFC 2119</xref>.
      </t>

    </section>
    </section>


    <section title="Overview">

      <t>DNCP operates primarily using unicast exchanges between nodes, and
      may use multicast for Trickle-based shared state dissemination and
      topology discovery. If used in pure unicast mode with unreliable
      transport, Trickle is also used between peers.</t>

      <t>DNCP is based on exchanging <xref target="tlvs">TLVs</xref> and
      defines a set of mandatory and optional ones for its operation. They are
      categorized into TLVs for <xref target="tlv-request">requesting
      information</xref>, <xref target="tlv-data">transmitting data</xref> and
      <xref target="tlv-state">being published as data</xref>. DNCP-based
      protocols usually specify additional TLVs to extend the capabilities.</t>

      <t>DNCP discovers the topology of the nodes in the DNCP network and
      maintains the liveliness of published node data by ensuring that the
      publishing node is bidirectionally reachable.
      New potential peers can be discovered autonomously on
      multicast-enabled links, their addresses may be manually configured
      or they may be found by some other means defined in the particular
      DNCP profile. The DNCP profile may specify, for example, a well-known
      anycast address or provisioning the remote address to contact via
      some other protocol such as <xref target="RFC3315">DHCPv6</xref>.</t>

      <t>A hash tree of height 1, rooted in itself, is maintained by each
      node to represent the state of all currently reachable nodes (see
      <xref target="hash-tree" />) and the Trickle algorithm is used to
      trigger synchronization (see <xref target="trickle-updates" />).

      The need to check peer nodes for state changes is thereby determined
      by comparing the current root of their respective hash trees, i.e.,
      their individually calculated network state hashes.</t>

      <t>Before joining a DNCP network, a node starts with a hash tree that
      has only one leaf if the node publishes some TLVs, and no leaves
      otherwise.
      <!-- SB: should it have no leaves or 1 leave based on (the hash of)
           an empty string? -->
      <!-- MSt: 'empty' node state cannot be transmitted on the wire
           currently, so no leaves is probably the correct choice. -->

      It then announces the network state hash calculated from the hash
      tree by means of the Trickle algorithm on all its configured
      endpoints.</t>

      <t>When an update is detected by a node (e.g., by receiving a
      different network state hash from a peer) the originator of the
      event is requested to provide a list of the state of all nodes,
      i.e., all the information it uses to calculate its own hash
      tree.

      The node uses the list to determine whether its own information is
      outdated and - if necessary - requests the actual node data that has
      changed. </t>
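      <t>The comparison step can be sketched in Python as follows. The sketch
      assumes that each node state is reduced to a (sequence number, node
      data hash) pair keyed by node identifier, compares sequence numbers as
      plain integers (wrap-around handling is not shown), and skips the local
      node's own identifier, whose conflicts are handled by re-publishing as
      described in <xref target="reception" />. The function name is
      hypothetical.</t>
      <figure><artwork><![CDATA[
```python
def node_data_to_request(local, remote, local_node_id):
    """Return identifiers of nodes whose node data should be requested.

    local, remote: dict mapping node identifier -> (sequence number,
    node data hash), as learned from Node State TLVs.
    """
    wanted = []
    for node_id, (r_seq, r_hash) in sorted(remote.items()):
        if node_id == local_node_id:
            continue  # own node data is never requested from peers
        mine = local.get(node_id)
        if mine is None:
            wanted.append(node_id)  # no data for that node at all
        elif mine[0] < r_seq:
            wanted.append(node_id)  # local copy is outdated
        elif mine[0] == r_seq and mine[1] != r_hash:
            wanted.append(node_id)  # same version, different content
    return wanted
```
]]></artwork></figure>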

      <t>Whenever a node's local copy of any node data and its hash tree are
      updated (e.g., due to its own or another node's node state changing or
      due to a peer being added or removed), its Trickle instances are reset,
      which eventually causes any update to be propagated to all of its
      peers.</t>

    </section>

    <section title="Operation">

      <section title="Hash Tree" anchor="hash-tree">

        <t>Each DNCP node maintains an arbitrary-width hash tree of height
        1. The root of the tree represents the overall network state hash and
        is used to determine whether the view of the network of two or more
        nodes is consistent and shared. Each leaf represents one
        bidirectionally reachable DNCP node. Every time a node is added to or
        removed from the <xref target="liveliness">topology graph</xref>,
        it is likewise added or removed as a leaf. At any time, the leaves of
        the tree are ordered in ascending order of the node identifiers of the
        nodes they represent.</t>

        <section title="Calculating network state and node data hashes">
        	<t>The network state hash and the node data hashes are calculated
        	using the hash function defined in the <xref target="profile-bits">
        	DNCP profile</xref> and truncated to the number of bits specified
        	therein.</t>

        	<t>Individual node data hashes are calculated by applying the
        	function and truncation on the respective node's node data as
        	published in the Node State TLV. Such node data sets are always
        	ordered as defined in <xref target="node-state" />.</t>

        	<t>The network state hash is calculated by applying the function
        	and truncation on the concatenated network state. This state is
        	formed by first concatenating each node's sequence number (in
        	network byte order) with its node data hash to form a per-node
        	datum for each node. These per-node data are then concatenated in
        	ascending order of the respective node's node identifier, i.e.,
        	in the order that the nodes appear in the hash tree.</t>
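          <t>A minimal Python sketch of these calculations follows. It
          assumes SHA-256 truncated to 8 bytes as a stand-in for the
          profile-defined hash function and length, and a 32-bit sequence
          number; both are assumptions rather than values defined here.
          Node identifiers are taken to be fixed-length byte strings, so
          sorting them lexicographically yields the ascending order of the
          hash tree leaves. All names are hypothetical.</t>
          <figure><artwork><![CDATA[
```python
import hashlib
import struct

HASH_LENGTH = 8  # assumed truncation length in bytes (profile-defined)


def profile_hash(data):
    # Assumption: SHA-256 stands in for the DNCP profile's hash function
    return hashlib.sha256(data).digest()[:HASH_LENGTH]


def node_data_hash(node_data):
    """Hash over a node's ordered node data as published."""
    return profile_hash(node_data)


def network_state_hash(nodes):
    """nodes: dict node identifier (bytes) -> (sequence number, node data)."""
    # Per-node datum: sequence number in network byte order + node data hash,
    # concatenated in ascending node identifier order
    network_state = b"".join(
        struct.pack("!I", seq) + node_data_hash(data)
        for _node_id, (seq, data) in sorted(nodes.items())
    )
    return profile_hash(network_state)
```
]]></artwork></figure>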

        </section>

		<section title="Updating network state and node data hashes">
        	<t>The network state hash and the node data hashes are updated
        	on-demand and whenever any locally stored per-node state changes.
        	This includes local unidirectional reachability encoded in the
        	published <xref target="peer">Peer TLVs</xref> and - when combined
        	with remote data - results in awareness of bidirectional
        	reachability changes.</t>
        </section>

      </section>

      <section anchor="dt" title="Data Transport">

        <t>DNCP has few requirements for the underlying
        transport; it requires some way of transmitting either unicast
        datagram or stream data to a peer and, if used in multicast mode, a
        way of sending multicast datagrams.

        As multicast is used only to identify potential new DNCP nodes and
        to send status messages which merely notify that a unicast exchange
        should be triggered, the multicast transport does not have to be
        secured.

        If unicast security is desired and one of the built-in security
        methods is to be used, support for some TLS-derived transport
        scheme - such as <xref target="RFC5246">TLS</xref> on top of TCP or
        <xref target="RFC6347">DTLS</xref> on top of UDP - is also
        required. They provide for integrity protection and confidentiality
        of the node data, as well as authentication and authorization using
        the schemes defined in <xref target="sec-trust">Security and Trust
        Management</xref>.

        A specific definition of the transport(s) in use and their parameters
        MUST be provided by the DNCP profile.</t>

        <t><xref target="tlvs">TLVs</xref> are sent across the transport as-is,
        and they SHOULD be sent together where, e.g., MTU considerations do not
        recommend sending them in multiple batches. DNCP does not fragment or
        reassemble TLVs; thus, it MUST be ensured that the underlying transport
        performs these operations should they be necessary. If this document
        indicates sending one or more TLVs, then the sending node does not need
        to keep track of the packets sent after handing them over to the
        respective transport, i.e., reliable DNCP operation is ensured merely
        by the explicitly defined timers and state machines such as
        <xref target="trickle-updates">Trickle</xref>.
        TLVs in general are handled individually and statelessly (and thus do
        not need to be sent in any particular order) with one exception:
        to form bidirectional peer relationships, DNCP requires
        identification of the endpoints used for communication. As bidirectional peer
        relationships are required for validating liveliness of published node
        data as described in <xref target="liveliness" />, a DNCP node MUST
        send a <xref target="endpoint">Node Endpoint TLV</xref>. When it is
        sent varies, depending on the underlying transport, but
        conceptually it should be available whenever processing a Network
        State TLV:

        <list style="symbols">

          <t>If using a stream transport, the TLV MUST be sent at least
          once per connection, but SHOULD NOT be sent more than once.</t>

          <t>If using a datagram transport, it MUST be included in every
          datagram that also contains a <xref target="net-state">Network
          State TLV</xref> and MUST be located before any such TLV.
          It SHOULD also be included in any other datagram, to speed up
          initial peer detection.</t>
        </list>
        </t>
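        <t>The ordering rule for datagram transports can be illustrated with
        a short Python sketch. It assumes a TLV encoding of a 16-bit type, a
        16-bit length, and the value zero-padded to a 4-byte boundary, and it
        assumes the Node Endpoint TLV value is the node identifier followed by
        the 32-bit endpoint identifier. The numeric TLV types are placeholders
        to be taken from the TLV definitions of this document; function names
        are hypothetical.</t>
        <figure><artwork><![CDATA[
```python
import struct

# Placeholder TLV type values (assumed; see the TLV definitions)
NODE_ENDPOINT_TLV = 3
NETWORK_STATE_TLV = 4


def encode_tlv(tlv_type, value):
    """16-bit type, 16-bit length, value zero-padded to 4 bytes."""
    header = struct.pack("!HH", tlv_type, len(value))
    padding = b"\x00" * (-len(value) % 4)
    return header + value + padding


def build_datagram(node_id, endpoint_id, net_state_hash, other_tlvs=()):
    """Datagram whose Node Endpoint TLV precedes the Network State TLV."""
    endpoint = encode_tlv(NODE_ENDPOINT_TLV,
                          node_id + struct.pack("!I", endpoint_id))
    state = encode_tlv(NETWORK_STATE_TLV, net_state_hash)
    return endpoint + state + b"".join(other_tlvs)
```
]]></artwork></figure>
        <t>Appending further TLVs after the Network State TLV preserves the
        required ordering; with a stream transport, the Node Endpoint TLV
        would instead be sent once per connection.</t>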

        <t>Given the assorted transport options as well as potential
        endpoint configuration, a DNCP endpoint may be used in various
        transport modes:

        <list style="hanging">

          <t hangText="Unicast:">
            <list style="symbols">

              <t>If only reliable unicast transport is used, Trickle is
              not used at all. Whenever the locally calculated
              network state hash changes, a
              single <xref target="net-state">Network State TLV</xref> is
              sent to every unicast peer. Additionally, recently
              changed <xref target="node-state">Node State TLVs</xref> MAY
              be included.</t>

              <t>If only unreliable unicast transport is used, Trickle
              state is kept per peer and it is used to send Network State
              TLVs intermittently, as specified in <xref
              target="trickle-updates" />.</t>
            </list>
          </t>

          <t hangText="Multicast+Unicast:"> If multicast datagram transport
          is available on an endpoint, Trickle state is only maintained for
          the endpoint as a whole. It is used to send Network State TLVs
          periodically, as specified in <xref target="trickle-updates"
          />. Additionally, per-endpoint keep-alives MAY be defined in the
          DNCP profile, as specified in <xref target="pe-ka" />.</t>

          <t hangText="MulticastListen+Unicast:">
            Just like Unicast, except multicast transmissions are listened to
            in order to detect changes of the highest node identifier.
            This mode is used only if the DNCP profile supports <xref
            target="dense-multicast">dense multicast-enabled link optimization</xref>.</t>
        </list>
        </t>

      </section>

      <section title="Trickle-Driven Status Updates"
               anchor="trickle-updates">

		<t>The <xref target="RFC6206">Trickle algorithm</xref> is used to
		ensure protocol reliability over unreliable multicast or unicast
		transports. For reliable unicast transports, its actual algorithm
		is unnecessary and <xref target="dt">omitted</xref>. DNCP maintains
		multiple Trickle states as defined in <xref target="dm" />. Each such
		state can be based on different parameters (see below) and is
		responsible for ensuring that a specific peer or all peers on the
		respective endpoint are regularly provided with the node's current
		locally calculated network state hash for state comparison, i.e.,
		to detect potential divergence in the perceived network state.</t>

        <t>Trickle defines
        3 parameters: Imin, Imax and k. Imin and Imax represent the minimum
        value for I and the maximum number of doublings of Imin, where I is
        the time interval during which at least k Trickle updates must be
        seen on an endpoint to prevent local state transmission.  The
        actual suggested Trickle algorithm parameters are DNCP profile
        specific, as described in <xref target="profile-bits"/>.</t>

        <t>The Trickle state for all Trickle instances defined in <xref
        target="dm" /> is considered inconsistent and reset if and only if
        the locally calculated network state hash changes.  This occurs
        either due to a change in the local node's own node data, or due to
        receipt of more recent data from another node as explained in
        <xref target="hash-tree" />. A node MUST NOT
        reset its Trickle state merely based on receiving a <xref
        target="net-state">Network State TLV</xref> with a network state
        hash which is different from its locally calculated one.</t>
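        <t>A minimal Python sketch of one Trickle instance is shown below.
        It follows the interval rules of <xref target="RFC6206" /> (t drawn
        from [I/2, I), I doubled up to Imin * 2^Imax, transmission suppressed
        once k consistent updates have been heard) combined with the DNCP
        reset rule described above. A received Network State TLV carrying the
        local network state hash is treated here as a consistent update.
        Class and method names are hypothetical, and the caller is assumed to
        invoke at_time_t() and on_interval_end() at the appropriate times.</t>
        <figure><artwork><![CDATA[
```python
import random


class TrickleInstance:
    """Sketch of an RFC 6206 Trickle timer with DNCP's reset rule."""

    def __init__(self, imin, imax_doublings, k, local_hash, rng=None):
        self.imin = imin
        self.imax = imin * (2 ** imax_doublings)  # Imax = number of doublings
        self.k = k
        self.local_hash = local_hash
        self.rng = rng or random.Random()
        self.i = imin
        self._begin_interval()

    def _begin_interval(self):
        self.c = 0
        # t is drawn uniformly from [I/2, I)
        self.t = self.i / 2 + self.rng.random() * (self.i / 2)

    def on_network_state_received(self, remote_hash):
        if remote_hash == self.local_hash:
            self.c += 1  # consistent update counts towards k
        # A differing remote hash MUST NOT reset Trickle by itself;
        # it only triggers a Request Network State TLV (not shown).

    def on_local_hash_change(self, new_hash):
        # Reset if and only if the locally calculated hash changes
        if new_hash != self.local_hash:
            self.local_hash = new_hash
            self.i = self.imin
            self._begin_interval()

    def at_time_t(self):
        """True if a Network State TLV should be sent at time t."""
        return self.c < self.k

    def on_interval_end(self):
        self.i = min(self.i * 2, self.imax)
        self._begin_interval()
```
]]></artwork></figure>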

        <t>Every time a particular Trickle instance indicates that an
        update should be sent, the node MUST send a <xref
        target="net-state">Network State TLV</xref> if and only if:
        <list style="symbols">

          <t>the endpoint is in Multicast+Unicast transport mode, in which
          case the TLV MUST be sent over multicast.</t>

          <t>the endpoint is NOT in Multicast+Unicast transport mode, and the
          unicast transport is unreliable, in which case the TLV MUST be sent
          over unicast.</t>

        </list>
        </t>
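        <t>The transmission rule above, together with the transport modes
        described in <xref target="dt" />, can be condensed into a small
        Python decision function. The enumeration and function names are
        hypothetical. The sketch returns None for reliable unicast, where
        Trickle is not used and a Network State TLV is instead sent whenever
        the locally calculated network state hash changes.</t>
        <figure><artwork><![CDATA[
```python
from enum import Enum


class TransportMode(Enum):
    UNICAST = "Unicast"
    MULTICAST_UNICAST = "Multicast+Unicast"
    MULTICAST_LISTEN_UNICAST = "MulticastListen+Unicast"


def network_state_send_path(mode, unicast_reliable):
    """Where a Network State TLV goes when a Trickle instance fires."""
    if mode is TransportMode.MULTICAST_UNICAST:
        return "multicast"
    if not unicast_reliable:
        return "unicast"  # per-peer Trickle over unreliable unicast
    return None  # reliable unicast: no Trickle-driven transmission
```
]]></artwork></figure>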

        <t>A (sub)set of all <xref target="node-state">Node State
        TLVs</xref> MAY also be included, unless the DNCP profile defines
        this as undesirable for some reason, or unless doing so would expose
        the Node State TLVs over insecure multicast while secure unicast is
        in use.</t>

      </section>

      <section title="Processing of Received TLVs" anchor="reception">

        <t>This section describes how received TLVs are processed. The DNCP
        profile may specify when to ignore particular TLVs, e.g., to modify
        security properties - see <xref target="profile-bits" /> for
        what may be safely defined to be ignored in a profile.

        Any 'reply' mentioned in the steps below denotes sending of the
        specified TLV(s) to the originator of the TLV being processed.
        All such replies MUST be sent using unicast.
        If the TLV being replied to was received via multicast
        and it was sent to a multiple access link, the reply MUST be
        delayed by a random timespan in [0, Imin/2], to avoid potential
        simultaneous replies that may cause problems on some links,
        unless specified differently in the DNCP profile. Sending
        of replies MAY also be rate-limited or omitted for a short period
        of time by an implementation. However, if the TLV is not forbidden
        by the DNCP profile, an implementation MUST reply to
        retransmissions of the TLV with a non-zero probability to avoid
        starvation which would break the state synchronization.</t>
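        <t>The reply delay rule can be sketched in Python as below. The
        function name and its boolean parameters are hypothetical, the delay
        is expressed in the same unit as Imin, and a DNCP profile may specify
        a different delay.</t>
        <figure><artwork><![CDATA[
```python
import random


def reply_delay(received_via_multicast, multiple_access_link, imin,
                rng=random):
    """Delay before sending a unicast reply to a received TLV."""
    if received_via_multicast and multiple_access_link:
        # Spread replies over [0, Imin/2] to avoid simultaneous responses
        return rng.uniform(0.0, imin / 2)
    return 0.0
```
]]></artwork></figure>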

        <t>A DNCP node MUST process TLVs received from any valid (e.g.,
        correctly scoped) address,
        as specified by the DNCP profile and the configuration of a
        particular endpoint, whether this address is known to be the
        address of a peer or not. This provision satisfies the needs of
        monitoring or other host software that needs to discover the DNCP
        topology without adding to the state in the network.</t>

        <t>Upon receipt of:
        <list style="symbols">

          <t><xref target="req-net-state">Request Network State TLV</xref>:

          The receiver MUST reply with a <xref target="net-state">Network
          State TLV</xref> and a <xref target="node-state">Node State
          TLV</xref> for each node data used to calculate the network state
          hash. The Node State TLVs SHOULD NOT contain the optional node
          data part to avoid redundant transmission of node data,
          unless explicitly specified in the DNCP profile.</t>

          <t><xref target="req-node-state">Request Node State TLV</xref>:

          If the receiver has node data for the corresponding node, it MUST
          reply with a <xref target="node-state">Node State TLV</xref> for
          the corresponding node. The optional node data part MUST be
          included in the TLV.</t>

          <t><xref target="net-state">Network State TLV</xref>:

          If the network state hash differs from the locally calculated
          network state hash, and the receiver is unaware of any particular
          node state differences with the sender, the receiver MUST reply
          with a <xref target="req-net-state">Request Network State
          TLV</xref>. These replies MUST be rate-limited to at most
          one reply per link per unique network state hash within Imin. The
          simplest way to ensure this rate limit is to keep a timestamp of the
          most recent request and to send at most one <xref target="req-net-state">
          Request Network State TLV</xref> per Imin.

          To facilitate faster state synchronization, if a Request Network
          State TLV is sent in a reply, a local, current Network State TLV
          MAY also be sent.</t>

          <t><xref target="node-state">Node State TLV</xref>:

          <list style="symbols">

            <t>If the node identifier matches the local node identifier and
            the TLV has a greater sequence number than its current
            local value, or the same sequence number and a different
            hash, the node SHOULD re-publish its own node data with a
            sequence number significantly (e.g., 1000) greater than
            the received one, to reclaim the node identifier. This
            difference is needed to ensure that the new sequence number is
            higher than that of any potentially lingering copies of the
            node state in the network. This can occur once under normal
            operation, when the local node restarts without having stored
            the most recently used sequence number. If it occurs more than
            once, or for nodes not re-publishing their own node data, the
            DNCP profile MUST provide guidance on how to handle these
            situations, as they indicate the existence of another active
            node with the same node identifier.</t>

            <t>If the node identifier does not match the local node
            identifier, and one or more of the following conditions are
            true:

            <list style="symbols">

              <t>The local information is outdated for the corresponding node
              (local sequence number is less than that within the
              TLV).</t>

              <t>The local information is potentially incorrect (local
              sequence number matches but the node data hash differs).</t>

              <t>There is no data for that node altogether.</t>

            </list>

            Then:

            <list style="symbols">

              <t>If the TLV contains the Node Data field, it SHOULD also be
              verified by ensuring that the locally calculated hash of the
              Node Data matches the content of the H(Node Data) field within
              the TLV. If they differ, the TLV SHOULD be ignored and not
              processed further.</t>

              <t>If the TLV does not contain the Node Data field, and the
              H(Node Data) field within the TLV differs from the local node
              data hash for that node (or there is none), the receiver MUST
              reply with a <xref target="req-node-state">Request Node State
              TLV</xref> for the corresponding node.</t>

              <t>Otherwise, the receiver MUST update its locally stored
              state for that node (node data from the Node Data field, if
              present; sequence number; and relative time) to match the
              received TLV.</t>
            </list>
            </t>


          </list>


          When comparing sequence numbers, a wrap-around (looping)
          comparison function MUST be used to avoid problems in case of
          overflow. Unless the DNCP profile defines another, the following
          comparison function is RECOMMENDED: a &lt; b &lt;=&gt; ((a - b) %
          (2^32)) &amp; (2^31) != 0, where (a % b) represents the
          remainder of a modulo b and (a &amp; b) represents the bitwise
          conjunction of a and b.
          </t>

          <t>Any other TLV:

          TLVs not recognized by the receiver MUST be silently ignored
          unless they are sent within another TLV (for example, TLVs within
          the Node Data field of a Node State TLV). TLVs within the Node Data
          field of the Node State TLV not recognized by the receiver MUST be
          retained for distribution to other nodes and for calculating the
          node data hash as described in <xref target="node-state" /> but
          are ignored for other purposes.</t>

        </list>
        </t>
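        <t>The reception rules for the Node State TLV and the RECOMMENDED
        sequence number comparison can be sketched in Python as follows.
        This is a non-normative illustration under stated assumptions: the
        names seq_lt, NodeRecord, and process_node_state, the action strings
        returned, and the caller-supplied hash function h are hypothetical,
        and only Node State TLVs about nodes other than the local node are
        covered.</t>

        <figure>
          <artwork><![CDATA[
from dataclasses import dataclass
from typing import Callable, Optional


def seq_lt(a: int, b: int) -> bool:
    """Wrap-around test: a < b iff ((a - b) % 2^32) & 2^31 != 0."""
    return (((a - b) % (1 << 32)) & (1 << 31)) != 0


@dataclass
class NodeRecord:
    """Hypothetical locally stored state for one remote node."""
    seqno: int
    data_hash: bytes
    data: Optional[bytes] = None


def process_node_state(local: Optional[NodeRecord], seqno: int,
                       data_hash: bytes, data: Optional[bytes],
                       h: Callable[[bytes], bytes]) -> str:
    """Classify a Node State TLV about a node other than ourselves.

    Returns "ignore", "request" (reply with a Request Node State
    TLV) or "update" (store seqno, time and any node data).
    """
    needs_update = (
        local is None                       # no data for that node
        or seq_lt(local.seqno, seqno)       # local data is outdated
        or (local.seqno == seqno            # same seqno, other hash
            and local.data_hash != data_hash)
    )
    if not needs_update:
        return "ignore"
    if data is not None:
        # SHOULD verify the received node data against H(Node Data).
        return "update" if h(data) == data_hash else "ignore"
    if local is None or local.data_hash != data_hash:
        return "request"
    return "update"
]]></artwork>
        </figure>

        <t>For example, seq_lt(0xFFFFFFFF, 0) is true, so a sequence number
        that wraps around to 0 is still treated as newer than the one it
        replaces.</t>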

        <t>If secure unicast transport is configured for an endpoint, any
        Node State TLVs received over insecure multicast MUST be silently
        ignored.</t>

      </section>

      <section anchor="peers" title="Discovering, Adding and Removing Peers">

        <t>Peer relations are established between neighbors using one or more
        mutually connected endpoints. Such neighbors exchange information
        about network state and published data directly, and through
        transitivity this information then propagates throughout the
        network.</t>

        <t>New peers are discovered using the regular unicast or multicast
        transport defined in the <xref target="profile-bits">DNCP
        profile</xref>. This process is not distinguished from peer
        addition: an unknown peer is simply discovered by receiving regular
        DNCP protocol TLVs from it, and no dedicated discovery messages or
        TLVs exist. For unicast-only transports, the individual nodes'
        transport addresses are preconfigured or obtained using an
        external service discovery protocol. In the presence of a
        multicast transport, messages from unknown peers are handled in
        the same way as multicast messages from peers that are already
        known; thus, new peers are discovered when they send their regular
        DNCP protocol TLVs using multicast.</t>

        <t>When receiving a <xref target="endpoint">Node Endpoint
        TLV</xref> on an endpoint from an unknown peer:

        <list style="symbols">

          <t>If received over unicast, the remote node MUST be added as a
          peer on the endpoint and a <xref target="peer">Peer
          TLV</xref> MUST be created for it.
          </t>

          <t>If received over multicast, the node MAY be sent a (possibly
          rate-limited) unicast <xref target="req-net-state">Request
          Network State TLV</xref>.</t>

        </list>
        </t>
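        <t>A minimal sketch of the handling of a Node Endpoint TLV from an
        unknown peer follows. The EndpointPeers structure, the returned
        action strings, and the min_gap_ms rate-limiting parameter are
        hypothetical; the rules above only say that the multicast-triggered
        request may be rate limited, not how.</t>

        <figure>
          <artwork><![CDATA[
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# (peer node identifier, peer endpoint identifier)
PeerKey = Tuple[bytes, int]


@dataclass
class EndpointPeers:
    """Hypothetical per-endpoint peer bookkeeping."""
    peers: Set[PeerKey] = field(default_factory=set)
    # Time (ms) of the last unicast Request Network State TLV sent
    # to a multicast-discovered node, used for rate limiting.
    last_req_ms: Dict[PeerKey, int] = field(default_factory=dict)


def on_node_endpoint_tlv(ep: EndpointPeers, key: PeerKey,
                         via_unicast: bool, now_ms: int,
                         min_gap_ms: int) -> str:
    if key in ep.peers:
        return "known"
    if via_unicast:
        # MUST add the node as a peer (and publish a Peer TLV).
        ep.peers.add(key)
        return "added"
    # Multicast: MAY send a rate-limited unicast request.
    last = ep.last_req_ms.get(key)
    if last is None or now_ms - last >= min_gap_ms:
        ep.last_req_ms[key] = now_ms
        return "send-request"
    return "rate-limited"
]]></artwork>
        </figure>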

        <t>If keep-alives specified in <xref target="ka" /> are not sent by
        the peer (either the DNCP profile does not specify the use of
        keep-alives or the particular peer chooses not to send
        keep-alives), some other existing local transport-specific
        means (such as Ethernet carrier detection or TCP keep-alive)
        MUST be used to ensure its presence.

        If the peer does not send keep-alives, and no means to verify
        the presence of the peer are available, the peer MUST be
        considered no longer present, and it SHOULD NOT be added back as a
        peer until it starts sending keep-alives again.

        When the peer is no longer present, the Peer
        TLV and the local DNCP peer state MUST be removed. DNCP does not
        define an explicit message or TLV for indicating the termination of
        DNCP operation by the terminating node; however, a derived protocol
        could specify an extension, if the need arises.</t>

        <t>If the local endpoint is in the Multicast-Listen+Unicast
        transport mode, a <xref target="peer">Peer TLV</xref> MUST
        NOT be published for peers that do not have the highest node
        identifier.</t>

      </section>

      <section anchor="liveliness" title="Data Liveliness Validation">

        <t>Maintaining the <xref target="hash-tree">hash tree</xref>, and
        thereby updating the network state hash, depends on up-to-date
        information on bidirectional node reachability derived from the
        contents of a topology graph. This graph changes whenever nodes
        are added to or removed from the network, or when bidirectional
        connectivity between existing nodes is established or lost.
        Therefore, the graph MUST be updated either immediately or with a
        small delay shorter than the DNCP profile-defined Trickle Imin,
        whenever:

        <list style="symbols">
          <t>A Peer TLV or a whole node is added or removed, or</t>

          <t>the origination time (in milliseconds) of some node's node
          data is less than current time - 2^32 + 2^15.</t>

        </list>

        The artificial upper limit for the origination time is used to
        gracefully avoid overflows of the origination time and to allow
        the node to republish its data as noted in <xref
        target="node-state" />.
        </t>

        <t>The topology graph update starts with the local node marked as
        reachable and all other nodes marked as unreachable. Other nodes are
        then iteratively marked as reachable using the following algorithm:
        A candidate not-yet-reachable node N with an endpoint NE is marked
        as reachable if there is a reachable node R with an endpoint RE
        such that all of the following criteria are met:

        <list style="symbols">

          <t>The origination time (in milliseconds) of R's node data is
          greater than current time - 2^32 + 2^15.</t>

          <t>R publishes a Peer TLV with:
          <list style="symbols">

            <t>Peer Node Identifier = N's node identifier</t>

            <t>Peer Endpoint Identifier = NE's endpoint
            identifier</t>

            <t>Endpoint Identifier = RE's endpoint identifier</t>

          </list>
          </t>

          <t>N publishes a Peer TLV with:
          <list style="symbols">

            <t>Peer Node Identifier = R's node identifier</t>

            <t>Peer Endpoint Identifier = RE's endpoint identifier</t>

            <t>Endpoint Identifier = NE's endpoint identifier</t>

          </list>
          </t>
        </list>

        The algorithm terminates when no more candidate nodes
        fulfilling these criteria can be found.
        </t>
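        <t>The topology graph traversal described above amounts to a
        breadth-first search in which an edge is only followed when both
        nodes publish matching Peer TLVs. The following Python sketch
        assumes that node identifiers are represented as strings and Peer
        TLVs as (Peer Node Identifier, Peer Endpoint Identifier, Endpoint
        Identifier) tuples; these representations and the function name
        reachable_nodes are illustrative only.</t>

        <figure>
          <artwork><![CDATA[
from collections import deque
from typing import Dict, List, Set, Tuple

# Peer TLV as (Peer Node Identifier, Peer Endpoint Identifier,
# (local) Endpoint Identifier).
PeerTLV = Tuple[str, int, int]


def reachable_nodes(local: str, orig_ms: Dict[str, int],
                    peer_tlvs: Dict[str, List[PeerTLV]],
                    now_ms: int) -> Set[str]:
    """Return the nodes reachable from `local` in the topology graph."""
    limit = now_ms - 2**32 + 2**15
    reachable = {local}
    queue = deque([local])
    while queue:
        r = queue.popleft()
        if orig_ms.get(r, limit) <= limit:
            continue  # R's node data is too old to vouch for peers
        for n, ne, r_ep in peer_tlvs.get(r, []):
            if n in reachable:
                continue
            # N must publish the mirror-image Peer TLV towards R.
            if (r, r_ep, ne) in peer_tlvs.get(n, []):
                reachable.add(n)
                queue.append(n)
    return reachable
]]></artwork>
        </figure>

        <t>Because each newly reachable node is queued exactly once, the
        search stops when no further candidate node meets the criteria,
        which corresponds to the termination condition above.</t>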

        <t>DNCP nodes that have not been reachable in the most recent
        topology graph traversal MUST NOT be used for calculation of the
        network state hash, be provided to any applications that need to
        use the whole TLV graph, or be provided to remote nodes. They MAY
        be forgotten immediately after the topology graph traversal;
        however, it is RECOMMENDED to keep them at least briefly to improve
        the speed of DNCP network state convergence. This reduces the
        number of queries needed to reconverge, both during initial network
        convergence and when a part of the network loses and regains
        bidirectional connectivity within that time period.</t>

      </section>

    </section>

    <section anchor="dm" title="Data Model">

      <t>This section describes the local data structures a minimal
      implementation might use. This section is provided only as a
      convenience for the implementor. Some of the <xref
      target="ext">optional extensions</xref> describe additional data
      requirements, and some optional parts of the core protocol may also
      require more.</t>

      <t>A DNCP node has:

      <list style="symbols">

        <t>A data structure containing data about the most recently sent
        <xref target="req-net-state">Request Network State TLVs</xref>.
        The simplest option is keeping a timestamp of the most recent
        request (required to fulfill the reply rate limiting specified in
        <xref target="reception" />).</t>

      </list>
      </t>

      <t>A DNCP node has for every DNCP node in the DNCP network:

      <list style="symbols">

        <t>Node identifier: the unique identifier of the node. The length,
        how it is produced, and how collisions are handled are up to the
        DNCP profile.</t>

        <t>Node data: the set of TLV tuples published by that particular
        node. As they are transmitted in a defined order (see <xref
        target="node-state">Node State TLV</xref> for details), it may be
        reasonable to maintain that order within this data structure.</t>

        <t>Latest sequence number: the 32-bit sequence number that
        is incremented any time the TLV set is published. The comparison
        function used to compare them is described in <xref
        target="reception" />.</t>

        <t>Origination time: the (estimated) time when the
        current TLV set with the current sequence number was
        published.

        It is used to populate the Milliseconds Since Origination field in
        a <xref target="node-state">Node State TLV</xref>. Ideally it also
        has millisecond accuracy.
        </t>

      </list>
      </t>

      <t>Additionally, a DNCP node has a set of endpoints for which DNCP
      is configured to be used. For each such endpoint, a node has:
      <list style="symbols">

        <t>Endpoint identifier: the 32-bit opaque locally unique
        value identifying the endpoint within a node. It SHOULD
        NOT be reused immediately after an endpoint is disabled.</t>

        <t>Trickle instance: the endpoint's Trickle instance with
        parameters I, T, and c (only on an endpoint in Multicast+Unicast
        transport mode).</t>

      </list>
      </t>

      <t>and one (or more) of the following:
      <list style="symbols">
        <t>Interface: the assigned local network interface.</t>

        <t>Unicast address: the address of the DNCP node it should connect
        with.</t>

        <t>Set of addresses: the addresses of the DNCP nodes from which
        connections are accepted.</t>
      </list>
      </t>

      <t>For each remote (peer, endpoint) pair detected on a
      local endpoint, a DNCP node has:

      <list style="symbols">

        <t>Node identifier: the unique identifier of the peer.</t>

        <t>Endpoint identifier: the unique endpoint identifier used by the
        peer.</t>

        <t>Peer address: the most recently used address of the peer
        (authenticated and authorized, if security is enabled).</t>

        <t>Trickle instance: the particular peer's Trickle instance with
        parameters I, T, and c (only on an endpoint in Unicast mode, when
        using an unreliable unicast transport).</t>

      </list>
      </t>
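      <t>One possible rendering of this data model as Python dataclasses is
      sketched below. All class and field names are hypothetical, and
      representing timestamps as integer milliseconds is an assumption; an
      implementation may structure this state differently.</t>

      <figure>
        <artwork><![CDATA[
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Trickle:
    """Trickle variables I, T and c (RFC 6206), in milliseconds."""
    i_ms: int
    t_ms: int
    c: int = 0


@dataclass
class NodeState:
    node_id: bytes
    node_data: List[bytes]  # published TLVs, kept in sorted order
    seqno: int              # 32-bit sequence number
    origination_ms: int     # estimated local time of publication

    def ms_since_origination(self, now_ms: int) -> int:
        return now_ms - self.origination_ms


@dataclass
class PeerState:
    node_id: bytes
    endpoint_id: int
    address: str            # most recently used peer address
    trickle: Optional[Trickle] = None  # Unicast mode, unreliable


@dataclass
class EndpointState:
    endpoint_id: int        # 32-bit, locally unique
    interface: Optional[str] = None
    unicast_address: Optional[str] = None
    accepted_addresses: List[str] = field(default_factory=list)
    trickle: Optional[Trickle] = None  # Multicast+Unicast mode only
    peers: Dict[Tuple[bytes, int], PeerState] = field(
        default_factory=dict)


@dataclass
class DncpNode:
    last_req_net_state_ms: Optional[int] = None
    nodes: Dict[bytes, NodeState] = field(default_factory=dict)
    endpoints: Dict[int, EndpointState] = field(default_factory=dict)
]]></artwork>
      </figure>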
    </section>



    <section anchor="ext" title="Optional Extensions">

      <t>This section specifies extensions to the core protocol that a DNCP
      profile may specify for use.</t>

      <section anchor="ka" title="Keep-Alives">

        <t>While DNCP provides <xref target="peers">mechanisms for
        discovering and adding new peers on an endpoint</xref>, as well as
        state change notifications, another mechanism may be needed to
        remove old, no longer valid peers if the transport or lower layers
        do not provide one, as noted in <xref target="liveliness" />.</t>

        <t>If keep-alives are not specified in the DNCP profile, the rest
        of this subsection MUST be ignored.</t>

        <t>A DNCP profile MAY specify either per-endpoint (sent using
        multicast to all DNCP nodes connected to a multicast-enabled link)
        or per-peer (sent using unicast to each peer individually)
        keep-alive support. </t>

        <t>For every endpoint that a keep-alive is specified for in the
        DNCP profile, the endpoint-specific keep-alive interval MUST be
        maintained. By default, it is DNCP_KEEPALIVE_INTERVAL. If a
        different local value is preferred for any reason (configuration,
        energy conservation, media type, etc.), it can be substituted
        instead. If a non-default keep-alive interval is used on any
        endpoint, a DNCP node MUST publish appropriate <xref
        target="ka-interval">Keep-Alive Interval TLV(s)</xref> within its
        node data.</t>

        <section title="Data Model Additions" anchor="ka-dm">

          <t>The following additions to the <xref target="dm">Data
          Model</xref> are needed to support keep-alives:</t>

          <t>For each configured endpoint that has per-endpoint keep-alives
          enabled:

          <list style="symbols">
            <t>Last sent: a timestamp which indicates the last time a
            <xref target="net-state">Network State TLV</xref> was sent over
            that interface.</t>
          </list>
          </t>

          <t>For each remote (peer, endpoint) pair detected on a
          local endpoint, a DNCP node has:


          <list style="symbols">

            <t>Last contact timestamp: a timestamp which indicates the last
            time a consistent <xref target="net-state">Network State
            TLV</xref> was received from the peer over multicast, or anything
            was received over unicast. Failing to update it for a certain
            amount of time as specified in <xref target="ka-peer-removal" />
            results in the removal of the peer. When adding a new peer, it is
            initialized to the current time.</t>

            <t>Last sent: If per-peer keep-alives are enabled, a timestamp
            which indicates the last time a <xref
            target="net-state">Network State TLV</xref> was sent to that
            point-to-point peer. When adding a new peer, it is initialized
            to the current time.</t>

          </list>
          </t>

        </section>

        <section anchor="pe-ka" title="Per-Endpoint Periodic Keep-Alives">

          <t>If per-endpoint keep-alives are enabled on an endpoint in
          Multicast+Unicast transport mode, and if no traffic containing a
          <xref target="net-state">Network State TLV</xref> has been sent
          to a particular endpoint within the endpoint-specific keep-alive
          interval, a <xref target="net-state">Network State TLV</xref>
          MUST be sent on that endpoint,

          and a new Trickle interval started, as specified in
          step 2 of Section 4.2 of <xref target="RFC6206" />.

          The actual sending
          time SHOULD be further delayed by a random timespan in [0,
          Imin/2].</t>
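          <t>The timing rule above can be sketched as follows. The function
          names and the use of Python's random.Random are assumptions for
          illustration; the keep-alive interval and Imin would come from the
          DNCP profile or local configuration. The second helper restates
          step 2 of Section 4.2 of RFC 6206: the counter c is reset to 0 and
          t is picked from [I/2, I).</t>

          <figure>
            <artwork><![CDATA[
import random
from typing import Optional, Tuple


def keepalive_send_time(last_sent_ms: int, now_ms: int,
                        interval_ms: int, imin_ms: int,
                        rng: random.Random) -> Optional[float]:
    """When to send a per-endpoint keep-alive, or None if not due."""
    if now_ms - last_sent_ms < interval_ms:
        return None  # a Network State TLV went out recently enough
    # SHOULD be delayed by a random timespan in [0, Imin/2].
    return now_ms + rng.uniform(0, imin_ms / 2)


def start_trickle_interval(i_ms: int,
                           rng: random.Random) -> Tuple[int, float]:
    """RFC 6206, Section 4.2, step 2: reset c, pick t in [I/2, I)."""
    c = 0
    t = i_ms / 2 + rng.random() * (i_ms / 2)
    return c, t
]]></artwork>
          </figure>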

        </section>

        <section title="Per-Peer Periodic Keep-Alives">

          <t>If per-peer keep-alives are enabled on a unicast-only
          endpoint, and if no traffic containing a <xref
          target="net-state">Network State TLV</xref> has been sent to a
          particular peer within the endpoint-specific keep-alive interval,
          a <xref target="net-state">Network State TLV</xref> MUST be sent to
          the peer,

          and a new Trickle interval started, as specified in
          step 2 of Section 4.2 of <xref target="RFC6206" />.

          </t>

        </section>

        <section title="Received TLV Processing Additions">

          <t>If a TLV is received over unicast from the peer, the Last
          contact timestamp for the peer MUST be updated.</t>

          <t>On receipt of a <xref target="net-state">Network State TLV</xref>
          which is consistent with the locally calculated network state hash,
          the Last contact timestamp for the peer MUST be updated in order
          to maintain it as a peer.</t>

        </section>

        <section title="Peer Removal" anchor="ka-peer-removal">

          <t>For every peer on every endpoint, the endpoint-specific
          keep-alive interval must be calculated by looking for <xref
          target="ka-interval">Keep-Alive Interval TLVs</xref> published by
          the node and, if none exist, using the default value of
          DNCP_KEEPALIVE_INTERVAL. If the peer's Last contact timestamp has
          not been updated for at least a locally chosen, potentially
          endpoint-specific keep-alive multiplier (defaulting to
          DNCP_KEEPALIVE_MULTIPLIER) times the peer's endpoint-specific
          keep-alive interval, the Peer TLV for that peer and the local
          DNCP peer state MUST be removed.</t>
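          <t>A sketch of the peer removal check follows, assuming that the
          keep-alive interval advertised by the peer (if any) has already
          been looked up from its Keep-Alive Interval TLVs. The function
          name and parameters are hypothetical; DNCP_KEEPALIVE_INTERVAL and
          DNCP_KEEPALIVE_MULTIPLIER are passed in as arguments because their
          values are defined by the DNCP profile.</t>

          <figure>
            <artwork><![CDATA[
from typing import Optional


def peer_expired(last_contact_ms: int, now_ms: int,
                 advertised_interval_ms: Optional[int],
                 default_interval_ms: int, multiplier: float) -> bool:
    """True if the peer's Peer TLV and local state must be removed.

    advertised_interval_ms is the interval from the peer's matching
    Keep-Alive Interval TLV, or None if it published none.
    """
    interval = (advertised_interval_ms
                if advertised_interval_ms is not None
                else default_interval_ms)
    return now_ms - last_contact_ms >= multiplier * interval
]]></artwork>
          </figure>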

        </section>

      </section>

      <section anchor="dense-multicast" title="Support For Dense Multicast-Enabled Links">

        <t>This optimization is needed to avoid a state space explosion.
        Given a large set of DNCP nodes publishing data on an endpoint
        that uses multicast on a link, every node will add a
        <xref target="peer">Peer TLV</xref> for each peer.
        While Trickle limits the amount of traffic on the link in
        stable state to some extent, the total amount of data that is added
        to and maintained in the DNCP network given N nodes on a
        multicast-enabled link is O(N^2). Additionally, if per-peer
        keep-alives are used, there will be O(N^2) keep-alives running
        on the link if the liveliness of peers is not ensured by some other
        means (e.g., TCP connection lifetime, layer 2 notification, or
        per-endpoint keep-alive).</t>

        <t>A DNCP profile SHOULD provide an upper bound on the number of
        peers allowed for a particular type of link on which an endpoint in
        Multicast+Unicast transport mode is used, but the bound MAY also
        be chosen at runtime.

        The main considerations when selecting a bound (if any) for a
        particular type of link should be whether it supports multicast
        traffic, and whether an excessive number of peers is likely to
        occur during the use of that DNCP profile on that particular type
        of link. If neither is likely, there is little point in specifying
        support for this extension for that particular link type.</t>

        <t>If a DNCP profile does not support this extension at all, the
        rest of this subsection MUST be ignored. This is because when this
        extension is used, the state within the DNCP network only
        contains a subset of the full topology of the network. Therefore
        every node must be aware of the potential of it being used in a
        particular DNCP profile.</t>

        <t>If the specified upper bound is exceeded for some endpoint in
        Multicast+Unicast transport mode and the node does not have the
        highest node identifier on the link, it SHOULD treat the endpoint
        as a unicast endpoint connected to the node that has the highest
        node identifier detected on the link, thereby transitioning to
        Multicast-Listen+Unicast transport mode. See <xref target="dt" />
        for implications on the specific endpoint behavior. The nodes in
        Multicast-Listen+Unicast transport mode MUST keep listening to
        multicast traffic, both to receive messages from the node(s) still
        in Multicast+Unicast mode and to react to the appearance of nodes
        with a greater node identifier. If the highest node identifier
        present on the link changes, the remote unicast address of the
        endpoints in Multicast-Listen+Unicast transport mode MUST be
        changed. If the node identifier of the local node is the highest
        one, the node MUST switch back to, or stay in, Multicast+Unicast
        mode, and form peer relationships with all peers as specified
        in <xref target="peers" />.</t>
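        <t>The mode selection described above can be sketched as follows.
        It assumes that node identifiers are compared as unsigned
        big-endian byte strings (which Python's bytes comparison provides
        for identifiers of equal, fixed length) and that the caller knows
        the current peer count and the node identifiers seen on the link;
        the function name and mode strings are illustrative only.</t>

        <figure>
          <artwork><![CDATA[
from typing import Iterable, Optional, Tuple


def select_mode(local_id: bytes, link_ids: Iterable[bytes],
                peer_count: int, max_peers: Optional[int]
                ) -> Tuple[str, Optional[bytes]]:
    """Return (transport mode, unicast target) for one endpoint."""
    highest = max([local_id, *link_ids])
    if (max_peers is None or peer_count <= max_peers
            or highest == local_id):
        # Within the bound, or we are the highest: stay multicast.
        return "Multicast+Unicast", None
    # Bound exceeded: unicast to the highest node, keep listening.
    return "Multicast-Listen+Unicast", highest
]]></artwork>
        </figure>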

      </section>
    </section>

    <section anchor="tlvs" title="Type-Length-Value Objects">

      <figure>
        <artwork>
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Type               |           Length              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Value (if any) (+padding (if any))              |
..
|                     (variable # of bytes)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     (Optional nested TLVs)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        </artwork>
      </figure>

      <t>Each TLV is encoded as:

      <list style="symbols">

        <t>a 2 byte Type field</t>

        <t>a 2 byte Length field which contains the length of the Value
        field in bytes; 0 means no Value</t>

        <t>the Value itself (if any)</t>

        <t>padding bytes with value of zero up to the next 4 byte
        boundary if the Length is not divisible by 4.</t>

      </list>

      While padding bytes MUST NOT be included in the number stored in
      the Length field of the TLV, if the TLV is enclosed within
      another TLV, then the padding is included in the enclosing TLV's
      Length value.</t>

      <t>Each TLV which does not define optional fields or variable-length
      content MAY be sent with additional sub-TLVs appended after the TLV
      to allow for extensibility.

      When handling such TLV types, each node MUST accept received TLVs
      that are longer than the fixed fields specified for the particular
      type, and ignore sub-TLVs whose types are either unknown or not
      supported within that particular TLV type.

      If any sub-TLVs are present, the Length field of the TLV describes
      the number of bytes from the first byte of the TLV's own Value (if
      any) to the last (padding) byte of the last sub-TLV.</t>

      <t>
        For example, a TLV of type 123 (0x7b) with value 'x' (120 =
        0x78) is encoded as: 007B 0001 7800 0000. If it were to have a
        sub-TLV of type 124 (0x7c) with value 'y', it would be encoded as
        007B 000C 7800 0000 007C 0001 7900 0000.
      </t>
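      <t>The encoding rules and the example above can be reproduced with
      the following Python sketch, which uses only the standard struct
      module. The function names encode_tlv and decode_tlvs are
      hypothetical, and the decoder performs no validation beyond a
      truncation check.</t>

      <figure>
        <artwork><![CDATA[
import struct
from typing import Iterator, Tuple


def encode_tlv(tlv_type: int, value: bytes = b"",
               sub_tlvs: bytes = b"") -> bytes:
    """Encode one TLV; sub_tlvs must already be encoded (padded)."""
    pad = b"\x00" * (-len(value) % 4)
    if sub_tlvs:
        # Length runs to the last padding byte of the last sub-TLV.
        length = len(value) + len(pad) + len(sub_tlvs)
    else:
        # Padding is never counted in the TLV's own Length field.
        length = len(value)
    return (struct.pack("!HH", tlv_type, length)
            + value + pad + sub_tlvs)


def decode_tlvs(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs; nested sub-TLVs stay in value."""
    off = 0
    while off + 4 <= len(buf):
        tlv_type, length = struct.unpack_from("!HH", buf, off)
        end = off + 4 + length
        if end > len(buf):
            raise ValueError("truncated TLV")
        yield tlv_type, buf[off + 4:end]
        off = end + (-length % 4)
]]></artwork>
      </figure>

      <t>With these definitions, encode_tlv(0x7B, b"x").hex() yields
      007b000178000000, and encode_tlv(0x7B, b"x", encode_tlv(0x7C,
      b"y")).hex() yields 007b000c78000000007c000179000000, matching the
      example.</t>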

      <t>In this section, the following special notation is used:
      <list>

        <t>.. = octet string concatenation operation.</t>

        <t>H(x) = non-cryptographic hash function specified by the DNCP
        profile.</t>

      </list>
      </t>


      <section title="Request TLVs" anchor="tlv-request">

        <section anchor="req-net-state" title="Request Network State TLV">

          <figure>
            <artwork>
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Type: REQ-NETWORK-STATE (1)  |          Length: >= 0         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            </artwork>
          </figure>

          <t>This TLV is used to request a response with a <xref
          target="net-state">Network State TLV</xref> and all <xref
          target="node-state">Node State TLVs</xref> (without node
          data).</t>

        </section>
        <section anchor="req-node-state" title="Request Node State TLV">


          <figure>
            <artwork>
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Type: REQ-NODE-STATE (2)   |          Length: > 0          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Node Identifier                        |
|                  (length fixed in DNCP profile)               |
...
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            </artwork>
          </figure>

          <t>This TLV is used to request a <xref target="node-state">
          Node State TLV</xref> (including node data) for the node
          with the matching node identifier.</t>

        </section>

      </section>
      <section title="Data TLVs" anchor="tlv-data">
        <section anchor="endpoint" title="Node Endpoint TLV">

          <figure>
            <artwork>
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Type: NODE-ENDPOINT (3)     |          Length: > 4          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Node Identifier                        |
|                  (length fixed in DNCP profile)               |
...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Endpoint Identifier                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            </artwork>
          </figure>

          <t>This TLV identifies both the local node's node identifier and
          the particular endpoint's endpoint identifier.
          <xref target="dt" /> specifies when it is sent.</t>

        </section>
        <section anchor="net-state" title="Network State TLV">

          <figure>
            <artwork>
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Type: NETWORK-STATE (4)    |          Length: > 0          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     H(sequence number of node 1 .. H(node data of node 1) ..  |
|    .. sequence number of node N .. H(node data of node N))    |
|                  (length fixed in DNCP profile)               |
...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            </artwork>
          </figure>

          <t>This TLV contains the current network state hash calculated by its
          sender (<xref target="hash-tree" /> describes the algorithm).</t>
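          <t>As an illustration of the formula above, the following sketch
          computes a network state hash over the reachable nodes. It
          assumes, purely for illustration, that SHA-256 truncated to 64
          bits stands in for the profile-defined H and its length, that
          sequence numbers are encoded as 32-bit big-endian integers, and
          that nodes are processed in ascending order of node identifier; a
          real DNCP profile may differ on each of these points.</t>

          <figure>
            <artwork><![CDATA[
import hashlib
import struct
from typing import Dict, Tuple


def h(data: bytes) -> bytes:
    # Stand-in for the profile's H(x): SHA-256 cut to 64 bits.
    return hashlib.sha256(data).digest()[:8]


def network_state_hash(
        nodes: Dict[bytes, Tuple[int, bytes]]) -> bytes:
    """nodes: reachable node identifier -> (seqno, node data)."""
    buf = b""
    for node_id in sorted(nodes):  # ascending node identifier
        seqno, node_data = nodes[node_id]
        buf += struct.pack("!I", seqno) + h(node_data)
    return h(buf)
]]></artwork>
          </figure>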

        </section>
        <section anchor="node-state" title="Node State TLV">

          <figure>
            <artwork>
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Type: NODE-STATE (5)     |          Length: > 8          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Node Identifier                        |
|                  (length fixed in DNCP profile)               |
...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Sequence Number                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Milliseconds Since Origination                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         H(Node Data)                          |
|                  (length fixed in DNCP profile)               |
...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       (optionally) Node Data (a set of nested TLVs)           |
...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            </artwork>
          </figure>

          <t>This TLV represents the local node's knowledge about the
          published state of a node in the DNCP network identified by the
          Node Identifier field in the TLV. </t>

          <t>Every node, including the node publishing the node data, MUST
          update the Milliseconds Since Origination field whenever it sends
          a Node State TLV, based on when the node estimates the data was
          originally published. This ensures, for example, that any
          relative timestamps contained within the published node data can
          be correctly offset and interpreted. Ultimately, the value
          provided is only an approximation, as transmission delays are not
          accounted for.</t>

          <t>Absent any changes, if the originating node notices that the
          32-bit milliseconds since origination value would be close to
          overflow (greater than 2^32-2^16), the node MUST re-publish its
          TLVs even if there is no change. In other words, absent any other
          changes, the TLV set MUST be re-published roughly every 48
          days.</t>

          <t>The node's actual node data may also be included within the
          TLV, in the optional Node Data field.</t>

          <t>The set of TLVs MUST be strictly ordered based on ascending
          binary content (including TLV type and length). This enables,
          e.g., efficient state delta processing and no-copy indexing by
          TLV type by the recipient.</t>

          <t>The Node Data content MUST be passed along exactly as it was
          received. It SHOULD also be verified on receipt that the locally
          calculated H(Node Data) matches the H(Node Data) field of the
          TLV; if the hash differs, the TLV SHOULD be ignored.</t>
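          <t>The ordering and verification rules above might be implemented
          as in the following Python sketch. It assumes the general DNCP
          TLV encoding (16-bit type and length in network byte order, value
          zero-padded to a 4-byte boundary, padding not counted in Length)
          and, purely as an example, SHA-256 truncated to 128 bits for
          H(x); the actual hash function is fixed by the DNCP profile.
          Because H() is computed over the received bytes as-is, the Node
          Data is never re-encoded. All function names are
          hypothetical.</t>

          <figure>
            <artwork><![CDATA[
import hashlib
import struct

HASH_LEN = 16  # assumption: SHA-256 truncated to 128 bits


def H(data):
    return hashlib.sha256(data).digest()[:HASH_LEN]


def encode_tlv(tlv_type, value):
    """16-bit type and length (network byte order), then the value,
    zero-padded to a 4-byte boundary; Length excludes the padding."""
    header = struct.pack("!HH", tlv_type, len(value))
    return header + value + b"\x00" * (-len(value) % 4)


def build_node_data(tlvs):
    """Concatenate (type, value) pairs in ascending binary order of
    their encoded form (type and length included)."""
    return b"".join(sorted(encode_tlv(t, v) for t, v in tlvs))


def verify_node_data(node_data, advertised_hash):
    """Compare H() over the received bytes with the TLV's
    H(Node Data) field; on mismatch the TLV SHOULD be ignored."""
    return H(node_data) == advertised_hash


data = build_node_data([(9, b"\x00" * 8), (8, b"\x01" * 12)])
print(data[:2].hex())  # 0008
print(verify_node_data(data, H(data)))  # True
]]></artwork>
          </figure>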

          <!-- SB: this paragraph is essentially duplicate with L 570ff, so
          normative language should match. or maybe change this to xref? -->

        </section>

      </section>


      <section title="Data TLVs within Node State TLV" anchor="tlv-state">

        <t>These TLVs are published by the DNCP nodes and are therefore
        only encoded in the Node Data field of Node State TLVs. If
        encountered outside a Node State TLV, they MUST be silently
        ignored.</t>

        <section anchor="peer"
                 title="Peer TLV">
          <figure>
            <artwork>
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Type: PEER (8)          |          Length: > 8          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Peer Node Identifier                     |
|                  (length fixed in DNCP profile)               |
...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Peer Endpoint Identifier                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   (Local) Endpoint Identifier                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            </artwork>
          </figure>

          <t>This TLV indicates that the node in question vouches that the
          specified peer is reachable by it on the specified local
          endpoint.

          The presence of this TLV guarantees at least that the node
          publishing it has recently received traffic from the peer. For
          guaranteed up-to-date bidirectional reachability, the existence
          of both nodes' matching Peer TLVs needs to be checked.</t>
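          <t>The bidirectional check can be sketched in Python as follows.
          Each published Peer TLV is modelled together with the identifier
          of the node that published it. A Peer TLV published by node A
          naming peer B matches one published by B if B's TLV names A with
          the two endpoint identifiers swapped. The model type, helper
          names, and the 4-byte node identifier length are assumptions for
          illustration.</t>

          <figure>
            <artwork><![CDATA[
import struct
from typing import NamedTuple, Set

NODE_ID_LEN = 4  # assumption: DNCP_NODE_IDENTIFIER_LENGTH of 4


class PeerTlv(NamedTuple):
    publisher: bytes  # Node Identifier of the publishing node
    peer: bytes       # Peer Node Identifier
    peer_ep: int      # Peer Endpoint Identifier
    local_ep: int     # (Local) Endpoint Identifier


def encode_peer_value(p: PeerTlv) -> bytes:
    """Peer TLV value: node identifier, then two 32-bit ids."""
    if len(p.peer) != NODE_ID_LEN:
        raise ValueError("unexpected node identifier length")
    return p.peer + struct.pack("!II", p.peer_ep, p.local_ep)


def bidirectional(p: PeerTlv, published: Set[PeerTlv]) -> bool:
    """True if the peer named in `p` publishes the mirror image."""
    mirror = PeerTlv(publisher=p.peer, peer=p.publisher,
                     peer_ep=p.local_ep, local_ep=p.peer_ep)
    return mirror in published


a, b = b"\x00\x00\x00\x01", b"\x00\x00\x00\x02"
seen = {PeerTlv(a, b, 7, 3), PeerTlv(b, a, 3, 7)}
print(bidirectional(PeerTlv(a, b, 7, 3), seen))  # True
]]></artwork>
          </figure>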
        </section>

        <section anchor="ka-interval"
                 title="Keep-Alive Interval TLV">

          <figure>
            <artwork>
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type: KEEP-ALIVE-INTERVAL (9) |          Length: >= 8         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Endpoint Identifier                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Interval                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            </artwork>
          </figure>

          <t>This TLV indicates that a non-default interval is being used
          to send the keep-alives specified in <xref target="ka" />.</t>

          <t>The Endpoint Identifier identifies the particular (local)
          endpoint of the sending node to which the interval applies. If it
          is 0, the interval applies to ALL endpoints for which no specific
          TLV exists.</t>

          <t>Interval specifies the interval in milliseconds at which the
          node sends keep-alives. A value of zero means no keep-alives are
          sent at all; in that case, some lower layer mechanism that
          ensures presence of nodes MUST be available and used. </t>
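          <t>A receiving node could resolve which interval applies to a
          given neighbor endpoint as in the following Python sketch: a TLV
          for the specific endpoint wins, then the wildcard (Endpoint
          Identifier 0), then the profile default. The constant values are
          hypothetical placeholders for the profile-defined
          DNCP_KEEPALIVE_INTERVAL and DNCP_KEEPALIVE_MULTIPLIER, and the
          function names are illustrative only.</t>

          <figure>
            <artwork><![CDATA[
DNCP_KEEPALIVE_INTERVAL = 20000  # hypothetical profile default (ms)
DNCP_KEEPALIVE_MULTIPLIER = 2.1  # hypothetical profile default


def effective_interval(ka_tlvs, endpoint_id):
    """ka_tlvs maps the sender's Endpoint Identifier to the Interval
    (ms) from its Keep-Alive Interval TLVs; key 0 is the wildcard.
    endpoint_id is the sender's own endpoint identifier."""
    if endpoint_id in ka_tlvs:
        return ka_tlvs[endpoint_id]
    if 0 in ka_tlvs:
        return ka_tlvs[0]
    return DNCP_KEEPALIVE_INTERVAL


def peer_timeout_ms(ka_tlvs, endpoint_id):
    """How long the sender may stay silent before it is no longer
    considered valid, or None if it sends no keep-alives at all
    (a lower-layer presence mechanism must be used instead)."""
    interval = effective_interval(ka_tlvs, endpoint_id)
    if interval == 0:
        return None
    return interval * DNCP_KEEPALIVE_MULTIPLIER
]]></artwork>
          </figure>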
        </section>

      </section>

    </section>


    <section anchor="sec-trust" title="Security and Trust Management">

      <t>If specified in the DNCP profile, either <xref
      target="RFC6347">DTLS</xref> or <xref target="RFC5246">TLS</xref> may
      be used to authenticate and encrypt either some (if the profile
      specifies this as optional) or all unicast traffic. The following
      methods for establishing trust are defined, but it is up to the DNCP
      profile to specify which ones may, should, or must be supported.</t>

      <section title="Pre-Shared Key Based Trust Method">

        <t>A PSK-based trust model is a simple security management
        mechanism that allows an administrator to deploy devices to an
        existing network by configuring them with a pre-defined key,
        similar to the configuration of an administrator password or WPA
        key. Although limited in nature, it is useful as a user-friendly
        security mechanism for smaller networks.</t>

      </section>

      <section title="PKI Based Trust Method">

        <t>A PKI-based trust model enables more advanced management
        capabilities at the cost of increased complexity and
        bootstrapping effort. However, it allows trust to be managed in a
        centralized manner and is therefore useful for larger networks
        with a need for authoritative trust management.</t>

      </section>

      <section title="Certificate Based Trust Consensus Method">

        <t>For some scenarios, such as bootstrapping a mostly unmanaged
        network, the methods described above may not provide a desirable
        tradeoff between security and user experience. This section
        includes guidance for implementing an <xref target="RFC7435">
        opportunistic security</xref> method that DNCP profiles can build
        upon and adapt for their specific requirements.</t>

        <t>The certificate-based consensus model is designed to be a
        compromise between trust management effort and flexibility. It is
        based on X.509 certificates and allows each DNCP node to provide a
        trust verdict on any other certificate; a consensus is then found
        to determine whether a node using this certificate, or any
        certificate signed by it, is to be trusted.</t>

        <t>A DNCP node not using this security method MUST ignore all
        announced trust verdicts and MUST NOT announce any such verdicts
        by itself, i.e., any other normative language in this subsection
        does not apply to it.</t>

        <t>The current effective trust verdict for any certificate is
        defined as the one with the highest priority from all trust
        verdicts announced for said certificate at the time.</t>

        <section title="Trust Verdicts">

          <t>Trust verdicts are statements of DNCP nodes about the
          trustworthiness of X.509 certificates. There are 5 possible
          trust verdicts, in order of ascending priority:

          <list>

            <t>0 (Neutral): no trust verdict exists but the DNCP network
            should determine one.</t>

            <t>1 (Cached Trust): the last known effective trust verdict was
            Configured or Cached Trust.</t>

            <t>2 (Cached Distrust): the last known effective trust verdict
            was Configured or Cached Distrust.</t>

            <t>3 (Configured Trust): trustworthy based upon an external
            ceremony or configuration.</t>

            <t>4 (Configured Distrust): not trustworthy based upon an
            external ceremony or configuration.</t>

          </list>
          </t>

          <t>
            Trust verdicts are divided into 3 groups:

            <list style="symbols">
              <t>Configured verdicts are used to announce explicit
              trust verdicts that a node holds based on any external trust
              bootstrap or predefined relation it has formed with a
              given certificate.</t>

              <t>Cached verdicts are used to retain the last known trust
              state in case all nodes with configured verdicts about a
              given certificate have been disconnected or turned off.</t>

              <t>The Neutral verdict is used to announce a new node
              intending to join the network so a final verdict for it can
              be found.</t>
            </list>
          </t>

          <t>
            The current effective trust verdict for any certificate is
            defined as the one with the highest priority within the set of
            trust verdicts announced for the certificate in the DNCP
            network.

            A node MUST be trusted for participating in the DNCP network if
            and only if the current effective trust verdict for its own
            certificate or any one in its certificate hierarchy is (Cached
            or Configured) Trust and none of the certificates in its
            hierarchy have an effective trust verdict of (Cached or
            Configured) Distrust.

            If a node has a configured verdict that differs from the
            current effective trust verdict for a certificate, the current
            effective trust verdict takes precedence in deciding
            trustworthiness. Nevertheless, the node still retains and
            announces its configured verdict.
          </t>
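          <t>The effective-verdict and trust rules above can be sketched in
          Python as follows. The numeric verdict values double as
          priorities, so the effective verdict is simply the maximum
          announced value. Representing a certificate hierarchy as a list
          of fingerprints, and the function names, are assumptions made for
          this illustration.</t>

          <figure>
            <artwork><![CDATA[
from enum import IntEnum


class Verdict(IntEnum):
    # Values are the numeric indices; higher means higher priority.
    NEUTRAL = 0
    CACHED_TRUST = 1
    CACHED_DISTRUST = 2
    CONFIGURED_TRUST = 3
    CONFIGURED_DISTRUST = 4


TRUST = {Verdict.CACHED_TRUST, Verdict.CONFIGURED_TRUST}
DISTRUST = {Verdict.CACHED_DISTRUST, Verdict.CONFIGURED_DISTRUST}


def effective_verdict(announced):
    """Highest-priority verdict announced for one certificate, or
    None if no verdict is announced for it."""
    return max(announced, default=None)


def node_trusted(hierarchy, announcements):
    """hierarchy: fingerprints of the node's certificate chain.
    announcements: fingerprint -> verdicts announced network-wide."""
    effective = [effective_verdict(announcements.get(fp, []))
                 for fp in hierarchy]
    # Any (Cached or Configured) Distrust in the chain is decisive.
    if any(v in DISTRUST for v in effective):
        return False
    return any(v in TRUST for v in effective)
]]></artwork>
          </figure>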
        </section>

        <section title="Trust Cache">

          <t>Each node SHOULD maintain a trust cache containing the current
          effective trust verdicts for all certificates currently announced
          in the DNCP network. This cache is used as a backup of the last
          known state in case there is no node announcing a configured
          verdict for a known certificate.  It SHOULD be saved to a
          non-volatile memory at reasonable time intervals to survive a
          reboot or power outage.</t>

          <t>Every time a node (re)joins the network or detects a change in
          the effective trust verdict for any certificate, it will
          synchronize its cache, i.e., store new effective trust verdicts,
          overwriting any previously cached verdicts. Configured verdicts
          are stored in the cache as their respective cached counterparts.
          Neutral verdicts are never stored and do not override existing
          cached verdicts.</t>
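          <t>A minimal Python sketch of the cache synchronization step,
          assuming the trust cache is a mapping from certificate
          fingerprint to cached verdict index (the names and data layout
          are hypothetical):</t>

          <figure>
            <artwork><![CDATA[
# Verdict indices as listed in the Trust Verdicts section.
(NEUTRAL, CACHED_TRUST, CACHED_DISTRUST,
 CONFIGURED_TRUST, CONFIGURED_DISTRUST) = range(5)

# Configured verdicts are stored as their cached counterparts.
TO_CACHED = {
    CACHED_TRUST: CACHED_TRUST,
    CONFIGURED_TRUST: CACHED_TRUST,
    CACHED_DISTRUST: CACHED_DISTRUST,
    CONFIGURED_DISTRUST: CACHED_DISTRUST,
}


def sync_trust_cache(cache, effective):
    """Update `cache` (fingerprint -> cached verdict) in place from
    `effective` (fingerprint -> current effective verdict)."""
    for fingerprint, verdict in effective.items():
        if verdict == NEUTRAL:
            continue  # never stored; keeps any existing cached entry
        cache[fingerprint] = TO_CACHED[verdict]
    return cache
]]></artwork>
          </figure>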
        </section>

        <section title="Announcement of Verdicts">

          <t>A node SHOULD always announce any configured trust verdicts it
          has established by itself, and it MUST do so if announcing the
          configured trust verdict leads to a change in the current
          effective trust verdict for the respective certificate. In the
          absence of configured verdicts, it MUST announce the cached trust
          verdicts it has stored in its trust cache if one of the
          following conditions applies:

          <list style="symbols">

            <t>The stored trust verdict is Cached Trust and the current
            effective trust verdict for the certificate is Neutral or does
            not exist.</t>

            <t>The stored trust verdict is Cached Distrust and the current
            effective trust verdict for the certificate is Cached
            Trust.</t>

          </list>

          A node rechecks these conditions whenever it detects changes of
          announced trust verdicts anywhere in the network.
          </t>
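          <t>The two announcement conditions can be expressed as a small
          predicate. This Python sketch covers only the case without a
          configured verdict and uses None to model "no effective trust
          verdict exists"; both choices, and the function name, are
          illustrative assumptions.</t>

          <figure>
            <artwork><![CDATA[
NEUTRAL, CACHED_TRUST, CACHED_DISTRUST = 0, 1, 2  # verdict indices


def must_announce_cached(stored, effective):
    """For a node with no configured verdict on a certificate: MUST
    it announce the verdict stored in its trust cache?  `effective`
    is None when no effective verdict exists in the network."""
    if stored == CACHED_TRUST:
        return effective in (None, NEUTRAL)
    if stored == CACHED_DISTRUST:
        return effective == CACHED_TRUST
    return False
]]></artwork>
          </figure>

          <t>The predicate would be re-evaluated whenever announced trust
          verdicts change anywhere in the network.</t>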

          <t>Upon encountering a node with a hierarchy of certificates for
          which there is no effective trust verdict, a node adds a Neutral
          Trust-Verdict TLV to its node data for all certificates found in
          the hierarchy, and publishes it until an effective trust verdict
          different from Neutral can be found for any of the certificates,
          or until a reasonable amount of time (10 minutes is suggested)
          has passed with no reaction and no further authentication
          attempts. Such trust verdicts SHOULD also be limited in rate and
          number to prevent denial-of-service attacks.</t>

          <t>Trust verdicts are announced using Trust-Verdict TLVs:
          <figure>
            <artwork>
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Type: Trust-Verdict (10)    |        Length: > 36           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Verdict    |                 (reserved)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                                                               |
|                                                               |
|                      SHA-256 Fingerprint                      |
|                                                               |
|                                                               |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Common Name                          |
            </artwork>
          </figure>

          <list>
            <t>Verdict represents the numerical index of the trust
            verdict.</t>

            <t>(reserved) is reserved for future additions and MUST be set
            to 0 when creating TLVs and ignored when parsing them.</t>

            <t>SHA-256 Fingerprint contains the <xref
            target="RFC6234">SHA-256</xref> hash value of the certificate
            in DER-format.</t>

            <t>Common Name contains the variable-length (1-64 bytes) common
            name of the certificate.</t>
          </list>
          </t>
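          <t>The following Python sketch encodes and decodes a
          Trust-Verdict TLV following the figure above. It assumes that
          the Common Name is UTF-8 encoded and that, as for other DNCP
          TLVs, the value is zero-padded to a 4-byte boundary without the
          padding being counted in Length. The function names are
          hypothetical.</t>

          <figure>
            <artwork><![CDATA[
import hashlib
import struct

TRUST_VERDICT = 10  # TLV type from the figure above


def encode_trust_verdict(verdict, der_cert, common_name):
    cn = common_name.encode("utf-8")  # assumption: UTF-8 text
    if not 1 <= len(cn) <= 64:
        raise ValueError("Common Name must be 1-64 bytes")
    # Verdict byte, 3 reserved zero bytes, SHA-256 of the DER cert.
    value = (struct.pack("!B3x", verdict)
             + hashlib.sha256(der_cert).digest() + cn)
    tlv = struct.pack("!HH", TRUST_VERDICT, len(value)) + value
    # Assumption: zero padding to 4 bytes, not counted in Length.
    return tlv + b"\x00" * (-len(value) % 4)


def decode_trust_verdict(tlv):
    tlv_type, length = struct.unpack_from("!HH", tlv)
    if tlv_type != TRUST_VERDICT or not 36 < length <= 100:
        raise ValueError("not a valid Trust-Verdict TLV")
    value = tlv[4:4 + length]
    # value[1:4] is reserved and ignored on parsing.
    return value[0], value[4:36], value[36:].decode("utf-8")
]]></artwork>
          </figure>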
        </section>

        <section title="Bootstrap Ceremonies">
          <t>The following non-exhaustive list of methods describes
          possible ways to establish trust relationships between
          DNCP nodes and node certificates. Trust establishment is a
          two-way process in which the existing network must trust the
          newly added node and the newly added node must trust at least
          one of its peer nodes.

          It is therefore necessary that both the newly added node and an
          already trusted node perform such a ceremony to successfully
          introduce a node into the DNCP network.  In all cases an
          administrator MUST be provided with external means to identify
          the node belonging to a certificate based on its fingerprint
          and a meaningful common name.</t>

          <section title="Trust by Identification">
            <t>A node implementing certificate-based trust MUST provide
            an interface to retrieve the current set of effective trust
            verdicts, fingerprints, and names of all certificates currently
            known, and to set configured trust verdicts to be
            announced. Alternatively, it MAY offer these capabilities
            through a companion DNCP node or application with which it has
            a pre-established trust relationship.</t>
          </section>

          <section title="Preconfigured Trust">
            <t>A node MAY be preconfigured to trust a certain set of
            node or CA certificates. However, such trust relationships
            MUST NOT result in unwanted or unrelated trust for nodes not
            intended to be run inside the same network (e.g., all other
            devices by the same manufacturer).</t>
          </section>

          <section title="Trust on Button Press">
            <t>A node MAY provide a physical or virtual interface to put
            one or more of its internal network interfaces temporarily into
            a mode in which it trusts the certificate of the first
            DNCP node it can successfully establish a connection
            with.</t>
          </section>

          <section title="Trust on First Use">
            <t>A node that is not associated with any other DNCP node MAY
            trust the certificate of the first DNCP node it can
            successfully establish a connection with. This method MUST NOT
            be used when the node is already associated with any other
            DNCP node.</t>
          </section>
        </section>
      </section>
    </section>

    <section anchor="profile-bits" title="DNCP Profile-Specific Definitions">
      <t>Each DNCP profile MUST specify the following aspects:
      <list style="symbols">

        <t>Unicast and optionally multicast transport protocol(s) to be
        used. If multicast-based node and status discovery is desired, a
        datagram-based transport supporting multicast has to be available.
        </t>

        <t>How the chosen transport(s) are secured: not at all, optionally,
        or always with the TLS scheme defined here using one or more of the
        trust methods, or by some other means. If the links with DNCP nodes
        can be sufficiently secured or isolated, it is possible to run DNCP
        in a secure manner without using any form of authentication or
        encryption.</t>

        <t>Transport protocol parameters, such as the port numbers or the
        multicast address to be used. Unicast, multicast, and secure
        unicast may each require different parameters, if applicable.</t>

        <t>Which TLVs, in addition to those specified in <xref
        target="reception" />, are ignored on receipt, e.g., for security
        reasons. While the security of the node data published within the
        Node State TLVs is already ensured by the base specification (if
        secure mode is enabled, Node State TLVs are sent only via unicast,
        as multicast ones are ignored on receipt), if a profile adds TLVs
        that are sent outside the node data, the profile should indicate
        whether or not those TLVs should be ignored if they are received
        via multicast or non-secured unicast.

        A DNCP profile may define the following DNCP TLVs to be safely
        ignored:

        <list style="symbols">

          <t>Anything received over multicast, except <xref
          target="endpoint">Node Endpoint TLV</xref> and <xref
          target="net-state">Network State TLV</xref>.
          </t>

          <t>Any TLVs received over unreliable unicast or multicast at too
          high a rate; Trickle will ensure eventual convergence once the
          rate slows down.</t>

        </list>
        </t>

        <t>How to deal with node identifier collisions as described in <xref
        target="reception" />. The main options are for one or both nodes
        to assign new node identifiers to themselves, or to notify someone
        about a fatal error condition in the DNCP network.</t>

        <t>Imin, Imax, and k ranges to be suggested for implementations to
        use in the Trickle algorithm. The Trickle algorithm does not
        require these to be the same across all implementations for it to
        work, but similar orders of magnitude help implementations of a
        DNCP profile to behave more consistently and facilitate estimation
        of lower and upper bounds for the convergence behavior of the
        network.</t>

        <t>Hash function H(x) to be used, and how many bits of the output
        are actually used. The chosen hash function is used to handle both
        hashing of node data, and to produce network state hash, which is a
        hash of node data hashes. SHA-256 defined in <xref
        target="RFC6234" /> is the recommended default choice, but a
        non-cryptographic hash function could be used as well.

        If there is a hash collision in the network state hash, the network
        will effectively be split into partitions that believe they are up
        to date but are actually no longer converged. The network
        will converge either when some node data anywhere in the network
        changes, or when conflicting Node State TLVs get transmitted across
        the partition (either caused by <xref
        target="trickle-updates">Trickle-Driven Status Updates</xref> or as
        part of the <xref target="reception">Processing of Received
        TLVs</xref>).

        If a node publishes node data with a hash that collides with any
        previously published node data, the update may not be (fully)
        propagated and the old version of node data may be used
        instead.</t>

        <t>DNCP_NODE_IDENTIFIER_LENGTH: The fixed length of a node
        identifier (in bytes).</t>

        <t>Whether to send keep-alives, and if so, whether per-endpoint
        (requires multicast transport) or per-peer. Keep-alives also have
        the following associated parameters:

        <list style="symbols">
          <t>DNCP_KEEPALIVE_INTERVAL: How often keep-alives are to be
          sent by default (if enabled).</t>

          <t>DNCP_KEEPALIVE_MULTIPLIER: How many multiples of the
          DNCP_KEEPALIVE_INTERVAL (or of a peer-supplied keep-alive
          interval value) may pass without a node being heard from before
          it is no longer considered valid. This is just a default used in
          the absence of any other configuration information or particular
          per-endpoint configuration.</t>
        </list>
        </t>

        <t>Whether to support <xref target="dense-multicast">dense
        multicast-enabled link optimization</xref> or not.</t>

      </list>
      </t>
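      <t>To illustrate the hash function choice, the following Python
      sketch computes a network state hash with SHA-256 truncated to 128
      bits (an example choice, matching the example profile in this
      document). The leaf construction is an assumption of this sketch:
      each node contributes its 32-bit update sequence number followed by
      H(node data), and the leaves are concatenated in ascending Node
      Identifier order before hashing. Keeping the leading bytes of the
      digest when truncating is likewise assumed.</t>

      <figure>
        <artwork><![CDATA[
import hashlib
import struct

HASH_BITS = 128  # example choice: SHA-256 truncated to 128 bits


def H(data):
    # Assumption: truncation keeps the leading bytes of the digest.
    return hashlib.sha256(data).digest()[:HASH_BITS // 8]


def network_state_hash(nodes):
    """nodes: Node Identifier -> (update sequence number, node data).
    Assumed leaf: 32-bit sequence number || H(node data), with
    leaves concatenated in ascending Node Identifier order."""
    leaves = b"".join(
        struct.pack("!I", seq) + H(data)
        for _, (seq, data) in sorted(nodes.items()))
    return H(leaves)
]]></artwork>
      </figure>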

      <t>For some guidance on choosing transport and security options,
      please see <xref target="profile-guidance" />.</t>
    </section>

    <section title="Security Considerations">

      <t>DNCP-based protocols may use multicast to indicate DNCP state
      changes and for keep-alive purposes. However, no actual published
      data TLVs will be sent across that channel. Therefore, an attacker
      may only learn hash values of the state within DNCP, although they
      may be able to use this channel to trigger unicast synchronization
      attempts between nodes on a local link. A DNCP node MUST therefore
      rate-limit its reactions to multicast packets.</t>
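      <t>This document does not mandate a particular rate-limiting
      algorithm. As one possible approach, the Python sketch below uses a
      token bucket, for example one per link, consulted before reacting to
      a received multicast packet. The rate and burst values are left to
      the caller, and the class name is hypothetical.</t>

      <figure>
        <artwork><![CDATA[
import time


class TokenBucket:
    """Permit on average `rate` reactions per second, with bursts of
    up to `burst`; suitable values are deployment-specific."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = float(burst)
        self.last = clock()

    def allow(self):
        """Return True if a reaction (e.g., starting a unicast
        synchronization) is permitted now, consuming one token."""
        now = self.clock()
        refill = (now - self.last) * self.rate
        self.tokens = min(self.burst, self.tokens + refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
]]></artwork>
      </figure>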

      <t>When using DNCP to bootstrap a network, PKI-based solutions may
      have issues when validating certificates, because accurate time may
      be unavailable or because the network cannot yet be used to check
      Certificate Revocation Lists or perform online validation.</t>

      <t>The certificate-based trust consensus mechanism defined in this
      document allows for consensus-based revocation. However, in the case
      of a compromised device, the trust cache may be poisoned before the
      actual revocation happens, allowing the distrusted device to rejoin
      the network using a different identity. Stopping such an attack
      might require physical intervention and flushing of the trust
      caches.</t>

    </section>

    <section anchor="iana" title="IANA Considerations">

      <t>IANA should set up a registry for the (decimal 16-bit) "DNCP TLV
      Types" under "Distributed Node Consensus Protocol (DNCP)", with the
      following initial contents:
      ([RFC Editor: please remove] ideally as http://www.iana.org/assignments/dncp-registry)

      <list>

      <t>0: Reserved</t>
      <t>1: Request network state</t>
      <t>2: Request node state</t>
      <t>3: Node endpoint</t>
      <t>4: Network state</t>
      <t>5: Node state</t>
      <t>6: Reserved (was: Custom)</t>
      <t>7: Reserved (was: Fragment count)</t>
      <t>8: Peer</t>
      <t>9: Keep-alive interval</t>
      <t>10: Trust-Verdict</t>
      <t>11-31: Free - policy of <xref target="RFC5226">standards action</xref> should be used</t>
      <t>32-511: Reserved for per-DNCP profile use</t>
      <t>512-767: Free - policy of <xref target="RFC5226">standards action</xref> should be used</t>
      <t>768-1023: <xref target="RFC5226">Private use</xref></t>
      <t>1024-65535: Reserved for future protocol evolution (for example,
      DNCP version 2)</t>

      </list>
      </t>

    </section>

  </middle>
  <back>
    <references title="Normative references">
      <?rfc include="reference.RFC.2119.xml"?>
      <?rfc include="reference.RFC.6206.xml"?>
      <?rfc include="reference.RFC.6234.xml"?>
      <?rfc include="reference.RFC.5226.xml"?>
    </references>
    <references title="Informative references">
      <?rfc include="reference.RFC.3493.xml"?>
      <?rfc include="reference.RFC.3315.xml"?>
      <?rfc include="reference.RFC.6347.xml"?>
      <?rfc include="reference.RFC.5246.xml"?>
      <?rfc include="reference.RFC.7435.xml"?>
      <?rfc include="reference.I-D.draft-ietf-homenet-prefix-assignment-08"?>
    </references>

    <section title="Alternative Modes of Operation">

      <t>Beyond what is described in the main text, the protocol allows for
      other uses. These are provided as examples.</t>

      <section title="Read-only Operation">

        <t>If a node uses just a single endpoint and does not need to
        publish any TLVs, full DNCP node functionality is not
        required. Such a limited node can acquire and maintain a view of
        the TLV space by implementing the processing logic as specified in
        <xref target="reception" />. Such a node would not need Trickle,
        peer maintenance, or even keep-alives at all, as the other DNCP
        nodes' use of Trickle would guarantee eventual receipt of network
        state hashes and synchronization of node data, even in the
        presence of unreliable transport.</t>

      </section>

      <section title="Forwarding Operation">

        <t>If a node with a pair of endpoints does not need to publish any
        TLVs, it can detect (for example) the nodes with the highest node
        identifier on each of the endpoints (if any). Any TLVs received from
        one of them would be forwarded verbatim as unicast to the other node
        with the highest node identifier.</t>
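        <t>The selection step of this forwarding scheme might look like the
        following Python sketch. It assumes fixed-length node identifiers
        (so byte-wise comparison matches numeric order) and a mapping from
        each of the two local endpoints to the node identifiers heard on
        it; all names are illustrative.</t>

        <figure>
          <artwork><![CDATA[
def forwarding_targets(peers_by_endpoint):
    """peers_by_endpoint: exactly two local endpoints, each mapped to
    the set of Node Identifiers heard on it. Returns a map from each
    selected node to the (endpoint, node) its TLVs are forwarded to,
    or {} if either endpoint has no nodes."""
    (ep_a, nodes_a), (ep_b, nodes_b) = peers_by_endpoint.items()
    if not nodes_a or not nodes_b:
        return {}
    # Fixed-length identifiers: byte order equals numeric order.
    top_a, top_b = max(nodes_a), max(nodes_b)
    # TLVs from one selected node go verbatim, as unicast, to the other.
    return {top_a: (ep_b, top_b), top_b: (ep_a, top_a)}
]]></artwork>
        </figure>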

        <t>Any tinkering with the TLVs would remove the guarantees that
        this scheme works; however, passive monitoring would obviously be
        fine. This type of simple forwarding cannot be chained, as it does
        not send anything proactively.</t>

      </section>

    </section>

    <section anchor="profile-guidance" title="DNCP Profile Additional Guidance">

      <t>This appendix explains the implications of design choices made
      when specifying a DNCP profile to use particular transport or
      security options.</t>

      <section title="Unicast Transport - UDP or TCP?">

        <t>The node data published by a DNCP node is limited to 64KB due to
        the 16-bit size of the length field of the TLV it is published
        within. Some transport choices may decrease this limit; if using,
        e.g., UDP datagrams for unicast transport, the upper bound of node
        data size is whatever the nodes and the underlying network can pass
        to each other, as DNCP does not define its own fragmentation
        scheme.</t>

        <t>A profile that chooses UDP has to either limit node data to a
        small size (e.g., somewhat smaller than the IPv6 default MTU if
        using IPv6) or specify a minimum size that all nodes have to
        support. Even then, if using non-link-local communications, there
        is some concern about what middleboxes do to fragmented packets.
        Therefore, the use of stream transport such as TCP is probably a
        good idea if either non-link-local communication is desired or
        fragmentation is expected to cause problems.</t>

        <t>TCP also provides some other facilities, such as a relatively
        long built-in keep-alive, which, in conjunction with connections
        being closed after eventual failed retransmissions, may be
        sufficient to avoid the use of the in-protocol keep-alive defined
        in <xref target="ka" />. Additionally, TCP is reliable, so there is
        no need for Trickle on such unicast connections.</t>

        <t>The major downside of using TCP instead of UDP with DNCP-based
        profiles lies in the loss of control over the time at which TLVs
        are received; while unreliable UDP datagrams also have some delay,
        TLVs within reliable stream transport may be delayed significantly
        due to retransmissions. This is not a problem if no relative time
        dependent information is stored within the TLVs in the DNCP-based
        protocol; for such a protocol, TCP is a reasonable choice for
        unicast transport if it is available.</t>

      </section>
      <section title="(Optional) Multicast Transport">

        <t>Multicast is needed for dynamic peer discovery and to trigger
        unicast exchanges; for that, unreliable datagram transport
        (typically UDP) is the only transport option defined within this
        specification, although DNCP-based protocols may themselves define
        some other transport or peer discovery mechanism (e.g., based on
        mDNS or DNS).</t>

        <t>If multicast is used, a well-known address should be specified,
        along with the desired address scopes (e.g., for IPv6). In most
        cases, link-local and possibly site-local are useful scopes.</t>

      </section>

      <section title="(Optional) Transport Security">

        <t>In terms of provided security, DTLS and TLS are equivalent; they
        also consume a similar amount of state on the devices. While TLS
        runs on top of a stream protocol, DTLS additionally requires
        relatively long session caching within the DTLS layer to avoid
        expensive re-authentication/authorization steps if and when any
        state within the DNCP network changes or a per-peer keep-alive (if
        enabled) is sent.</t>

        <t>TLS implementations (at the time of the writing of the
        specification) seem more mature and available (as open source) than
        DTLS ones. This may be due to a long history of use with HTTPS.</t>

        <t>Some libraries seem not to support multiplexing between insecure
        and secure communication on the same port, so specifying distinct
        ports for secured and unsecured communication may be beneficial.</t>

      </section>
    </section>
    <section anchor="profile-example" title="Example Profile">

      <t>This is the DNCP profile of SHSP, an experimental (and for the
      purposes of this document fictional) home automation protocol. The
      protocol itself is used to make a key-value store published by each
      of the nodes available to all other nodes for distributed monitoring
      and control of a home infrastructure. It defines only one additional
      TLV type: a key=value TLV that contains a single key=value
      assignment for publication.

      <list style="symbols">

        <t>Unicast transport: IPv6 TCP on port EXAMPLE-P1 since only absolute
        timestamps are used within the key=value data and since it focuses
        primarily on Linux-based nodes which support both protocols well.
        Connections from and to non-link-local addresses are ignored to
        avoid exposing this protocol outside the secure links.</t>

        <t>Multicast transport: IPv6 UDP on port EXAMPLE-P2 to link-local
        scoped multicast address ff02:EXAMPLE. At least one node per link
        in the home is assumed to facilitate node discovery without
        depending on any other infrastructure.</t>

        <t>Security: None. It is to be used only on trusted links (WPA2-x
        wireless, physically secure wired links).</t>

        <t>Additional TLVs to be ignored: None. No DNCP security is specified,
        and no new TLVs are defined outside of node data.</t>

        <t>Node identifier length (DNCP_NODE_IDENTIFIER_LENGTH): 32 bits
        that are randomly generated.</t>

        <t>Node identifier collision handling: Pick new random node
        identifier.</t>

        <t>Trickle parameters: Imin = 200 ms, Imax = 7 (doublings), k = 1.
        This means at least one multicast per link roughly every 25.6
        seconds in stable state (0.2 s * 2^7).</t>

        <t>Hash function H(x) + length: SHA-256, with only 128 bits
        used. It is relatively fast, and 128 bits should be plenty to
        prevent random conflicts (64 bits would most likely be sufficient,
        too).</t>

        <t>No <xref target="ka">in-protocol keep-alives</xref>; TCP
        keep-alive is to be used. In practice, TCP keep-alive seldom comes
        into play anyway: changes in network state cause packets to be sent
        on the unicast connections, and connections that fail sufficiently
        many retransmissions are dropped well before keep-alive would
        fire.</t>

        <t>No support for <xref target="dense-multicast">dense
        multicast-enabled link optimization</xref>; SHSP is a simple
        protocol intended for few nodes (network-wide, let alone on a
        single link), so the optimization would not provide any
        benefit.</t>
      </list>
      </t>
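      <t>The numeric profile parameters above can be captured as a few
      constants and helpers. The following Python sketch is illustrative
      only and is not part of the profile: it computes the longest Trickle
      interval (Imin * 2^Imax), truncates SHA-256 to 128 bits, and draws
      random 32-bit node identifiers, drawing again on a collision. All
      names (for example, shsp_hash and new_node_id) are hypothetical, and
      keeping the leftmost 128 bits of the SHA-256 output is an assumption,
      since the profile only states that 128 bits are used.</t>
      <figure>
        <artwork><![CDATA[
```python
# Hypothetical sketch of the SHSP profile parameters.
# Names and helpers are illustrative, not defined by the profile.
import hashlib
import secrets
from typing import Set

# Trickle parameters; Imax counts doublings of Imin (RFC 6206).
TRICKLE_IMIN_S = 0.2
TRICKLE_IMAX_DOUBLINGS = 7
TRICKLE_K = 1

# 32-bit node identifiers and 128-bit (16-byte) hash outputs.
NODE_ID_BITS = 32
HASH_LENGTH_BYTES = 16


def trickle_max_interval_s() -> float:
    """Longest Trickle interval in seconds: Imin * 2^Imax."""
    return TRICKLE_IMIN_S * (2 ** TRICKLE_IMAX_DOUBLINGS)


def shsp_hash(data: bytes) -> bytes:
    """H(x): SHA-256 truncated to 128 bits.

    Keeping the leftmost 16 bytes is an assumption; the profile
    only states that 128 bits are used.
    """
    return hashlib.sha256(data).digest()[:HASH_LENGTH_BYTES]


def new_node_id(in_use: Set[bytes]) -> bytes:
    """Draw a random 32-bit node identifier not found in in_use.

    in_use is a hypothetical set of identifiers known to collide;
    a colliding draw is simply replaced by a new random one.
    """
    while True:
        candidate = secrets.token_bytes(NODE_ID_BITS // 8)
        if candidate not in in_use:
            return candidate
```
]]></artwork>
      </figure>
      <t>With these values, trickle_max_interval_s() returns 25.6,
      matching the 0.2 * 2^7 computation in the Trickle parameters item.
      With k = 1, a node suppresses its own multicast in an interval once
      it has heard one consistent transmission on the link.</t>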
    </section>

    <section title="Some Questions and Answers [RFC Editor: please remove]">

      <t>Q: Why use 32-bit endpoint identifiers?</t>
      <t>A: Using 16-bit identifiers (anything smaller is not realistic)
      would save 32 bits per peer. However, TLVs defined elsewhere would
      not seem to gain even that much on average. 32-bit values are also
      used for interface indexes (ifindex) in various operating systems,
      which simplifies implementation.</t>

      <t>Q: Why have topology information at all?</t>
      <t>A: It is an alternative to the more traditional seq#/TTL-based
      flooding schemes. In the steady state, there is no need to, e.g.,
      re-publish periodically.</t>

    </section>
    <section title="Changelog [RFC Editor: please remove]">
      <t>draft-ietf-homenet-dncp-10:
      <list style="symbols">
        <t>Added profile guidance section, as well as example profile.</t>
      </list>
      </t>
      <t>draft-ietf-homenet-dncp-09:
      <list style="symbols">
        <t>Reserved 1024+ TLV types for future versions (=versioning
        mechanism); private use section moved from 192-255 to 512-767.</t>
        <t>Added applicability statement and clarified some text based on
        reviews.</t>
      </list>
      </t>
      <t>draft-ietf-homenet-dncp-08:
      <list style="symbols">
        <t>Removed fragmentation as it is somewhat underspecified and
        unimplemented. It may be specified in some future extension draft
        or new version of DNCP.</t>
        <t>Added generic sub-TLV extensibility mechanism.</t>
      </list>
      </t>
      <t>draft-ietf-homenet-dncp-06:
      <list style="symbols">

        <t>Removed custom TLV.</t>

        <t>Made keep-alive multipliers a local implementation choice;
        profiles just provide guidance on sane default values.</t>

        <t>Removed the DNCP_GRACE_INTERVAL as it is really an
        implementation choice.</t>

        <t>Simplified the suggested structures in the data model.</t>

        <t>Reorganized the document and provided an overview section.</t>

      </list>
      </t>
      <t>draft-ietf-homenet-dncp-04:
      <list style="symbols">

        <t>Added mandatory rate limiting for network state requests, and an
        optional mechanism for slightly faster convergence that includes
        the current local network state in remote network state
        requests.</t>

      </list>
      </t>

      <t>draft-ietf-homenet-dncp-03:
      <list style="symbols">

        <t>Renamed connection -> endpoint.</t>

        <t>!!! Backwards-incompatible change: Renumbered TLVs and removed
        the node data TLV; instead, the node data TLV's contents are
        optionally included within the node state TLV.</t>

      </list>
      </t>

      <t>draft-ietf-homenet-dncp-02:
      <list style="symbols">

        <t>Changed DNCP "messages" into a series of TLV streams, allowing
        optimized synchronization that saves round trips.</t>

        <t>Added fragmentation support for bigger node data and for chunking
        in absence of reliable L2 and L3 fragmentation.</t>
      </list>
      </t>

      <t>draft-ietf-homenet-dncp-01:
      <list style="symbols">

        <t>Fixed keep-alive semantics to also consider unicast requests as
        updates of the most recently consistent timestamp, and added a
        proactive unicast request to ensure that even inconsistent
        keep-alive messages eventually trigger a consistency timestamp
        update.</t>

        <t>Facilitated (simple) read-only clients by making Node Connection
        TLV optional if just using DNCP for read-only purposes.</t>

        <t>Added text describing how to deal with "dense" networks, but left
        actual numbers and mechanics up to DNCP profiles and (local)
        configurations.</t>
      </list>
      </t>

      <t>draft-ietf-homenet-dncp-00: Split the generic parts from a
      pre-version of draft-ietf-homenet-hncp-03. Changes that affect
      implementations:
      <list style="symbols">

        <t>TLVs were renumbered.</t>

        <t>TLV length does not include the header (=-4). This facilitates,
        e.g., use of DHCPv6 option parsing libraries (same encoding), and
        reduces complexity (no need to handle error values of length less
        than 4).</t>

        <t>Trickle is reset only when the locally calculated network state
        hash changes, not when a different remote network state hash is
        seen. This prevents, e.g., an attack in which a single multicast
        packet forces a Trickle reset on every interface of every node on a
        link.</t>

        <t>Instead of 'ping', use 'keep-alive' (optional) for dead peer
        detection. Different message used!</t>

      </list>
      </t>

    </section>

    <section title="Draft Source [RFC Editor: please remove]">
      <t>As usual, this draft is available at <eref
      target="https://github.com/fingon/ietf-drafts/">
      https://github.com/fingon/ietf-drafts/</eref>
      in source format (with a nice Makefile, too). Feel free to send
      comments and/or pull requests if and when you have changes to it!</t>
    </section>

    <section title="Acknowledgements">

      <t>Thanks to Ole Troan, Pierre Pfister, Mark Baugher, Mark Townsley,
      Juliusz Chroboczek, Jiazi Yi, Mikael Abrahamsson, Brian Carpenter,
      Thomas Clausen, DENG Hui and Margaret Cullen for their contributions
      to the draft.</t>

      <t>Thanks to Kaiwen Jin and Xavier Bonnetain for their related
      research work.</t>

    </section>

  </back>
</rfc>
