<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-perkins-rtcweb-rtp-usage-02"
     ipr="trust200902">
  <front>
    <title abbrev="RTP for RTC-Web">RTP Requirements for RTC-Web</title>

    <author fullname="Colin Perkins" initials="C. S." surname="Perkins">
      <organization>University of Glasgow</organization>

      <address>
        <postal>
          <street>School of Computing Science</street>

          <city>Glasgow</city>

          <code>G12 8QQ</code>

          <country>United Kingdom</country>
        </postal>

        <email>csp@csperkins.org</email>
      </address>
    </author>

    <author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Farogatan 6</street>

          <city>SE-164 80 Kista</city>

          <country>Sweden</country>
        </postal>

        <phone>+46 10 714 82 87</phone>

        <email>magnus.westerlund@ericsson.com</email>
      </address>
    </author>

    <author fullname="Joerg Ott" initials="J." surname="Ott">
      <organization>Aalto University</organization>

      <address>
        <postal>
          <street>School of Electrical Engineering</street>

          <city>Espoo</city>

          <code>02150</code>

          <country>Finland</country>
        </postal>

        <email>jorg.ott@aalto.fi</email>
      </address>
    </author>

    <date day="11" month="July" year="2011" />

    <abstract>
      <t>This memo discusses use of RTP in the context of the RTC-Web
      activity. It discusses important features of RTP that need to be
      considered by other parts of the RTC-Web framework, describes which RTP
      profile to use in this environment, and outlines what RTP extensions
      should be supported.</t>

      <t>This document is a candidate to become a work item of the RTCWEB
      working group as &lt;WORKING GROUP DRAFT "MEDIA TRANSPORTS"&gt;.</t>
    </abstract>
  </front>

  <middle>
    <!--Possible todos: Number each implementation requirement so that they can be directly referenced.-->

    <section title="Introduction">
      <t>This memo discusses the <xref target="RFC3550">Real-time Transport
      Protocol (RTP)</xref> in the context of the RTC-Web activity. The work
      in the IETF Audio/Video Transport Working Group, and its successors,
      has been about providing building blocks for real-time multimedia
      transport, and has not specified who should use which building blocks.
      The selection of building blocks and functionalities can really only be
      done in the context of some application, for example RTC-Web. We have
      selected a set of RTP features and extensions that are suitable for a
      number of applications that fit the RTC-Web context. Thus, applications
      such as VoIP, audio and video conferencing, and on-demand multimedia
      streaming are considered. Applications that rely on IP multicast have
      not been considered likely to be applicable to RTC-Web, thus extensions
      related to multicast have been excluded. We believe that RTC-Web will
      greatly benefit in interoperability if a reasonable set of RTP
      functionalities and extensions are selected. This memo is intended as a
      starting point for discussion of those features in the RTC-Web
      framework.</t>

      <t>This memo is structured into different topics. For each topic, one or
      several recommendations from the authors are given. When it comes to the
      importance of extensions, or the need for implementation support, we use
      three requirement levels to indicate the importance of the feature to
      the RTC-Web specification:</t>

      <t><list style="hanging">
          <t hangText="REQUIRED:">Functionality that is absolutely needed to
          make the RTC-Web solution work well, or functionality of low
          complexity that provides high value.</t>

          <t hangText="RECOMMENDED:">Should be included as it brings
          significant benefit, but the solution can potentially work without
          it.</t>

          <t hangText="OPTIONAL:">Something that is useful in some cases, but
          not always a benefit.</t>
        </list></t>

      <t>When this memo discusses RTP, it includes the RTP Control Protocol
      (RTCP) unless explicitly stated otherwise. RTCP is a fundamental and
      integral part of the RTP protocol, and is REQUIRED to be
      implemented.</t>

      <section title="Expected Topologies">
        <t>As RTC-Web is focused on peer-to-peer connections established from
        clients in web browsers, the following topologies, further discussed
        in <xref target="RFC5117">RTP Topologies</xref>, are primarily
        considered. The topologies are depicted and briefly explained here
        for the reader's convenience.</t>

        <t><figure align="center" anchor="fig-p2p" title="Point to Point">
            <artwork><![CDATA[
+---+         +---+
| A |<------->| B |
+---+         +---+
]]></artwork>
          </figure>The <xref target="fig-p2p">point to point topology</xref>
        will be very common in any single-user to single-user
        application.</t>

        <figure align="center" anchor="fig-multiU" title="Multi-unicast">
          <artwork><![CDATA[
+---+      +---+
| A |<---->| B |
+---+      +---+
  ^         ^
   \       /
    \     /
     v   v
     +---+
     | C |
     +---+
]]></artwork>
        </figure>

        <t>For small multiparty sessions it is practical enough to create RTP
        sessions by letting every participant send individual unicast RTP/UDP
        flows to each of the other participants. This is called multi-unicast
        and is unfortunately not discussed in <xref target="RFC5117">RTP
        Topologies</xref>. This topology has the benefit of not requiring
        central nodes. The downside is that it increases the bandwidth used
        at each sender by requiring one copy of the media streams for each
        participant in the session beyond the sender itself. Thus this is
        limited to scenarios with few end-points unless the media is very
        low bandwidth.</t>

        <t>It needs to be noted that, if this topology is to be supported by
        the RTC-Web framework, it needs to be possible to connect one RTP
        session to multiple, individually established peer-to-peer
        flows.</t>

        <t><figure align="center" anchor="fig-mixer"
            title="RTP Mixer with Only Unicast Paths">
            <artwork><![CDATA[
+---+      +------------+      +---+
| A |<---->|            |<---->| B |
+---+      |            |      +---+
           |   Mixer    |
+---+      |            |      +---+
| C |<---->|            |<---->| D |
+---+      +------------+      +---+
]]></artwork>
          </figure>An <xref target="fig-mixer">RTP mixer</xref> is a
        centralised point that selects or mixes content in a conference to
        optimise the RTP session so that each end-point only needs to connect
        to one entity, the mixer. The mixer also reduces the bit-rate needed,
        as the media sent from the mixer to the end-point can be optimised in
        different ways. These optimisations include methods like only choosing
        media from the currently most active speaker or mixing together audio
        so that only one audio stream is required instead of three in the
        depicted scenario. The downside of the mixer is that someone is
        required to provide the actual mixer.</t>

        <figure align="center" anchor="fig-relay"
                title="RTP Translator (Relay) with Only Unicast Paths">
          <artwork><![CDATA[
+---+      +------------+      +---+
| A |<---->|            |<---->| B |
+---+      |            |      +---+
           | Translator |
+---+      |            |      +---+
| C |<---->|            |<---->| D |
+---+      +------------+      +---+
]]></artwork>
        </figure>

        <t>If one wants a less complex central node, it is possible to use a
        <xref target="fig-relay">relay (called a Transport Translator)</xref>
        that takes on the role of forwarding the media to the other end-points
        but doesn't perform any media processing. It simply forwards the media
        from each end-point to all the others. Thus an end-point A will only
        need to send its media once to the relay, but it will still receive
        three RTP streams with the media if B, C and D are all currently
        transmitting.</t>

        <figure align="center" anchor="fig-translator"
                title="Translator towards Legacy end-point">
          <artwork><![CDATA[
           +------------+
           |            |
+---+      |            |      +---+
| A |<---->| Translator |<---->| B |
+---+      |            |      +---+
           |            |
           +------------+
]]></artwork>
        </figure>

        <t>To support a legacy end-point (B) that doesn't fulfil the
        requirements of RTC-Web, it is possible to insert a <xref
        target="fig-translator">Translator</xref> that takes on the role of
        ensuring that, from A's perspective, B looks like a fully compliant
        end-point. Thus it is the combination of the Translator and B that
        looks like the end-point B. The intention is that the presence of the
        translator is transparent to A; however, it is not certain that this
        is possible. Thus this case is included so that it can be discussed
        whether any mechanism specified for use in RTC-Web results in such
        issues, and how to handle them.</t>
      </section>
    </section>

    <section title="Requirements from RTP">
      <t>This section discusses some requirements that <xref target="RFC3550"> RTP
      and RTCP</xref> place on their underlying transport protocol, the
      signalling channel, etc.</t>

      <section title="RTP Multiplexing Points">
        <t>There are three fundamental points of multiplexing within the RTP
        framework:</t>

        <t><list style="hanging">
            <t hangText="Use of separate RTP Sessions:">The first, and the
            most important, multiplexing point is the RTP session. This
            multiplexing point does not have an identifier within the RTP
            protocol itself, but instead relies on the lower layer to separate
            the different RTP sessions. This is most often done by separating
            different RTP sessions onto different UDP ports, or by sending to
            different IP multicast addresses. The distinguishing feature of an
            RTP session is that it has a separate SSRC identifier space; a
            single RTP session can span multiple transport connections
            provided packets are gatewayed such that participants are known to
            each other. Different RTP sessions are used to separate different
            types of media within a multimedia session. For example, audio and
            video flows are sent on separate RTP sessions. Completely
            different usages of the same media type, e.g. video of the
            presenter and video of the slides, also benefit from being
            separated.</t>

            <t
            hangText="Multiplexing using the SSRC within an RTP session:">The
            second multiplexing point is the SSRC that separates different
            sources of media within a single RTP session. An example might be
            different participants in a multiparty teleconference, or
            different camera views of a presentation. In most cases, each
            participant within an RTP session has a single SSRC, although this
            may change over time if collisions are detected. However, in some
            more complex scenarios participants may generate multiple media
            streams of the same type simultaneously (e.g., if they have two
            cameras, and so send two video streams at once) and so will have
            more than one SSRC in use at once. The RTCP CNAME can be used to
            distinguish between a single participant using two SSRC values
            (where the RTCP CNAME will be the same for each SSRC), and two
            participants (who will have different RTCP CNAMEs).</t>

            <t
            hangText="Multiplexing using the Payload Type within an RTP session:">If
            different media encodings of the same media type (audio, video,
            text, etc) are to be used at different times within an RTP
            session, for example a single participant that can switch between
            two different audio codecs, the payload type is used to identify
            how the media from that particular source is encoded. When
            changing media formats within an RTP Session, the SSRC of the
            sender remains unchanged, but the RTP Payload Type changes to
            indicate the change in media format.</t>
          </list></t>

        <t>These multiplexing points are a fundamental part of the design of
        RTP and are discussed in Section 5.2 of <xref
        target="RFC3550"></xref>. Of special importance is the need to
        separate different RTP sessions using a multiplexing mechanism at some
        lower layer than RTP, rather than trying to combine several RTP
        sessions implicitly into one lower layer flow. This will be further
        discussed in the next section.</t>
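
        <t>The three multiplexing points described above can be illustrated
        with a minimal sketch. It is not taken from any RTP implementation:
        the Packet type, its field names, and the port numbers are
        assumptions made purely for illustration, and the UDP port stands in
        for whatever lower layer mechanism separates the RTP sessions.</t>

        <figure>
          <artwork><![CDATA[
```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical, already-parsed RTP packet (illustrative only)."""
    local_port: int    # lower layer identifier: selects the RTP session
    ssrc: int          # selects the media source within that session
    payload_type: int  # says how that source's media is encoded


def demultiplex(packets):
    """Group packets by session, then by SSRC, recording payload types."""
    sessions = defaultdict(lambda: defaultdict(list))
    for p in packets:
        # 1. The RTP session has no identifier inside RTP itself, so the
        #    lower layer (here, the receiving UDP port) separates sessions.
        # 2. Within a session, the SSRC separates the media sources.
        # 3. Within a source, the payload type identifies the encoding.
        sessions[p.local_port][p.ssrc].append(p.payload_type)
    return sessions


packets = [
    Packet(5004, 0x1111, 0),   # audio session, one source
    Packet(5004, 0x1111, 8),   # codec switch: same SSRC, new payload type
    Packet(5006, 0x1111, 96),  # video session has its own SSRC space
]
result = demultiplex(packets)
```
]]></artwork>
        </figure>

        <t>Note that the same SSRC value appears in both sessions without
        conflict in this sketch, because each RTP session has a separate SSRC
        identifier space.</t>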
      </section>

      <section title="RTP Session Multiplexing">
        <t>In today's networks, with prolific use of Network Address
        Translators (NAT) and Firewalls (FW), there is a desire to reduce the
        number of transport layer ports used by a real-time media application
        using RTP. This has led some to suggest multiplexing two or more RTP
        sessions on a single transport layer flow, using either the Payload
        Type or SSRC to demultiplex the sessions, in violation of the rules
        outlined above. This is not the first time people have looked at RTP
        and questioned the need for using separate RTP sessions for different
        media types, and even more the potential need to separate different
        media streams of the same type into different sessions due to their
        different purposes. Section 5.2 of <xref target="RFC3550"> </xref>
        outlines some of those problems; we elaborate on that discussion, and
        on other problems that occur if one violates this part of the RTP
        design and architecture.</t>

        <section title="Why RTP Sessions Should be Demultiplexed by the Transport">
          <t>As discussed in Section 5.2 of <xref target="RFC3550"></xref>,
          multiplexing several RTP sessions (e.g., audio and video) onto a
          single transport layer flow introduces the following problems:</t>

          <t><list style="hanging">
              <t hangText="Payload Identification:">If two RTP sessions of the
              same type are multiplexed onto a single transport layer flow
              using the same SSRC but relying on the Payload Type to
              distinguish the sessions, and one were to change encodings and
              thus acquire a different RTP payload type, there would be no
              general way of identifying which stream had changed encodings.
              This can be avoided by partitioning the SSRC space between the
              two sessions, but that causes other problems as discussed
              below.</t>

              <t hangText="Timing and Sequence Number Space:">An RTP SSRC is
              defined to identify a single timing and sequence number space.
              Interleaving multiple payload types would require different
              timing spaces if the media clock rates differ and would require
              different sequence number spaces to tell which payload type
              suffered packet loss. Using multiple clock rates in a single RTP
              session is problematic, as discussed in <xref
              target="I-D.ietf-avtext-multiple-clock-rates"></xref>. This can
              be avoided by partitioning the SSRC space between the two
              sessions, but that causes other problems as discussed below.</t>

              <t hangText="RTCP Reception Reports:">RTCP sender reports and
              receiver reports can only describe one timing and sequence
              number space per SSRC, and do not carry a payload type field.
              Multiplexing sessions based on the payload type breaks RTCP.
              This can be avoided by partitioning the SSRC space between the
              two sessions, but that causes other problems as discussed
              below.</t>

              <t hangText="RTP Mixers:">Multiplexing RTP sessions of
              incompatible media type (e.g., audio and video) onto a single
              transport layer flow breaks the operation of RTP mixers, since
              they are unable to combine the flows together.</t>

              <t hangText="RTP Translators:">Multiplexing RTP sessions of
              incompatible media type (e.g., audio and video) onto a single
              transport layer flow breaks the operation of some types of
              RTP translator, for example media transcoders, which rely on the
              RTP requirement that all media are of the same type.</t>

              <t hangText="Quality of Service:">Carrying multiple media in one
              RTP session precludes the use of different network paths or
              flow-based network resource allocations where appropriate. It
              also makes reception of a subset of the media, for example just
              audio if video would exceed the available bandwidth, difficult
              without the use of an RTP translator within the network to
              filter out the unwanted media. Unless such translators are
              trusted devices (and included in the key exchange), this is
              difficult to combine with media security functions.</t>

              <t hangText="Separate Endpoints:">Multiplexing several sessions
              into one transport layer flow prevents use of a distributed
              endpoint implementation, where audio and video are rendered by
              different processes and/or systems.</t>
            </list></t>

          <t>We do note that some of the above issues are resolved as long as
          there is explicit separation of the RTP sessions when transported
          over the same lower layer transport, for example by inserting a
          multiplexing layer in between the lower transport and the RTP/RTCP
          headers. But a number of the above issues are not resolved by
          this.</t>

          <t>In the RTCWEB context, i.e. web browsers running on various
          end-points, it might appear unlikely that flow-based QoS is
          available on the end-points that will support RTCWEB. The authors
          agree that users in the common case, in their home network or at
          WiFi hotspots, are unlikely to have flow-based QoS available.
          However, if one considers enterprise users, especially those using
          intranet applications, the availability of, and desire to use, QoS
          is not implausible. There are also web users who use networks that
          are more resource-constrained than wired networks and WiFi networks,
          for example cellular networks. The current access network QoS
          mechanisms for user traffic in cellular technology from 3GPP are
          flow based.</t>

          <t>RTP's design hasn't been changed, although topics related to
          session multiplexing have been discussed at various points in RTP's
          20 year history. The fact is that numerous RTP mechanisms and
          extensions have been defined assuming that one can perform session
          multiplexing when needed. Mechanisms that have been identified as
          problematic if one doesn't do session separation are:</t>

          <t><list style="hanging">
              <t hangText="Scalability:">RTP was built with media scalability
              in consideration. The simplest way of achieving separation
              between different scalability layers is placing them in
              different RTP sessions, and using the same SSRC and CNAME in
              each session to bind them together. This is most commonly done
              in multicast, and not particularly applicable to RTC-Web, but
              gatewaying of such a session would then require more alterations
              and likely stateful translation.</t>

              <t
              hangText="RTP Retransmission in Session Multiplexing mode:"><xref
              target="RFC4588">RTP Retransmission</xref> does have a mode for
              session multiplexing. This would not be the main mode used in
              RTC-Web, but for interoperability and reduced cost in
              translation, support for different RTP sessions is
              beneficial.</t>

              <t hangText="Forward Error Correction:">The <xref
              target="RFC2733">"An RTP Payload Format for Generic Forward
              Error Correction"</xref> and its update <xref
              target="RFC5109"></xref> can only be used on media formats that
              produce RTP packets that are smaller than half the MTU if the
              FEC flow and the media flow being protected are to be sent in
              the same RTP session; this is due to the use of <xref
              target="RFC2198"> "RTP Payload for Redundant Audio
              Data"</xref>. This is because the SSRC value of the original
              flow is recovered from the FEC packet's SSRC field. So any
              application that wants to use these formats with RTP payloads
              that are close to the MTU needs to put the FEC data in a
              separate RTP session from the original transmissions. The usage
              of this type of FEC data has not been decided on in RTCWEB.</t>

              <t hangText="SSRC Allocation and Collision:">The SSRC identifier
              is a random 32-bit number that is required to be globally unique
              within an RTP session, and that is reallocated to a new random
              value if an SSRC collision occurs between participants. If two
              or more RTP sessions share a transport layer flow, there is no
              guarantee that their choice of SSRC values will be distinct, and
              there is no way in standard RTP to signal which SSRC values are
              used by which RTP session. RTP is explicitly a group-based
              communication protocol, and new participants can join an RTP
              session at any time; these new participants may choose SSRC
              values that conflict with the SSRC values used in any of the
              multiplexed RTP sessions. This problem can be avoided by
              partitioning the SSRC space, and signalling how the space is to
              be subdivided, but this is not backwards compatible with any
              existing RTP system. In addition, subdividing the SSRC space
              makes it difficult to gateway between multiplexed RTP sessions
              and standard RTP sessions: the standard sessions may use parts
              of the SSRC space reserved in the multiplexed RTP sessions,
              requiring the gateway to rewrite RTCP packets, as well as the
              SSRC and CSRC list in RTP data packets. Rewriting RTCP is a
              difficult task, especially when one considers extensions such as
              RTCP XR.</t>

              <t hangText="Conflicting RTCP Report Types:">The extension
              mechanisms used in RTCP depend on separation of RTP sessions for
              different media types. For example, the RTCP Extended Report
              block for VoIP is suitable for conversational audio, but clearly
              not useful for video. This may cause unusable or unwanted
              reports to be generated for some streams, wasting capacity and
              confusing monitoring systems. While this problem may be
              unlikely for VoIP reports, it may be an issue for the more
              detailed media-agnostic reports which are sometimes used for
              different media types. Also, this makes the implementation of
              RTCP more complex, since partitioning the SSRC space by media
              type needs to be done not only on the media processing side, but
              also on the RTCP reporting side.</t>

              <t hangText="RTCP Reporting and Scheduling:">The RTCP reporting
              interval and its packet scheduling will be affected if several
              RTP sessions are multiplexed onto the same transport layer flow.
              The reporting interval is determined by the session bandwidth,
              and the reporting interval chosen for a high-rate video session
              will be different to the interval chosen by a low-rate VoIP
              session. If such sessions are multiplexed, then participants in
              one session will see the SSRC values of the other session. This
              will cause them to overestimate the number of participants in
              the session by a factor of two, thus doubling their RTCP
              reporting interval, and making their feedback less timely. In
              the worst case, when an RTP session with very low RTCP bandwidth
              is multiplexed with an RTP session with high RTCP bandwidth,
              this may cause repeated RTCP timer reconsideration, leading to
              the members of the low bandwidth session timing out.
              Participants in an RTP session configured with high bandwidth
              (and short RTCP reporting interval) will see RTCP reports from
              participants in the low bandwidth session much less often than
              expected, potentially causing them to repeatedly timeout and
              re-create state for those participants. The split of RTCP
              bandwidth between senders and receivers (where at least 25% of
              the RTCP bandwidth is allocated to senders) will be disrupted if
              a session with few senders (e.g., a VoIP session) is multiplexed
              with a session with many senders (e.g., a video session). These
              issues can be resolved if the partition of the SSRC space is
              signalled, but this is not backwards compatible with any
              existing RTP system. The partition would require re-implementing
              a large part of the RTCP processing to take the individual
              sessions into account.</t>

              <t hangText="Sampling Group Membership:">The mechanism defined
              in RFC2762 to sample the group membership, allowing participants
              to keep less state, assumes a single flat 32-bit SSRC space, and
              breaks if the SSRC space is shared between several RTP
              sessions.</t>
            </list></t>
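
          <t>The effect on the RTCP reporting interval described under "RTCP
          Reporting and Scheduling" can be made concrete with a simplified
          sketch. It assumes the deterministic part of the interval
          calculation only, and deliberately ignores the split between
          senders and receivers, the randomisation, and timer reconsideration.
          The function name and the example figures (a 64 kbit/s session, 100
          byte average RTCP packets, 50 participants) are assumptions chosen
          for illustration.</t>

          <figure>
            <artwork><![CDATA[
```python
RTCP_FRACTION = 0.05  # share of session bandwidth for RTCP (RFC 3550)
T_MIN = 5.0           # minimum reporting interval in seconds (RFC 3550)


def rtcp_interval(session_bw_bps, members, avg_rtcp_size_bytes):
    """Simplified interval: members * avg_size / rtcp_bw, floored at T_MIN.

    Illustrative only: omits the sender/receiver bandwidth split and
    the randomisation applied by a real implementation.
    """
    rtcp_bw_bytes_per_s = session_bw_bps * RTCP_FRACTION / 8
    interval = members * avg_rtcp_size_bytes / rtcp_bw_bytes_per_s
    return max(T_MIN, interval)


# 50 participants really in the session: roughly 12.5 seconds.
alone = rtcp_interval(64_000, 50, 100)

# Another session with 50 SSRCs shares the flow, so each participant
# counts 100 members: the interval roughly doubles to about 25 seconds.
muxed = rtcp_interval(64_000, 100, 100)
```
]]></artwork>
          </figure>

          <t>In this sketch the participants report half as often as the
          session bandwidth would otherwise allow, which is the loss of
          timeliness discussed above.</t>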

          <t>As can be seen, the requirement that separate RTP sessions are
          carried in separate transport-layer flows is fundamental to the
          design of RTP. Due to this design principle, implementors of various
          services or applications using RTP have not commonly violated this
          model, and have separated RTP sessions onto different transport
          layer flows. After 15 years of deployment of RTP in its current
          form, any move to change this assumption must carefully consider the
          backwards compatibility problems that this will cause. In
          particular, since widespread use of multiplexed RTP sessions in
          RTC-Web will almost certainly cause their use in other scenarios,
          the discussion regarding compatibility must be wider than just
          whether multiplexing works for the extremely limited subset of RTP
          use cases currently being considered in the RTC-Web group. Any such
          multiplexing extension to RTP must therefore be developed by the
          AVTCORE working group, since it has much broader applicability and
          scope than RTC-Web.</t>
        </section>

        <section title="Arguments for a single transport flow">
          <t>The arguments the authors are aware of for why it is desirable to
          use a single underlying transport (e.g., UDP) flow for all media,
          rather than one flow for each type of media are the following:</t>

          <t><list style="hanging">
              <t hangText="End-Point Port Consumption:">A given IP address
              only has 16 bits of available port space per transport protocol
              for any consumer of ports that exists on the machine. This is
              normally never an issue for an end-user machine. It can become
              an issue for servers that have a large number of simultaneous
              flows. However, in RTCWEB, where we will use authenticated STUN
              requests, a server can serve multiple end-points from the same
              local port, and use the whole 5-tuple (source and destination
              address, source and destination port, protocol) as the
              identifier of flows. Thus, in theory, the minimal number of
              media server ports needed is the maximum number of simultaneous
              RTP sessions a single end-point may use, although in practice
              implementations probably benefit from using more.</t>

              <t hangText="NAT State:">If an end-point is behind a NAT, each
              flow it generates to an external address will result in state on
              that NAT. That state is a limited resource: in home or SOHO
              NATs, memory or processing is the constraint, while large
              scale NATs serving many internal end-points can run out of
              available ports. We see this primarily as a problem for larger
              centralised NATs, where end-point independent mapping requires
              each flow mapping to use one port on the external IP address,
              thus affecting the maximum aggregation of internal users per
              external IP address. However, we would like to point out that an
              RTCWEB session with audio and video is likely to use 2 or 3 UDP
              flows. This can be contrasted with certain web applications
              that can result in 100+ TCP flows being opened to various
              servers. Admittedly, TCP mappings are recovered more quickly due
              to the explicit session teardown when no longer needed, but at
              the same time more web sites may be communicated with
              simultaneously in various browser tabs. So the question is
              whether the UDP mapping space is as heavily used as the TCP
              mapping space, and whether TCP will continue to be the limiting
              factor for the number of internal users a particular NAT can
              support.</t>

              <t hangText="NAT Traversal taking additional time:">When doing
              NAT/FW traversal it takes additional time to open additional
              ports. This time is spent in the phase of communication between
              accepting to communicate and the media path being established,
              which is fairly critical. Following the specified ICE
              procedures, the best case for the extra time this can take is
              1.5*RTT + Ta*(Additional_Flows-1), where Ta is the pacing
              timer, which ICE specifies to be no smaller than 20 ms. That
              assumes a message in one direction, and then an immediate
              triggered check back. This is because ICE first finds one
              candidate pair that works prior to establishing multiple flows.
              Thus, there is no extra time until one has found a working
              candidate pair; from that point, the only extra time is that
              needed to establish the additional flows in parallel, which in
              most cases number 1 or 2.</t>

              <t hangText="NAT Traversal Failure Rate:">In cases when one
              needs more than a single flow to be established through the NAT,
              there is some risk that one succeeds in establishing the first
              flow but fails with one or more of the additional flows. The
              risk of this happening is hard to quantify. However, that risk
              should be fairly low, as one has just successfully
              established one flow from the same interfaces. Thus only rare
              events, such as NAT resource overload or the selection of
              particular port numbers that are filtered, should be reasons for
              failure.</t>
            </list></t>
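
          <t>As a worked example of the best-case estimate for additional NAT
          traversal time given above, the following sketch evaluates
          1.5*RTT + Ta*(Additional_Flows-1) in milliseconds. The function
          name and the choice to return zero when there are no additional
          flows are assumptions made for illustration only.</t>

          <figure title="Illustrative evaluation of the best-case extra setup time">
            <artwork><![CDATA[
```python
def ice_extra_setup_ms(rtt_ms, additional_flows, ta_ms=20.0):
    """Best-case extra setup time, in milliseconds.

    Evaluates 1.5*RTT + Ta*(Additional_Flows-1) from the text.
    Returning 0 when there are no additional flows is an
    assumption of this sketch.
    """
    if ta_ms < 20.0:
        raise ValueError("ICE requires Ta to be at least 20 ms")
    if additional_flows < 1:
        return 0.0
    return 1.5 * rtt_ms + ta_ms * (additional_flows - 1)


# RTT of 100 ms, two additional flows, minimum pacing timer:
# 1.5*100 + 20*(2-1) = 170 ms
print(ice_extra_setup_ms(100.0, 2))
```
]]></artwork>
          </figure>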
        </section>

        <section title="Summary">
          <t>As we have noted in the preceding sections, implicit multiplexing
          of multiple RTP sessions onto a single transport flow raises a large
          number of backwards compatibility issues. It has been argued that
          these issues are either not important, since the RTP features
          disrupted are not of interest to the current set of RTC-Web use
          cases, or can be solved by somehow explicitly dividing the SSRC
          space into different regions for different RTP sessions. We believe
          the first argument is short-sighted: those RTP features may not be
          important today, but the successful deployment of simple RTC-Web
          applications will generate interest in trying more advanced
          scenarios, which may well need those features. Partitioning the SSRC
          space to separate RTP sessions results in a new set of issues, the
          biggest of which, from our point of view, is that it effectively
          creates a new variant of the RTP protocol that is incompatible with
          standard RTP. Having two different variants of the core
          functionality of RTP will make it much more difficult to develop
          future protocol extensions, and the new variant will likely also
          have a different set of extensions that work. In addition, the two
          versions aren't directly interoperable, which will force anyone who
          wants to interconnect the two versions to deploy (complex) gateways.
          It also reduces the common user base and the interest in maintaining
          and developing either version.</t>

          <t>On the other hand, we are sympathetic to the argument that using
          a single transport flow saves some time in setup processing, saves
          some resources on NATs and FWs that are in between the communicating
          end-points, and may have a somewhat higher success rate of session
          establishment.</t>

          <t>Thus the authors consider it REQUIRED that RTP sessions are
          multiplexed using an explicit mechanism outside RTP. We strongly
          RECOMMEND that the mechanism used to accomplish this multiplexing
          is to use a unique UDP flow for each RTP session, based on
          simplicity and interoperability. However, we can accept a WG
          consensus that using a single transport layer flow between peers is
          the default, and that the fallback of using separate UDP flows is
          also supported, under one constraint: that the RTP sessions are
          explicitly multiplexed in such a way that existing mechanisms or
          extensions to RTP are not prevented from working, and that the
          solution does not create an alternative variant of RTP
          (i.e., it must not disrupt RTCP processing or the RTP semantics).
          In this latter case we RECOMMEND that some type of multiplexing
          layer is inserted between the UDP flow and the RTP/RTCP headers to
          separate the RTP sessions, since removing this shim layer and
          gatewaying to standard RTP sessions is simpler than trying to
          separate RTP sessions that are multiplexed together in order to
          gateway them to standard RTP sessions. We discuss possible
          multiplexing layers in <xref
          target="sec-mux-solutions"></xref>.</t>
        </section>
      </section>

      <section anchor="sdp" title="Signalling for RTP sessions">
        <t>RTP is built on the assumption that a signalling channel external
        to RTP/RTCP is used to configure the RTP sessions and their functions.
        The basic configuration of an RTP session consists of the following
        parameters:</t>

        <t><list style="hanging">
            <t hangText="RTP Profile:">The name of the RTP profile to be used
            in the session. The <xref target="RFC3551">RTP/AVP</xref> and <xref
            target="RFC4585">RTP/AVPF</xref> profiles can interoperate on a
            basic level, as can their secure variants <xref
            target="RFC3711">RTP/SAVP</xref> and <xref
            target="RFC5124">RTP/SAVPF</xref>. The secure variants of the
            profiles do not directly interoperate with the non-secure
            variants, due to the presence of additional header fields in
            addition to any cryptographic transformation of the packet
            content.</t>

            <t hangText="Transport Information:">Source and destination
            address(es) and ports for RTP and RTCP must be signalled for each
            RTP session. If <xref target="RFC5761">RTP and RTCP
            multiplexing</xref> is to be used, such that a single port is used
            for RTP and RTCP flows, this must be signalled.</t>

            <t
            hangText="RTP Payload Types, media formats, and media format parameters:">The
            mapping between media type names (and hence the RTP payload
            formats to be used) and the RTP payload type numbers must be
            signalled. Each media type may also have a number of media type
            parameters that must also be signalled to configure the codec and
            RTP payload format (the "a=fmtp:" line from SDP).</t>

            <t hangText="RTP Extensions:">The RTP extensions one intends to
            use need to be agreed upon, including any parameters for each
            respective extension. At the very least, this helps avoid
            spending bandwidth on features that the other end-point will
            ignore. For certain mechanisms, however, such agreement is
            required, since interoperability failures occur otherwise.</t>

            <t hangText="RTCP Bandwidth:">Support for exchanging RTCP
            Bandwidth values to the end-points will be necessary, as described
            in <xref target="RFC3556">"Session Description Protocol (SDP)
            Bandwidth Modifiers for RTP Control Protocol (RTCP)
            Bandwidth"</xref>, or something semantically equivalent. This also
            ensures that the end-points have a common view of the RTCP
            bandwidth, which is important because views of the bandwidth that
            differ too much may lead to failure to interoperate.</t>
          </list></t>

        <t>These parameters are often expressed in SDP messages conveyed
        within an offer/answer exchange. RTP does not depend on SDP or on the
        offer/answer model, but does require all the necessary parameters to
        be agreed somehow, and provided to the RTP implementation. We note
        that in the RTCWEB context, how these parameters are configured will
        depend on the signalling model and the API, but they will need to be
        either set in the API or explicitly signalled between the peers.</t>
      </section>

      <section title="(Lack of) Signalling for Payload Format Changes">
        <t>As discussed in <xref target="sdp"></xref>, the mapping between
        media type name, and its associated RTP payload format, and the RTP
        payload type number to be used for that format must be signalled as
        part of the session setup. An endpoint may signal support for multiple
        media formats, or multiple configurations of a single format, each
        using a different RTP payload type number. If multiple formats are
        signalled by an endpoint, that endpoint is REQUIRED to be prepared to
        receive data encoded in any of those formats at any time. RTP does not
        require advance signalling for changes between formats that were
        signalled during the session setup. This is needed for rapid rate
        adaptation.</t>
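
        <t>The receiver-side consequence can be sketched as a lookup keyed by
        RTP payload type number: every format signalled at setup is
        registered once, and any of them can then arrive without further
        signalling. The class and method names below are hypothetical, and
        the sketch assumes a fixed 12-byte RTP header with no CSRC list,
        header extension, or padding.</t>

        <figure title="Illustrative payload type dispatch at a receiver">
          <artwork><![CDATA[
```python
class PayloadTypeDemux:
    """Maps signalled RTP payload type numbers to decoders."""

    def __init__(self):
        self._decoders = {}

    def signal(self, payload_type, decoder):
        # Called once per format agreed during session setup.
        if not 0 <= payload_type <= 127:
            raise ValueError("RTP payload type is a 7-bit value")
        self._decoders[payload_type] = decoder

    def receive(self, rtp_packet):
        # Payload type: low 7 bits of the second header byte.
        payload_type = rtp_packet[1] & 0x7F
        decoder = self._decoders.get(payload_type)
        if decoder is None:
            return None  # format was never signalled: discard
        # Assumes the fixed 12-byte header only (see lead-in).
        return decoder(rtp_packet[12:])
```
]]></artwork>
        </figure>

        <t>Because both formats stay registered for the whole session, a
        sender can switch between them from one packet to the next, which is
        what rapid rate adaptation relies on.</t>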
      </section>
    </section>

    <section anchor="sec-mux-solutions" title="RTP Session Multiplexing">
      <t>This section explores a few possible solutions for achieving
      explicit multiplexing between RTP sessions and possibly other UDP-based
      flows, such as STUN and protocols carrying application data. Before
      diving into the proposals, we should consider which requirements we can
      derive from the previous discussion and the intended goals.</t>

      <t>The general requirements for this multiplexing solution, as we
      understand them, are:</t>

      <t><list style="hanging">
          <t hangText="On top of a single flow:">To get the full set of
          benefits of reducing the number of transport flows between two peers
          one should be able to multiplex all peer traffic from one
          application instance over a single transport flow.</t>

          <t hangText="On top of UDP:">The primary transport protocol that
          meets real-time requirements and has reasonable NAT/FW traversal
          properties is UDP. So the solution is REQUIRED to work over
          UDP.</t>

          <t hangText="Fallback Protocol:">A fallback option has been
          discussed for the case where UDP fails to traverse the NAT/FW, even
          when using TURN if available. This would be to run over <xref
          target="I-D.ietf-hybi-thewebsocketprotocol">WebSocket</xref> or over
          <xref target="RFC2616">HTTP(S)</xref>. Over HTTP, one likely needs to
          treat the media stream as parts of a binary object of unknown
          length, and thus provide both framing and multiplexing for what
          would otherwise be sent as individual IP packets. WebSocket provides
          framing, but multiplexing is still needed.</t>

          <t hangText="Protocols to Multiplex:">The protocols that need to be
          multiplexed over this lower layer transport are: <list
              style="numbers">
              <t><xref target="RFC5389">STUN</xref> or something similar to
              enable the <xref target="RFC5245">ICE-like connectivity
              checks</xref> to be performed.</t>

              <t>RTP Sessions: One or more for each media type (audio and
              video) that the application desires to set up. For example, we
              may need more than one RTP session to allow easy separation of
              video streams showing the person speaking and a slide video
              stream. There has also been a proposal to support simulcasting
              to enable non-transcoding centralised conferencing.</t>

              <t>DTLS-SRTP and ZRTP are two proposals for how to do
              key-management for SRTP. Both are in-band key-management schemes
              that are sent on the same flow that SRTP will use as soon
              as the key-management has completed. Thus they must also
              be successfully multiplexed. In addition, if each RTP session
              needs its own keying context, then the different DTLS
              handshakes also need to be separated.</t>

              <t>Protocols for non-RTP media data. Such protocols provide a
              datagram service to the application that is congestion
              controlled and secured. The exact protocol is not yet decided.
              DTLS is a likely candidate for securing it; however, the order
              of the protocols is not clear. Whether it is foo over DTLS or
              DTLS over foo is yet to be decided.</t>

              <t>Reliable data transmission protocol. There has been some
              interest in a reliable data transport between the peers. It is
              uncertain whether this is going to be defined from the start,
              later, or not at all.</t>
            </list></t>

        </list>Please keep these general requirements in mind when we look at
      some possible solutions.</t>

      <section title="DCCP Based Solution">
        <t>The most reasonable approach is to use DCCP as a common multiplexing
        layer, at least for RTP and non-RTP data, and to use DCCP's congestion
        control functions in both cases. This would result in a stack picture
        that looks like this:</t>

        <figure align="center" title="RTP and Data on top of DCCP">
          <artwork><![CDATA[
       +-------------+------+
       |    Media    | FOO  |
       +------+------+  |   + 
       | SRTP | DTLS | DTLS | 
+------+------+------+------+ 
| STUN |        DCCP        |
+------+--------------------+
|            UDP            |
+---------------------------+
]]></artwork>
        </figure>

        <t>STUN and DCCP can be demultiplexed simply as long as the DCCP
        source port is in the range 16384-65535. The great benefit of this
        solution is that it can support a large number of parallel, explicitly
        multiplexed datagram flows. Another great benefit is a common place
        for congestion control implementation for both RTP and non-RTP data.
        It also provides a negotiation mechanism for transport features,
        including congestion control algorithms, enabling future development
        of this layer.</t>
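
        <t>The port range constraint can be illustrated with a minimal
        sketch. It relies on two facts: the two most significant bits of a
        STUN message are always zero, and a DCCP header begins with the
        16-bit source port, so a source port of 16384 or higher sets at
        least one of those two bits. The function name is hypothetical and
        the sketch assumes that only STUN and DCCP share the UDP flow.</t>

        <figure title="Illustrative STUN/DCCP demultiplexing on the first byte">
          <artwork><![CDATA[
```python
def classify_stun_or_dccp(udp_payload):
    """Return "stun" or "dccp" from the first two payload bits."""
    if not udp_payload:
        raise ValueError("empty UDP payload")
    # STUN: the two most significant bits are always 0.
    # DCCP: the header starts with the 16-bit source port, so a
    # port in 16384-65535 (0x4000-0xFFFF) sets one of those bits.
    if (udp_payload[0] & 0xC0) == 0:
        return "stun"
    return "dccp"
```
]]></artwork>
        </figure>

        <t>A DCCP source port below 16384 would start with two zero bits and
        be mistaken for STUN, which is why the range restriction is
        needed.</t>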

        <!--?: We will need to design the necessary SDP signalling 
MW: I don't know if much signalling is needed. The DCCP over UDP is being defined.
Thus allowing one to do per m= line define the DCCP port and Service code that one will receive
a particular flow. -->

        <t>The above leaves out the question of a reliable transport solution.
        As far as we can see, this can be done in two major ways: either build
        reliability extensions on top of DCCP, or put a protocol in parallel
        with STUN and DCCP. The downside of the latter is that we again end
        up in a situation where several protocols can occur in the outer UDP
        payload, requiring implicit demultiplexing based on the actual data
        rather than on a field. As DCCP has negotiation mechanisms both for
        the service that uses DCCP and for DCCP options and features, both
        become viable methods for defining reliability extensions.</t>

        <t>Note that the main reason for not also putting STUN on top of DCCP
        is that DCCP requires a handshake on transport parameters when
        establishing a new flow. Performing that negotiation prior to
        verifying connectivity would increase both the amount of data
        transmitted to a not yet consenting peer and the delay.</t>
      </section>

      <section title="SHIM layer">
        <t>A very straightforward design would be adding a one or two byte
        shim layer on top of the transport payload prior to the actual
        multiplexed protocols. This allows both for static assignment of shim
        code-points like for STUN and for dynamically agreed on usages, either
        explicitly through signalling or implicitly by application
        context.</t>

        <figure align="center" title="Using a SHIM layer on top of UDP">
          <artwork><![CDATA[
       +-------------+------+
       |    Media    | DTLS |
+------+------+------+------+ 
| STUN | SRTP | DTLS | FOO  | 
+------+------+------+------+ 
|            SHIM           |
+---------------------------+ 
|            UDP            | 
+---------------------------+ ]]></artwork>
        </figure>

        <t>The Internet Draft <xref target="I-D.cbran-rtcweb-data">"RTC-Web
        Non-Media Data Transport Requirements"</xref> dismisses the idea of a
        generic SHIM layer for a number of reasons:</t>

        <t><list style="hanging">
            <t
            hangText="Breaking interoperability with existing inspection gear:">The
            authors of <xref target="I-D.cbran-rtcweb-data"></xref> point out
            the need to recognise the specific SSRC in order to recognise the
            special magic cookie. A device upgraded to perform this kind of
            matching could also be modified to inspect a SHIM layer. Assuming
            that a SHIM layer will be introduced in the IETF anyway, it
            appears more beneficial to have a single upgrade to networking
            gear capable of supporting a set of protocols than defining
            application-specific extensions.</t>

            <t
            hangText="Adding complexity through another muxing layer:">Removing
            an extra fixed size header is trivial. In contrast to SSRC-based
            demultiplexing, this could even be easily supported by the
            operating system. It should also be noted that both SSRC-based and
            SHIM layer-based demultiplexing require all media streams to
            terminate within the same application process and hence similar
            application-internal mechanisms to forward media data to the
            correct media engine for processing. It is thus hard to see the
            "adding complexity" reasoning.</t>

            <t hangText="Increase packet overhead further:">A reasonably
            designed SHIM layer would only add a few bytes of overhead. Given
            that the entire discussion is motivated by audio/video calls and
            video packets would dominate a media stream both in number and in
            size, the relative overhead is minimal and the point appears
            moot.</t>

            <t hangText="Shim is a mistake which cannot be undone later:">One
            can argue the same for overloading the SSRC identifier space. SHIM
            layers have repeatedly been discussed in the IETF because new
            protocols, such as DCCP and SCTP, face deployment problems in the
            real-world Internet as they use previously unknown IP protocol
            numbers. The only issue is that the IETF has not yet decided on a
            (common) SHIM layer. And if the shim layer is explicitly signalled
            and there exists a fallback solution of using separate UDP flows,
            then it can in fact be undone.</t>
          </list></t>

        <t>A shim layer has low overhead combined with explicitness and great
        flexibility in what to put on top. In addition to the definition of
        the shim itself, some signalling will be needed, either explicit or
        implicit depending on the signalling model and the API. The
        signalling needs to assign a meaning to each multiplexing code-point
        used in the particular underlying transport flow.</t>
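
        <t>A one-byte shim of this kind can be sketched as follows. The
        code-point values and usage names are purely hypothetical; in
        practice a static value would be reserved for STUN and the rest
        would be agreed through signalling or application context.</t>

        <figure title="Illustrative one-byte shim header">
          <artwork><![CDATA[
```python
# Hypothetical assignments: 0 is statically assigned to STUN,
# the others would be agreed through signalling.
CODE_POINTS = {0: "stun", 1: "rtp-audio", 2: "rtp-video", 3: "data"}


def shim_wrap(code_point, payload):
    """Prefix a payload with a one-byte shim header."""
    if not 0 <= code_point <= 255:
        raise ValueError("code-point must fit in one byte")
    return bytes([code_point]) + payload


def shim_unwrap(datagram, code_points=CODE_POINTS):
    """Return (usage, payload) for a received UDP payload."""
    if not datagram:
        raise ValueError("datagram too short for shim header")
    usage = code_points.get(datagram[0])
    if usage is None:
        raise KeyError("unassigned shim code-point")
    return usage, datagram[1:]
```
]]></artwork>
        </figure>

        <t>Removing the shim is a single slice of the datagram, and the
        inner packet is left untouched, so it can be handed unchanged to a
        standard RTP, STUN, or DTLS implementation.</t>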

        <t>Although a reliable protocol isn't included in the above example, it
        can easily be included, and it can be anything that can be put in a
        UDP payload, such as TCP, an RMT-based protocol, or a home-grown one.
        This ensures maximum flexibility to add additional protocols on top of
        the single UDP flow.</t>
      </section>

      <section title="RTP Internal Multiplexing">
        <t>The main point of RTP internal multiplexing is to enable
        multiplexing of RTP sessions without adding any extra layer between
        the RTP header and the lower transport, e.g. a single UDP flow, that
        things are multiplexed on. <xref target="I-D.rosenberg-rtcweb-rtpmux">Rosenberg
        </xref> suggests one method for RTP internal multiplexing. In addition
        to this, there is a suggestion in <xref
        target="I-D.cbran-rtcweb-data">"RTC-Web Non-Media Data Transport
        Requirements"</xref> to also multiplex the non-RTP data on the same
        level, using implicit identification of data packets to separate them
        from DTLS-SRTP packets, RTP/RTCP packets and STUN packets. This
        results in a stack picture that looks like this:</t>

        <figure align="center" title="RTP Internal Multiplexing">
          <artwork><![CDATA[
       +-------------+------+
       |    Media    | DTLS |
+------+------+------+------+ 
| STUN | SRTP | DTLS | FOO  | 
+------+------+------+------+ 
|            UDP            | 
+---------------------------+ ]]></artwork>
        </figure>

        <t>Where Foo is the protocol suggested by <xref
        target="I-D.cbran-rtcweb-data">"RTC-Web Non-Media Data Transport
        Requirements"</xref>.</t>

        <t>These proposals rely on the idea that a receiver can look at a
        number of the bytes of the UDP payload to identify the type of packet.
        So assuming DTLS-SRTP key management and a datagram non-RTP data
        transport, we have at least four protocols to separate. Once one has
        successfully identified the protocol as (S)RTP, one looks at the
        SSRC field to find out the media type and stream IDs.</t>
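
        <t>This kind of implicit identification can be sketched with the
        first-byte ranges that RFC 5764 gives for separating STUN, DTLS and
        (S)RTP on one port. The function name is hypothetical. How a fourth,
        non-RTP data protocol would be recognised is not specified here, so
        the sketch simply reports anything outside the three known ranges as
        unknown.</t>

        <figure title="Illustrative implicit identification on the first byte">
          <artwork><![CDATA[
```python
def classify_first_byte(udp_payload):
    """Guess the protocol from the first byte of a UDP payload."""
    if not udp_payload:
        raise ValueError("empty UDP payload")
    first = udp_payload[0]
    if first <= 1:
        return "stun"      # STUN message type, top bits zero
    if 20 <= first <= 63:
        return "dtls"      # DTLS record content type
    if 128 <= first <= 191:
        return "rtp"       # RTP/RTCP with version bits "10"
    # Any further protocol would have to claim a free range.
    return "unknown"
```
]]></artwork>
        </figure>

        <t>Each range corresponds to a header field that has its own meaning
        in the respective protocol, so an extension that changes the use of
        that field in any one protocol can silently break the
        classification.</t>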

        <t>There are a number of issues with the current proposals which we
        will raise below. We also discuss what is going to be needed to drive
        this work.</t>

        <section title="Issues with SSRC RTP Multiplexing">
          <t>The first argument against this design is that it further
          proliferates the bad design of implicit packet identification that
          started with STUN. Instead of trying to break out of this
          pattern, we appear to pile on more protocols that are supposed to be
          identified, despite the fact that all these protocols have protocol
          fields that serve a purpose in the overlapping bytes on which we
          attempt to perform identification. At some point, a protocol
          extension in one of the protocols will result in a collision,
          breaking the demultiplexing mechanism.</t>

          <t>Secondly, the design restricts RTCWEB to a subset of RTP
          functionality. Redefining the SSRC field creates, in practice,
          an alternative RTP protocol that can't fully interoperate with RTP
          as currently defined. The inclusion of a magic word that allows Deep
          Packet Inspection and other interpreters to identify the
          versions correctly is a clear admission of this fact, even if it is
          not stated explicitly in the text. This new version is forever
          prevented from using any of the features that have been identified
          as not being compatible with this design. In addition, it either
          forces future RTP extensions to take this severe limitation into
          account or leads to additional extensions that are not compatible.
          Forking the RTP protocol into two versions is really not
          desirable.</t>

          <t>Thirdly, a stream ID field of significantly limited size requires
          someone to manage the stream IDs and ensure that unique ones are
          used by each end-point. This would not be an issue if the only use
          case ever were communication between two end-points. However, we
          already have use cases and requirements for centralised conferencing
          scenarios. Even a basic star scenario requires extra complexity, as
          the central node needs to be able to force the nodes that aren't at
          the centre to use the IDs that the central node dictates. This usage
          becomes much more complex the moment someone attempts to
          interconnect two stars, which is in fact likely to happen when one
          needs either scalability or geographical optimisation. By
          geographical optimisation we mean, for example, one entity in Asia
          and one in Africa that perform media mixing or transport relaying to
          reduce the delay and traffic load. In addition to the centralised
          conferencing usage, it looks plausible that RTCWEB could allow for
          an ad-hoc conferencing mesh. Without a central point beyond the web
          server, only the web server could ensure the uniqueness
          requirements. All of the above cases are easily handled by regular
          RTP without any control at all, which shows that this proposal
          brings extra complexity.</t>

          <t>Fourth, if any legacy interoperation is considered, one should be
          aware that the same SSRC value is sometimes used in different
          RTP sessions within the same communication session. This is commonly
          done to provide quick association of media streams in the different
          sessions, sometimes due to implementation choices, and sometimes
          because an extension requires it, like the <xref
          target="RFC4588">session mode of RTP retransmission</xref>.</t>

          <t>Fifth, there is a need to support more than a single session
          context per media type. As shown in <xref
          target="I-D.westerlund-avtcore-multistream-and-simulcast">"RTP
          Multiple Stream Sessions and Simulcast"</xref> there are clear
          benefits in using multiple RTP sessions for separating intent with
          different media streams. This is already occurring in video
          conferencing to separate main video (e.g. active speaker) from
          alternative video (e.g. non-active speaker, audience) and document
          or slide video streams. We will not deny that the web server could
          track the flows and their purpose through other mechanisms and
          signalling channels. However, it complicates any interop with legacy
          and forces more functionality and additional APIs into any gateway
          function.</t>
        </section>

        <section title="Executing on this Proposal">
          <t>If the RTCWEB WG decides to pursue this approach despite the
          issues associated with RTP internal multiplexing, the WG needs to
          be aware that it doesn't have the right to redefine RTP
          semantics. The IETF has an active WG chartered for maintaining and
          extending RTP, the AVTCORE WG, and proposals for change need to be
          handled in that WG. This means that all the RTCWEB WG can do for the
          RTP multiplexing part is to provide requirements to AVTCORE. The WG
          participants would then be encouraged to propose and act as
          proponents for the work in the AVTCORE WG.</t>

          <t>Considering that RTCWEB is not the only one to have voiced the
          need for a multiplexing solution, and that this is likely to have a
          significant impact on RTP for the future, any proposal for a
          solution needs to be generally applicable. For example, most of the
          arguments dismissed in
          <xref target="I-D.rosenberg-rtcweb-rtpmux">"Multiplexing of
          Real-Time Transport Protocol (RTP) Traffic for Browser based
          Real-Time Communications (RTC)"</xref> as not being applicable for
          RTCWEB will need to be reconsidered in the light of more general
          applications.</t>

          <t>The authors of this draft therefore have the following
          requirements on this solution:</t>

          <t><list style="numbers">
              <t>Be possible to multiplex more than a single RTP session of
              the same media type.</t>

              <t>Be possible to use all relevant RTP/RTCP extensions and RTP
              payload formats.</t>

              <t>Be possible to use a particular SSRC value in more than a
              single RTP session simultaneously.</t>

              <t>Be possible to interconnect, through a gateway, the RTP
              sessions that are multiplexed on a single transport flow back to
              using multiple transport flows to a legacy end-point that
              otherwise supports the application's RTP configuration. This
              should preferably be done with minimal state, and should
              especially avoid per-SSRC state.</t>
            </list></t>

        </section>
      </section>

      <section title="Conclusion">
        <t>Looking at these proposals, we authors are clearly in favour of a
        shim layer, unless DCCP is being selected anyway as the datagram or
        media transport protocol, in which case one should strongly consider
        carrying both data and media over the same protocol so that it can
        be used as the multiplexing layer.</t>

        <t>We don't see RTP internal multiplexing as a realistic contender
        for the first phase of RTCWEB specifications. It has documented
        issues. The only way forward for the WG is to develop requirements for
        what RTCWEB needs and share these with AVTCORE. If there are
        proponents for driving a solution, they should take the design of a
        generalised protocol to AVTCORE, where the existing specifications
        are taken into consideration. That work might find a suitable
        solution, or it might not. When it is done, we might have something
        stable to start deploying two years from now, or the WG may have
        decided to drop the work as infeasible.</t>
      </section>
    </section>

    <section title="RTP Profile">
      <t>The <xref target="RFC5124">"Extended Secure RTP Profile for Real-time
      Transport Control Protocol (RTCP)-Based Feedback (RTP/SAVPF)"</xref> is
      REQUIRED to be implemented. This builds on the basic <xref
      target="RFC3551">RTP/AVP profile</xref>, the <xref
      target="RFC4585">RTP/AVPF feedback profile</xref>, and the secure <xref
      target="RFC3711">RTP/SAVP profile</xref>.</t>

      <t>The RTP/AVPF part of RTP/SAVPF is required to get the improved RTCP
      timer model, that allows more flexible transmission of RTCP packets in
      response to events, rather than strictly according to bandwidth. This
      also saves RTCP bandwidth and will commonly only use the full amount
      when there are many events on which to send feedback. This
      functionality is needed to make use of the RTP conferencing extensions
      discussed in <xref target="conf-ext"></xref>.</t>

      <t>The RTP/SAVP part of RTP/SAVPF provides support for <xref
      target="RFC3711"> Secure RTP (SRTP)</xref>. This provides media
      encryption, integrity protection, replay protection and a limited form
      of source authentication. It does not contain a specific keying
      mechanism, so that, and the set of security transforms, will be required
      to be chosen. It is possible that a security mechanism operating on a
      lower layer than RTP can be used instead and that should be evaluated.
      However, the reasons for the design of SRTP should be taken into
      consideration in that discussion.</t>
    </section>

    <section title="RTP and RTCP Guidelines">
      <t>RTP and RTCP are two flexible and extensible protocols that allow, on
      the one hand, choosing from a variety of building blocks and combining
      those to meet application needs, and on the other hand, creating
      extensions where existing mechanisms are not sufficient: from new
      payload formats to RTP extension headers to additional RTCP control
      packets.</t>

      <t>Several informational documents provide guidelines on the use and
      particularly the extension of RTP and RTCP, including the following:
      <xref target="RFC2736">Guidelines for Writers of RTP Payload Format
      Specifications</xref> and <xref target="RFC5968">Guidelines for
      Extending the RTP Control Protocol</xref>.</t>
    </section>

    <section title="RTP Optimisations">
      <t>This section discusses some optimisations that make RTP/RTCP work
      better and more efficiently, and that are therefore considered here.</t>

      <section title="RTP and RTCP Multiplexing">
        <t>Historically, RTP and RTCP have been run on separate UDP ports.
        With the increased use of Network Address/Port Translation (NAPT) this
        has become problematic, since maintaining multiple NAT bindings can be
        costly. It also complicates firewall administration, since multiple
        ports must be opened to allow RTP traffic. To reduce these costs and
        session setup times, support for multiplexing RTP data packets and
        RTCP control packets on a single port <xref target="RFC5761"></xref>
        is REQUIRED. Supporting this specification is generally a
        simplification in code, since it relaxes the tests in <xref
        target="RFC3550"></xref>.</t>

        <t>Note that the use of RTP and RTCP multiplexed on a single port
        ensures that there is occasional traffic sent on that port, even if
        there is no active media traffic. This may be useful to keep NAT
        bindings alive.</t>
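
        <t>As a hedged illustration of how a receiver can separate the two
        packet types on one port, the sketch below classifies a datagram by
        inspecting its second octet. It assumes the demultiplexing approach
        of <xref target="RFC5761"></xref> as we read it: RTCP packet types
        fall in the range 192 to 223, and RTP payload types 64 to 95 are
        avoided so that the two cannot collide. The function name
        classify_packet is hypothetical, and the sketch omits all other
        validity checks.</t>

        <figure>
          <artwork><![CDATA[
```python
def classify_packet(datagram: bytes) -> str:
    """Return "rtp", "rtcp" or "invalid" for a datagram on a shared port."""
    # Both RTP and RTCP start with a 2-bit version field that must be 2.
    if len(datagram) < 2 or (datagram[0] >> 6) != 2:
        return "invalid"
    # Second octet: RTCP packet type, or RTP marker bit plus payload type.
    # Assumption: values 192..223 are reserved for RTCP when multiplexing.
    if 192 <= datagram[1] <= 223:
        return "rtcp"
    return "rtp"


if __name__ == "__main__":
    rtp = bytes([0x80, 0x80 | 96]) + bytes(10)  # marker set, payload type 96
    rtcp_sr = bytes([0x80, 200]) + bytes(26)    # RTCP sender report (PT 200)
    print(classify_packet(rtp), classify_packet(rtcp_sr))
```
]]></artwork>
        </figure>

        <t>A single comparison on one octet is enough here, which is one way
        to see why supporting multiplexing adds little code.</t>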
      </section>

      <section title="Reduced Size RTCP">
        <t>RTCP packets are usually sent as compound RTCP packets, and RFC
        3550 requires that those compound packets always start with an SR or RR
        packet. However, especially when using frequent feedback messages,
        these general statistics are not needed in every packet and
        unnecessarily increase the mean RTCP packet size and thus limit the
        frequency at which RTCP packets can be sent within the RTCP bandwidth
        share.</t>

        <t>RFC5506 <xref target="RFC5506">"Support for Reduced-Size Real-Time
        Transport Control Protocol (RTCP): Opportunities and
        Consequences"</xref> specifies how to reduce the mean RTCP message
        size and thereby allow for more frequent feedback. Frequent feedback,
        in turn, is essential to make real-time applications quickly aware of
        changing network conditions and allow them to adapt their transmission
        and encoding behaviour.</t>

        <t>Support for RFC5506 is REQUIRED.</t>
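
        <t>The effect on feedback frequency can be estimated with rough
        arithmetic. The sketch below is illustrative only: it assumes IPv4
        and UDP headers of 28 bytes in total, a receiver report with one
        report block (32 bytes), an SDES packet carrying a 17-octet CNAME (28
        bytes), and a 12-byte Picture Loss Indication. The RTCP bandwidth of
        20 kbit/s and the function name max_packet_rate are hypothetical, and
        the RTCP timing rules and randomisation are ignored.</t>

        <figure>
          <artwork><![CDATA[
```python
IP_UDP_OVERHEAD = 28   # bytes, IPv4 + UDP headers (assumed transport)
RR_ONE_BLOCK = 32      # bytes, receiver report carrying one report block
SDES_CNAME = 28        # bytes, SDES with a 17-octet CNAME (assumed length)
PLI = 12               # bytes, picture loss indication feedback message


def max_packet_rate(rtcp_bandwidth_bps: float, rtcp_bytes: int) -> float:
    """Upper bound on packets per second within an RTCP bandwidth share."""
    return rtcp_bandwidth_bps / (8 * (rtcp_bytes + IP_UDP_OVERHEAD))


if __name__ == "__main__":
    bandwidth = 20_000  # bits per second, illustrative only
    compound = RR_ONE_BLOCK + SDES_CNAME + PLI
    print("compound:", max_packet_rate(bandwidth, compound))
    print("reduced-size:", max_packet_rate(bandwidth, PLI))
```
]]></artwork>
        </figure>

        <t>Under these assumptions the compound packet allows at most 25
        feedback packets per second, and the reduced-size packet at most
        62.5.</t>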
      </section>

      <section title="Symmetric RTP/RTCP">
        <t>RTP entities choose the RTP and RTCP transport addresses, i.e., IP
        addresses and port numbers, to receive packets on and bind their
        respective sockets to those. When sending RTP packets, however, they
        may use a different IP address or port number for RTP, RTCP, or both;
        e.g., when using a different socket instance for sending and for
        receiving. Symmetric RTP/RTCP requires that the IP address and port
        number for sending and receiving RTP/RTCP packets are identical.</t>

        <t>The primary reason for using symmetric RTP is to avoid issues
        with NATs and firewalls by ensuring that the flow is actually
        bi-directional, and thus kept alive and registered as a flow that the
        intended recipient actually wants. In addition, it saves resources in
        the form of ports at the end-points, and also in the network, as NAT
        mappings and firewall state are not unnecessarily bloated. The amount
        of QoS state is also reduced.</t>

        <t>Using <xref target="RFC4961">Symmetric RTP and RTCP</xref> is
        REQUIRED.</t>
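
        <t>In socket terms, symmetric operation amounts to sending from the
        same bound socket that is used for receiving. The following minimal
        sketch, which assumes IPv4 over the loopback interface and uses the
        hypothetical helper open_symmetric_socket, shows that the source
        address seen by the peer is then the address on which the sender also
        listens.</t>

        <figure>
          <artwork><![CDATA[
```python
import socket


def open_symmetric_socket(host: str = "127.0.0.1") -> socket.socket:
    """Bind one UDP socket that is used both to send and to receive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))     # port 0 lets the OS pick a free port
    sock.settimeout(1.0)     # avoid blocking forever in this sketch
    return sock


if __name__ == "__main__":
    alice = open_symmetric_socket()
    bob = open_symmetric_socket()
    # Alice sends from the same socket on which she receives...
    alice.sendto(b"media", bob.getsockname())
    data, source = bob.recvfrom(1500)
    # ...so the source address Bob sees is also where he can reach her.
    print(source == alice.getsockname())
    alice.close()
    bob.close()
```
]]></artwork>
        </figure>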
      </section>

      <section title="Generation of the RTCP Canonical Name (CNAME)">
        <t>The RTCP Canonical Name (CNAME) provides a persistent
        transport-level identifier for an RTP endpoint. While the
        Synchronisation Source (SSRC) identifier for an RTP endpoint may
        change if a collision is detected, or when the RTP application is
        restarted, its RTCP CNAME is meant to stay unchanged, so that RTP
        endpoints can be uniquely identified and associated with their RTP
        media streams. For proper functionality, RTCP CNAMEs should be unique
        among the participants of an RTP session.</t>

        <t>The <xref target="RFC3550">RTP specification</xref> includes
        guidelines for choosing a unique RTCP CNAME, but these are not
        sufficient in the presence of NAT devices. In addition, some may find
        long-term persistent identifiers problematic from a privacy viewpoint.
        Accordingly, support for generating short-term persistent RTCP
        CNAMEs following method (b) as specified in Section 4.2 of <xref
        target="RFC6222">"Guidelines for Choosing RTP Control Protocol (RTCP)
        Canonical Names (CNAMEs)"</xref> is RECOMMENDED, since this addresses
        both concerns.</t>
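
        <t>The relationship between a changeable SSRC and a stable CNAME can
        be sketched as follows. This is not the normative derivation from
        <xref target="RFC6222"></xref>, which should be consulted for the
        actual procedure. As an assumption made only for illustration, the
        identifier is a random 48-bit value in colon-separated hexadecimal
        form, generated once when the endpoint is initialised. The class name
        RtpEndpoint and the method on_ssrc_collision are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import secrets


class RtpEndpoint:
    """Hypothetical endpoint state: a stable CNAME and a changeable SSRC."""

    def __init__(self) -> None:
        # Assumption: a random 48-bit value shown as colon-separated hex.
        # This is not the normative RFC 6222 derivation.
        self.cname = ":".join(f"{b:02x}" for b in secrets.token_bytes(6))
        self.ssrc = secrets.randbits(32)

    def on_ssrc_collision(self) -> None:
        """Pick a new SSRC; the CNAME is deliberately left untouched."""
        old = self.ssrc
        while self.ssrc == old:
            self.ssrc = secrets.randbits(32)
```
]]></artwork>
        </figure>

        <t>Because the identifier is regenerated on each initialisation, it
        persists across SSRC changes but not across restarts of the
        software.</t>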
      </section>
    </section>

    <section title="RTP Extensions">
      <t>There are a number of RTP extensions that could be very useful in the
      RTC-Web context. One set is related to conferencing, others are more
      generic in nature.</t>

      <section anchor="conf-ext" title="RTP Conferencing Extensions">
        <t>RTP is inherently defined for group communications, whether using
        IP multicast, multi-unicast, or based on a centralised server. In
        today's practice, however, overlay-based conferencing dominates,
        typically using one or a few so-called conference bridges or servers
        to connect endpoints in a star or flat tree topology. Quite diverse
        conferencing topologies can be created using the basic elements of RTP
        mixers and translators as defined in RFC 3550.</t>

        <t>A number of conferencing topologies are defined in <xref
        target="RFC5117"></xref>, of which the following are the more common
        (and the most likely to be workable in practice):</t>

        <t>1) RTP Translator (Relay) with Only Unicast Paths (RFC 5117,
        section 3.3)</t>

        <t>2) RTP Mixer with Only Unicast Paths (RFC 5117, section 3.4)</t>

        <t>3) Point to Multipoint Using a Video Switching MCU (RFC 5117,
        section 3.5)</t>

        <t>4) Point to Multipoint Using Content Modifying MCUs (RFC 5117,
        section 3.6)</t>

        <t>We note that 3 and 4 do not make good use of the functions of RTP
        and in some cases even violate the RTP specifications. Thus we
        recommend that the focus be on 1 and 2.</t>

        <t>RTP protocol extensions to be used with conferencing are included
        because they are important in the context of centralised conferencing,
        where one RTP Mixer (Conference Focus) receives the participants' media
        streams and distributes them to the other participants. These messages
        are defined in the <xref target="RFC4585">Extended RTP Profile for
        Real-time Transport Control Protocol (RTCP)-Based Feedback
        (RTP/AVPF)</xref> and the <xref target="RFC5104">"Codec Control
        Messages in the RTP Audio-Visual Profile with Feedback (AVPF)"
        (CCM)</xref> and are fully usable by the <xref target="RFC5124">Secure
        variant of this profile (RTP/SAVPF)</xref>.</t>

        <section title="RTCP Feedback Message: Full Intra Request">
          <t>The Full Intra Request is defined in Sections 3.5.1 and 4.3.1 of
          <xref target="RFC5104">CCM</xref>. It allows the mixer to request a
          new intra picture from a session participant. This is used when
          switching between sources to ensure that the receivers can decode
          the video, or any other predicted media encoding with long
          prediction chains. It is RECOMMENDED that this feedback message is
          supported.</t>
        </section>

        <section title="RTCP Feedback Message: Picture Loss Indicator">
          <t>The Picture Loss Indicator is defined in Section 6.3.1 of <xref
          target="RFC4585">AVPF</xref>. It is used by a receiver to tell the
          encoder that it lost the decoder context and would like to have it
          repaired somehow. This is semantically different from the Full Intra
          Request above. It is RECOMMENDED that this feedback message is
          supported as a loss tolerance mechanism.</t>
        </section>

        <section title="RTCP Feedback Message: Temporary Maximum Media Stream Bit Rate Request">
          <t>This feedback message is defined in Sections 3.5.4 and 4.2.1 of
          <xref target="RFC5104">CCM</xref>. This message and its notification
          message are used by a media receiver to inform the sending party
          that there is a current limitation on the amount of bandwidth
          available to this receiver. This can be for various reasons, and can
          for example be used by an RTP mixer to limit the media sender being
          forwarded by the mixer (without doing media transcoding) to fit the
          bottlenecks existing towards the other session participants. It is
          RECOMMENDED that this feedback message is supported.</t>
        </section>
      </section>

      <section title="RTP Header Extensions">
        <t>The <xref target="RFC3550">RTP specification</xref> provides a
        capability to extend the RTP header with in-band data, but the format
        and semantics of the extensions are poorly specified. Accordingly, if
        header extensions are to be used, it is REQUIRED that they be
        formatted and signalled according to the general mechanism of RTP
        header extensions defined in <xref target="RFC5285"></xref>.</t>

        <t>As noted in <xref target="RFC5285"></xref>, the requirement from
        the RTP specification that header extensions are "designed so that the
        header extension may be ignored" <xref target="RFC3550"></xref>
        stands. To be specific, header extensions must only be used for data
        that can safely be ignored by the recipient without affecting
        interoperability, and must not be used when the presence of the
        extension has changed the form or nature of the rest of the packet in
        a way that is not compatible with the way the stream is signalled
        (e.g., as defined by the payload type). Valid examples might include
        metadata that is additional to the usual RTP information.</t>
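
        <t>A receiver can honour the "may be ignored" requirement by
        skipping every element it does not recognise. The sketch below
        assumes the one-byte header form of <xref target="RFC5285"></xref> as
        we understand it: a 0xBEDE marker, a length counted in 32-bit words,
        and elements that each carry a 4-bit identifier and a 4-bit length
        stored as length minus one, with identifier 0 used for padding and
        identifier 15 ending processing. The function name and the known_ids
        parameter are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import struct


def parse_one_byte_extensions(block: bytes, known_ids: set) -> dict:
    """Parse an RTP header extension block, skipping unknown element IDs."""
    if len(block) < 4:
        return {}
    profile, words = struct.unpack("!HH", block[:4])
    if profile != 0xBEDE:          # assumed marker for the one-byte form
        return {}
    data = block[4:4 + 4 * words]
    found, pos = {}, 0
    while pos < len(data):
        ext_id, length = data[pos] >> 4, (data[pos] & 0x0F) + 1
        if ext_id == 0:            # padding octet, no data follows
            pos += 1
            continue
        if ext_id == 15:           # reserved ID: stop processing
            break
        if ext_id in known_ids:    # anything else is silently ignored
            found[ext_id] = data[pos + 1:pos + 1 + length]
        pos += 1 + length
    return found
```
]]></artwork>
        </figure>

        <t>An element with an unknown identifier changes nothing in the
        result, so a stream that carries it remains decodable by a receiver
        that was never told about it.</t>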

        <t>The <xref target="RFC6051">RTP rapid synchronisation header
        extension</xref> is recommended, as discussed in <xref
        target="rapid-sync"></xref>. We also recommend the <xref
        target="I-D.ietf-avtext-client-to-mixer-audio-level">client to mixer
        audio level</xref>, and consider the <xref
        target="I-D.ietf-avtext-mixer-to-client-audio-level">mixer to client
        audio level</xref> an optional feature.</t>

        <t>Other header extensions are not currently recommended for
        inclusion, but the available ones are listed below for
        information:</t>

        <t><list style="hanging">
            <t hangText="Transmission Time offsets:"><xref
            target="RFC5450"></xref> defines a format for including an RTP
            timestamp offset of the actual transmission time of the RTP packet
            in relation to capture/display timestamp present in the RTP
            header. This can be used to improve jitter determination and
            buffer management.</t>

            <t hangText="Associating Time-Codes with RTP Streams:"><xref
            target="RFC5484"></xref> defines how to associate SMPTE time
            codes with the RTP streams.</t>
          </list></t>
      </section>

      <section anchor="rapid-sync" title="Rapid Synchronisation Extensions">
        <t>Many RTP sessions require synchronisation between audio, video, and
        other content. This synchronisation is performed by receivers, using
        information contained in RTCP SR packets, as described in the <xref
        target="RFC3550">RTP specification</xref>. This basic mechanism can be
        slow, however, so it is RECOMMENDED that the rapid RTP synchronisation
        extensions described in <xref target="RFC6051"></xref> be implemented.
        The rapid synchronisation extensions use the general RTP header
        extension mechanism <xref target="RFC5285"></xref>, which requires
        signalling, but are otherwise backwards compatible.</t>
      </section>

      <section title="Client to Mixer Audio Level">
        <t>The <xref
        target="I-D.ietf-avtext-client-to-mixer-audio-level">Client to Mixer
        Audio Level</xref> is an RTP header extension used by a client to
        inform a mixer about the level of audio activity in the packet the
        header is attached to. This enables a central node to make mixing or
        selection decisions without decoding or detailed inspection of the
        payload, thus reducing the complexity needed in some types of central
        RTP nodes.</t>

        <t>Assuming that the <xref
        target="I-D.ietf-avtext-client-to-mixer-audio-level">Client to Mixer
        Audio Level</xref> is published as a finished specification prior to
        RTCWEB's first RTP specification then it is RECOMMENDED that this
        extension is included.</t>
      </section>

      <section title="Mixer to Client Audio Level">
        <t>The <xref
        target="I-D.ietf-avtext-mixer-to-client-audio-level">Mixer to Client
        Audio Level header extension</xref> provides the client with the audio
        level of the different sources mixed into a common mix by the RTP
        mixer. This enables a user interface to indicate the relative
        activity level of a session participant, rather than just whether the
        participant is included or not based on the CSRC field. This is a pure
        optimisation of non-critical functions and is thus optional.</t>

        <t>Assuming that the <xref
        target="I-D.ietf-avtext-mixer-to-client-audio-level">Mixer to Client
        Audio Level</xref> is published as a finished specification prior to
        RTCWEB's first RTP specification, it is OPTIONAL that this
        extension is included.</t>
      </section>
    </section>

    <section title="Improving RTP Transport Robustness">
      <t>There are some tools that can make RTP flows robust against packet
      loss and reduce the impact on media quality. However, they all add extra
      bits compared to a non-robust stream. These extra bits need to be
      considered, and the aggregate bit-rate needs to be rate controlled. Thus
      improving robustness might require a lower base encoding quality, but has
      the potential to deliver that quality with fewer errors in it.</t>

      <section title="RTP Retransmission">
        <t>Support for RTP retransmission as defined by <xref
        target="RFC4588">"RTP Retransmission Payload Format"</xref> is
        RECOMMENDED.</t>

        <t>The retransmission scheme in RTP allows flexible application of
        retransmissions. The receiver can request only selected missing
        packets. It also allows the sender to prioritise between missing
        packets based on the sender's knowledge about their content. Compared
        to TCP, RTP retransmission also allows one to give up on a packet that
        despite retransmission(s) still has not been received within a time
        window.</t>

        <t><xref target="I-D.cbran-rtcweb-data">"RTC-Web Media Transport
        Requirements"</xref> raises two issues that its authors believe make
        RTP Retransmission unsuitable for RTCWEB. We here consider these issues
        and explain why they are in fact not a reason to exclude RTP
        retransmission from the tool box available to RTCWEB media
        sessions.</t>

        <t><list style="hanging">
            <t
            hangText="The additional latency added by [RFC4588] will exceed the latency threshold for interactive voice and video:">RTP
            Retransmission will require at least one round trip time for a
            retransmission request and repair packet to arrive. Thus the
            general suitability of using retransmissions will depend on the
            actual network path latency between the end-points. In many of the
            actual usages the latency between two end-points will be low
            enough for RTP retransmission to be effective. Interactive
            communication with end-to-end delays of 400 ms still provides
            fair quality. Even removing half of that for end-point delays
            allows functional retransmission between end-points on the same
            continent. In addition, in some applications one may accept
            temporary delay spikes to allow for retransmission of crucial
            codec information such as parameter sets, intra pictures, etc.,
            rather than getting no media at all.</t>

            <t
            hangText="The undesirable increase in packet transmission at the point when congestion occurs:">Congestion
            loss will impact the rate control's view of the available bit-rate
            for transmission. When using retransmission one will have to
            prioritise between performing retransmissions and the quality one
            can achieve with one's adaptable codecs. In many use cases one
            prefers error-free media, or low rates of error, at a reduced base
            quality over high degrees of error at a higher base quality.</t>
          </list>The RTCWEB end-point implementations will need to select
        when to enable RTP retransmissions, based on both API settings and
        measurements of the actual round trip time. In addition, for each NACK
        request that a media sender receives, it will need to make a
        prioritisation based on the importance of the requested media, the
        probability that the packet will reach the receiver in time to be
        usable, the consumption of available bit-rate, and the impact on the
        media quality for new encodings.</t>
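
        <t>The timing part of that decision can be sketched as a simple
        comparison between the measured round trip time and the time left
        before the missing packet is needed. The function name, its
        parameters and the default margin of 10 ms are hypothetical choices
        for illustration. A real sender would also weigh media importance and
        the available bit-rate, as described above.</t>

        <figure>
          <artwork><![CDATA[
```python
def retransmission_useful(rtt_ms: float, playout_deadline_ms: float,
                          margin_ms: float = 10.0) -> bool:
    """True if a repair packet can plausibly arrive before it is needed.

    rtt_ms: measured round trip time between the end-points.
    playout_deadline_ms: time left until the lost packet must be decoded.
    margin_ms: assumed allowance for loss detection and processing.
    """
    return rtt_ms + margin_ms <= playout_deadline_ms


if __name__ == "__main__":
    # A short path leaves room for a repair; a long path does not.
    print(retransmission_useful(rtt_ms=50, playout_deadline_ms=100))
    print(retransmission_useful(rtt_ms=150, playout_deadline_ms=100))
```
]]></artwork>
        </figure>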

        <t>To conclude, the issues raised are concerns that an
        implementation needs to take into consideration; they are not
        arguments against including a highly versatile and efficient packet
        loss repair mechanism.</t>
      </section>

      <section title="Forward Error Correction (FEC)">
        <t>Support of some type of FEC to combat the effects of packet loss is
        beneficial, but is heavily application dependent. However, some FEC
        mechanisms are encumbered.</t>

        <t>The main benefit from FEC is the relatively low additional delay
        needed to protect against packet losses. The transmission of any
        repair packets should preferably be done with a time delay that is
        just larger than any loss events normally encountered. That way the
        repair packet isn't also lost in the same event as the source
        data.</t>

        <t>The number of repair packets needed is also highly dynamic and
        depends on two main factors: the amount and pattern of lost packets to
        be recovered, and the mechanism one uses to derive repair data. The
        latter choice also affects the additional delay required both to
        encode the repair packets and, in the receiver, to be able to recover
        the lost packet(s).</t>

        <section title="Basic Redundancy">
          <t>The method for providing basic redundancy is simply to retransmit
          a packet that was sent some time earlier. This is relatively simple
          in theory: one saves each outgoing source (original) packet in a
          buffer marked with a timestamp of its actual transmission, and some
          X ms later one transmits the packet again, where X is selected to be
          longer than the common loss events. Thus any loss event shorter
          than X can be recovered, assuming that one doesn't get another
          loss event before all the packets lost in the first event have been
          received.</t>

          <t>The downside of basic redundancy is the overhead. To provide each
          packet with one chance of recovery, the transmission rate
          increases by 100%, as one needs to send each packet twice. It is
          possible to redundantly send only the really important packets, thus
          reducing the overhead below 100% at the cost of leaving the
          remaining packets unprotected.</t>
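
          <t>The timing and overhead of this scheme can be sketched as
          follows. The function redundancy_schedule and its parameters are
          hypothetical, packets are assumed to be of equal size, and the
          sketch models only when copies are sent and how much they cost,
          not how the copies are carried in RTP.</t>

          <figure>
            <artwork><![CDATA[
```python
def redundancy_schedule(send_times_ms, delay_ms, important=None):
    """Return (time_ms, index, is_copy) tuples and the relative overhead.

    send_times_ms: original transmission time of each packet.
    delay_ms: the X in the text, chosen longer than common loss events.
    important: optional set of packet indices to duplicate (default: all).
    """
    if not send_times_ms:
        return [], 0.0
    schedule = [(t, i, False) for i, t in enumerate(send_times_ms)]
    for i, t in enumerate(send_times_ms):
        if important is None or i in important:
            schedule.append((t + delay_ms, i, True))
    schedule.sort()
    copies = len(schedule) - len(send_times_ms)
    overhead = copies / len(send_times_ms)   # equal-sized packets assumed
    return schedule, overhead


if __name__ == "__main__":
    times = [0, 20, 40, 60]                  # one packet every 20 ms
    print(redundancy_schedule(times, delay_ms=50)[1])
    print(redundancy_schedule(times, delay_ms=50, important={0})[1])
```
]]></artwork>
          </figure>

          <t>Duplicating every packet gives an overhead of 1.0 (that is,
          100%), while duplicating only the first of the four packets gives
          0.25.</t>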

          <t>In addition, the basic retransmission of the same packet using the
          same SSRC in the same RTP session is not possible in the RTP context.
          The reason is that sending the same packet twice with the same
          sequence number would corrupt the RTCP reporting. Thus one
          needs more elaborate mechanisms.</t>

          <t><list style="hanging">
              <t hangText="RTP Payload for Redundant Audio Data:">This audio
              and text redundancy format defined in <xref
              target="RFC2198"></xref> allows for multiple levels of
              redundancy with different delays in their transmission, as long
              as the source plus payload parts to be redundantly transmitted
              together fit into one MTU. This should work fine for most
              interactive use cases, as both the codec bit-rates and the
              framing intervals normally allow this requirement to hold.
              This payload format also does not increase the packet rate, as
              original data and redundant data are sent together. This format
              does not allow perfect recovery, only recovery of the
              information deemed necessary for audio; for example, the
              sequence number of the original data is lost.</t>

              <t hangText="RTP Retransmission Format:">The <xref
              target="RFC4588">RTP Retransmission Payload format</xref> can be
              used to pro-actively send redundant packets using either SSRC or
              session multiplexing. By using different SSRCs or a different
              session for the redundant packets the RTCP receiver reports will
              be correct. The retransmission payload format is used to recover
              the packet's original data, thus enabling a perfect recovery.</t>

              <t
              hangText="Duplication Grouping Semantics in the Session Description Protocol:"><xref
              target="I-D.begen-mmusic-redundancy-grouping">This</xref> is a
              proposal for new SDP signalling to indicate media stream
              duplication using different RTP sessions, or different SSRCs, to
              separate the source and the redundant copy of the stream.</t>
            </list></t>
        </section>

        <section title="Block Based">
          <t>Block based redundancy collects a number of source packets into a
          data block for processing. The processing results in some number of
          repair packets that are then transmitted to the other end, allowing
          the receiver to attempt to recover some number of lost packets in
          the block. The benefit of block based approaches is the overhead,
          which can be lower than 100% and still allow recovery of one or more
          lost source packets from the block. Optimal block codes allow
          each received repair packet to repair a single loss within the
          block. Thus 3 repair packets that are received should allow any
          set of 3 packets within the block to be recovered. In reality one
          commonly does not reach this level of performance for all block
          sizes and numbers of repair packets, and taking the computational
          complexity into account there are even more trade-offs to make among
          the codes.</t>

          <t>One result of the block based approach is the extra delay, as one
          needs to collect enough data together before being able to calculate
          the repair packets. In addition, a sufficient amount of the block
          needs to be received prior to recovery. Thus additional delay is
          added on both the sending and the receiving side to make it possible
          to recover any packet within the block.</t>

          <t>The redundancy overhead and the transmission pattern of source
          and repair data can be altered from block to block, thus allowing an
          adaptive process that adjusts to the actual amount of loss seen on
          the network path and reported in RTCP.</t>

          <t>The alternatives that exist for block based FEC with RTP are the
          following:</t>

          <t><list style="hanging">
              <t
              hangText="RTP Payload Format for Generic Forward Error Correction:"><xref
              target="RFC5109">This RTP payload format</xref> defines an XOR
              based recovery packet. This is the simplest block based FEC
              scheme in terms of processing. It also has limited
              capabilities, as each repair packet can only repair a single
              loss. To handle multiple closely spaced losses, a scheme of
              hierarchical encodings is needed, which increases the overhead
              significantly.</t>

              <t hangText="Forward Error Correction (FEC) Framework:"><xref
              target="I-D.ietf-fecframe-framework">This framework</xref>
              defines how arbitrary packet flows, not only RTP packets, can
              be protected. Some solutions produced or under development in
              the FECFRAME WG are RTP specific. There exist alternatives
              supporting block codes such as Reed-Solomon and Raptor.</t>
            </list></t>
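
          <t>The principle behind XOR based repair can be shown with a
          minimal sketch. It is a simplification that assumes equal-length
          payloads and exactly one loss per block, and it does not follow the
          packet format of <xref target="RFC5109"></xref>. The function names
          are hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
def xor_parity(packets):
    """Build one repair payload as the byte-wise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_single_loss(received, parity):
    """Rebuild the one missing packet from the survivors and the parity."""
    return xor_parity(list(received) + [parity])


if __name__ == "__main__":
    block = [b"abcd", b"efgh", b"ijkl"]
    repair = xor_parity(block)
    # Pretend the middle packet was lost in transit.
    print(recover_single_loss([block[0], block[2]], repair) == block[1])
```
]]></artwork>
          </figure>

          <t>One repair packet protects the whole block, so the overhead for
          a block of three packets is one third. If two packets of the block
          are lost, the XOR no longer determines either of them, which is the
          limitation noted above.</t>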
        </section>

        <section title="Recommendations for FEC">
          <t>(tbd)</t>
        </section>
      </section>
    </section>

    <section title="RTP Rate Control and Media Adaptation">
      <t>It is REQUIRED to have an RTP Rate Control mechanism using Media
      adaptation to ensure that the generated RTP flows are network friendly,
      and maintain the user experience in the presence of network
      problems.</t>

      <t>The biggest issue is that there is no standardised, ready-to-use
      mechanism that can simply be included in RTC-Web. Thus there will be a
      need for the IETF to produce such a specification. A potential starting
      point for defining a solution is <xref target="rtp-tfrc">"RTP with TCP
      Friendly Rate Control"</xref>.</t>
    </section>

    <section title="RTP Performance Monitoring">
      <t>RTCP contains a basic set of RTP flow monitoring points, such as
      packet loss and jitter. There exist a number of extensions that could be
      included in the set to be supported. However, in most cases the RTP
      monitoring that is needed depends on the application, which makes it
      difficult to select which extensions to include when the set of
      applications is very large.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This memo makes no request of IANA.</t>

      <t>Note to RFC Editor: this section may be removed on publication as an
      RFC.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>RTP and its various extensions each have their own security
      considerations. These should be taken into account when considering the
      security properties of the complete suite. We currently don't think this
      suite creates any additional security issues or properties. The use of
      SRTP will provide protection or mitigation against all the fundamental
      issues by offering confidentiality, integrity and partial source
      authentication. We don't discuss the key-management aspect of SRTP in
      this memo; that needs to be done taking the RTC-Web communication model
      into account.</t>

      <t>In the context of RTC-Web the actual security properties required
      from RTP are currently not fully understood. Until security goals and
      requirements are specified, it will be difficult to determine what
      security features, if any, are needed in addition to SRTP and a suitable
      key-management mechanism.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t></t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.3550"?>

      <?rfc include='reference.RFC.2736'?>

      <?rfc include='reference.RFC.3551'?>

      <?rfc include='reference.RFC.3556'?>

      <?rfc include='reference.RFC.3711'?>

      <?rfc include='reference.RFC.4585'?>

      <?rfc include='reference.RFC.4588'?>

      <?rfc include='reference.RFC.4961'?>

      <?rfc include='reference.RFC.5104'?>

      <?rfc include='reference.RFC.5109'?>

      <?rfc include='reference.RFC.5124'?>

      <?rfc include='reference.RFC.5285'?>

      <?rfc include='reference.RFC.5450'?>

      <?rfc include='reference.RFC.5484'?>

      <?rfc include='reference.RFC.5506'?>

      <?rfc include='reference.RFC.5761'?>

      <?rfc include='reference.RFC.6051'?>

      <?rfc include='reference.RFC.6222'?>

      <?rfc include='reference.I-D.ietf-avtext-multiple-clock-rates'?>

      <?rfc include='reference.I-D.ietf-avtext-mixer-to-client-audio-level'?>

      <?rfc include='reference.I-D.ietf-avtext-client-to-mixer-audio-level'?>
    </references>

    <references title="Informative References">
      <reference anchor="rtp-tfrc">
        <front>
          <title>RTP with TCP Friendly Rate Control
          (draft-gharai-avtcore-rtp-tfrc-00)</title>

          <author fullname="Ladan Gharai" initials="L." surname="Gharai">
            <organization></organization>
          </author>

          <date day="7" month="March" year="2011" />
        </front>
      </reference>

      <?rfc include='reference.RFC.2198'?>

      <?rfc include='reference.RFC.2616'?>

      <?rfc include='reference.RFC.2733'?>

      <?rfc include='reference.RFC.5117'?>

      <?rfc include='reference.RFC.5245'?>

      <?rfc include='reference.RFC.5389'?>

      <?rfc include='reference.RFC.5968'?>

      <?rfc include='reference.I-D.ietf-hybi-thewebsocketprotocol'?>

      <?rfc include='reference.I-D.rosenberg-rtcweb-rtpmux'?>

      <?rfc include='reference.I-D.cbran-rtcweb-data'?>

      <?rfc include='reference.I-D.westerlund-avtcore-multistream-and-simulcast'?>

      <?rfc include='reference.I-D.begen-mmusic-redundancy-grouping'?>

      <?rfc include='reference.I-D.ietf-fecframe-framework'?>
    </references>
  </back>
</rfc>