<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [

<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
<!ENTITY RFC3552 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3552.xml">
<!ENTITY I-D.narten-iana-considerations-rfc2434bis SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.narten-iana-considerations-rfc2434bis.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
     (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-gunther-detnet-proaudio-req-00" ipr="trust200902">
  <!-- category values: std, bcp, info, exp, and historic
     ipr values: full3667, noModification3667, noDerivatives3667
     you can add the attributes updates="NNNN" and obsoletes="NNNN" 
     they will automatically be output with "(if approved)" -->

  <!-- ***** FRONT MATTER ***** -->

  <front>
    <title abbrev="DetNet Pro Audio requirements">
    Deterministic Networking Professional Audio Requirements</title>

    <author fullname="Craig Gunther" initials="C.A.G." role="editor"
            surname="Gunther">
      <organization abbrev="HARMAN">Harman International</organization>

      <address>
        <postal>
          <street>10653 South River Front Parkway</street>
          <city>South Jordan</city><region>UT</region>
          <code>84095</code>
          <country>USA</country>
        </postal>

        <phone>+1 801 568-7675</phone>
        <email>craig.gunther@harman.com</email>
        <uri>http://www.harman.com</uri>
      </address>
    </author>

    <date month="March" year="2015"/>

    <area>Internet</area>
    <workgroup>Internet Engineering Task Force</workgroup>
    <keyword>DetNet</keyword>
    <keyword>AVB</keyword>
    <keyword>TSN</keyword>
    <keyword>SRP</keyword>
    <abstract>
      <t>This draft documents the needs of the Professional Audio industry
      to establish multi-hop paths and optional redundant paths for
      characterized flows with deterministic properties.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>Professional Audio (Pro-A) networks range from the small, simple
      network used by a garage band, which may contain only a handful of
      devices, to the network of a large theme park spread across 25,000
      acres or more. These theme parks may exist on multiple continents and
      share content around the world.</t>

      <t>Some examples of Pro-A networks include:

        <list style="symbols">
          <t>Garage bands</t>
          <t>Portable PA</t>
          <t>Churches</t>
          <t>Concert halls</t>
          <t>Recording and broadcasting studios</t>
          <t>Cinema and theater sound</t>
          <t>Train stations</t>
          <t>Stadiums</t>
          <t>Airports</t>
        </list>
      </t>

      <t>While many of these uses have common requirements, there are some
      unique usage models that are highlighted in this document.</t>
    </section>

    <section title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </section>

    <section title="Stream Characteristics">
      <t>All streams of interest to the Pro-A world have the same requirements
      for establishing a path and allocating bandwidth as any other type of
      network application. This section introduces additional concerns
      associated with streams in a Pro-A network.</t>
      
      <section title="Emergency Notifications">
      <t>Audio systems installed in public environments have unique requirements
      with regard to health, safety and fire concerns. For example,
      <xref target="ISO7240-16"/> subjects equipment
      to tests that can simulate an emergency situation. The purpose of this
      section is to provide a very basic set of requirements that an underlying
      network must provide if it is to be used in public areas. It would be
      advantageous to establish a liaison with the International Organization
      for Standardization (ISO) so that the referenced ISO 7240 standards could
      be made available for Deterministic Networking (DetNet) review of the
      specific details.</t>
      <t>The remainder of this section is simply a synopsis of some of the
      requirements found in the ISO 7240 standard. The wording in that standard
      supersedes anything specified in this section, and the standard should be
      consulted for the specific requirements.</t>
      <t>Any number in this section surrounded by braces refers to the
      specific section within ISO 7240-16:2007 (for example, {7.1.1} is a
      reference to section 7.1.1).</t>
      <t>One such requirement is a maximum of 3 seconds {7.1.1} for a system to
      respond to an emergency detection and begin sending appropriate warning
      signals and alarms. When these conditions occur, the audio system must be
      able to disable normal functions {7.1.4} not associated with emergency
      functionality, without the need for human intervention.</t>
      <t>It must be possible to make announcements within 20 seconds of a
      system reset {7.9.2.2}.</t>
      <t>In the event of equipment failure, the backup equipment must be able
      to take over within 10 seconds {14.4.1}. This interval includes failure
      detection time, new path configuration, and so on.</t>
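      <t>As an illustration only, the limits summarized above can be treated
      as time budgets that the individual steps (failure detection, new path
      configuration, and so on) must fit within. The Python sketch below
      assumes each step's duration is known in seconds; the LIMITS_S table and
      the within_limit function are hypothetical names, and the ISO 7240-16
      text remains authoritative.</t>
      <figure><artwork><![CDATA[
```python
# Limits summarized in this section, in seconds. The ISO 7240-16
# text governs; these values only mirror the synopsis above.
LIMITS_S = {
    "emergency_response": 3.0,     # {7.1.1}
    "announce_after_reset": 20.0,  # {7.9.2.2}
    "backup_takeover": 10.0,       # {14.4.1}
}


def within_limit(requirement, *durations_s):
    """Sum the component durations (seconds) of one requirement
    and compare the total with its limit. For backup takeover,
    the components would include failure detection and new path
    configuration."""
    return sum(durations_s) <= LIMITS_S[requirement]


# Failure detected in 2 s, new path configured in 5.5 s.
print(within_limit("backup_takeover", 2.0, 5.5))  # True
# Detection 4 s plus path configuration 7 s exceeds 10 s.
print(within_limit("backup_takeover", 4.0, 7.0))  # False
```
]]></artwork></figure>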
      </section>

      <section title="Content Protection">
      <t>Digital Rights Management (DRM) is very important to the Pro-A and
      Professional Video industries. Any time protected content is introduced
      into a network, its DRM protection must be maintained
      (see <xref target="CONTENT_PROTECTION"/>).</t>
      <t>Two examples of such techniques are Digital Transmission Content
      Protection (DTCP) and High-bandwidth Digital Content Protection (HDCP).
      HDCP content is not approved for retransmission within any other type of
      DRM, while DTCP content may be retransmitted under HDCP. Therefore, if the
      source of a stream is outside of the network and it uses HDCP protection,
      it may only be placed on the network with that same HDCP protection.</t>
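      <t>The two DTCP and HDCP rules above can be captured in a small lookup
      table. The following Python sketch is hypothetical: the table and
      function names are not taken from any DRM specification, it assumes DTCP
      content may also remain under DTCP, and it rejects any DRM scheme it
      does not list.</t>
      <figure><artwork><![CDATA[
```python
# Hypothetical table encoding the two rules stated above: HDCP
# content may only be carried under HDCP, and DTCP content may
# also be retransmitted under HDCP. Keeping DTCP content under
# DTCP is an assumption; other DRM schemes are not modeled and
# are rejected.
ALLOWED_CARRIERS = {
    "HDCP": {"HDCP"},
    "DTCP": {"DTCP", "HDCP"},
}


def may_place_on_network(source_drm, network_drm):
    """True if content protected by source_drm may be placed on
    the network under network_drm."""
    return network_drm in ALLOWED_CARRIERS.get(source_drm, set())


print(may_place_on_network("DTCP", "HDCP"))  # True
print(may_place_on_network("HDCP", "DTCP"))  # False
```
]]></artwork></figure>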
      </section>
      
      <section title="Multiple Sinks">
      <t>Pro-A systems often have multiple sinks (e.g., speakers) connected
      to a single source. In order to keep bandwidth utilization of shared links
      to a minimum, multicast addressing is commonly used.</t>
      </section>
      
      <section title="Super Stream = Two or More Serial Streams">
      <t>Audio content delivered from a source (e.g., a microphone or guitar)
      can be sent through one or more stages of processing before it reaches
      the sink(s). For example, one stream may be used to send audio from a
      microphone hub to a digital processor that will match the singer's pitch
      to that of a guitar. A second stream will then take that processed audio
      to a mixing console. A third stream is then required to move the mixed
      audio to an amplified speaker. Not only does this one super "stream"
      require three physical streams to be created, but the overall latency of
      all three streams plus the digital processing at each hop must not exceed
      10-15 ms. See slide 6 of <xref target="SRP_LATENCY"/>.</t>
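      <t>The latency constraint on such a super stream is a simple sum. The
      Python sketch below uses hypothetical function names and illustrative
      (not measured) per-stream and per-stage latencies, and it takes 10 ms,
      the tighter end of the 10-15 ms range above, as the default budget.</t>
      <figure><artwork><![CDATA[
```python
# Hypothetical end-to-end check for a "super stream": the latency
# of every network stream plus every processing stage must stay
# within the budget. The text above gives 10-15 ms; 10 ms is used
# here as the conservative end of that range.
def super_stream_latency_ms(stream_latencies_ms, processing_ms):
    return sum(stream_latencies_ms) + sum(processing_ms)


def fits_budget(stream_latencies_ms, processing_ms,
                budget_ms=10.0):
    total = super_stream_latency_ms(stream_latencies_ms,
                                    processing_ms)
    return total <= budget_ms


# Illustrative numbers only (not measurements): microphone hub
# -> pitch processor -> mixing console -> amplified speaker.
streams_ms = [2.0, 2.0, 2.0]  # three network streams
stages_ms = [1.5, 1.0]        # pitch matching, mixing
print(super_stream_latency_ms(streams_ms, stages_ms))  # 8.5
print(fits_budget(streams_ms, stages_ms))              # True
```
]]></artwork></figure>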
      </section>

      <section title="Unused Reservations and Best-Effort Traffic">
      <t>Reservations are often created but not used until later in a live
      show. This is really more of a comfort issue for the show's producers;
      they want assurance that an important reservation request will not be
      refused during a live performance.</t>
      <t>In other situations, a single reservation may be used for different
      content at different times throughout the day. It is convenient to create
      a single reservation that is large enough for the biggest bandwidth
      consumer, although part of that reservation is wasted while smaller
      streams are using it.</t>
      <t>In both cases it is advantageous for best-effort traffic to be able
      to use the unused reserved bandwidth so that the full bandwidth of the
      network can be utilized at all times. This best-effort traffic could
      consist of "meter data", which helps an operator understand what is going
      on at the other end of a Pro-A system in an amusement park. It could also
      be used for file transfers or venue updates. Regardless of the reason,
      Pro-A installations will want to be able to use any reserved bandwidth
      that is unused.</t>
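      <t>One way to picture this requirement is per-link bookkeeping in which
      best-effort traffic may use any reserved capacity that is idle at the
      moment. The Python sketch below assumes bandwidth figures in Mb/s and a
      hypothetical best_effort_mbps function; it does not model how a real
      scheduler lends idle reserved bandwidth or reclaims it when a reserved
      stream starts sending.</t>
      <figure><artwork><![CDATA[
```python
# Hypothetical per-link accounting: bandwidth that reserved
# streams are not currently using is counted as available to
# best-effort traffic. Each stream counts at most its reserved
# rate.
def best_effort_mbps(link_mbps, reservations):
    """reservations is a list of (reserved_mbps, in_use_mbps)."""
    in_use = sum(min(used, reserved)
                 for reserved, used in reservations)
    return link_mbps - in_use


# A 1000 Mb/s link with one reservation held idle until later in
# the show and one sized for the largest content but currently
# carrying a smaller stream.
res = [(100, 0), (200, 80)]
print(best_effort_mbps(1000, res))    # 920
print(1000 - sum(r for r, _ in res))  # 700 if never lent out
```
]]></artwork></figure>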
      </section>
      
      <section title="Maximum and Acceptable Latency">
      <t>In order to synchronize speakers throughout a venue, it is critical for
      each sink (amplified speaker) to know the maximum latency it can expect
      from the network. Each sink sends its maximum latency back to the
      source, or to an associated Controller, so that the presentation time of
      the audio samples can be set. In addition, sinks that are fewer hops away
      from the source will then know how much memory they must provide to
      buffer content until its later presentation time.</t>
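      <t>The presentation-time and buffering arithmetic described above can be
      sketched as follows. The Python below is hypothetical: latencies are
      assumed to be reported in microseconds, the sink names and the audio
      format (48 kHz, two channels, 3-byte samples) are illustrative, and
      packetization overhead is ignored.</t>
      <figure><artwork><![CDATA[
```python
# Hypothetical sketch: each sink reports the maximum latency it
# expects from the network (microseconds). The presentation time
# offset is set to the worst case, and every sink buffers the
# difference between that offset and its own latency.
def presentation_offset_us(max_latency_us):
    return max(max_latency_us.values())


def buffer_bytes(max_latency_us, bytes_per_s):
    offset_us = presentation_offset_us(max_latency_us)
    return {sink: (offset_us - lat) * bytes_per_s // 1_000_000
            for sink, lat in max_latency_us.items()}


# 48 kHz, 2 channels, 3-byte samples: 288,000 bytes per second.
reported = {"near": 500, "mid": 1000, "far": 2000}
print(presentation_offset_us(reported))  # 2000
print(buffer_bytes(reported, 48_000 * 2 * 3))
# {'near': 432, 'mid': 288, 'far': 0}
```
]]></artwork></figure>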
      <t>A Controller may also collect the various maximum latency numbers and
      decide to exclude the sinks that are too many hops away since they will
      place unrealistic buffering requirements on the sinks that are very few
      hops from the source.</t>
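      <t>A Controller policy of this kind might be sketched as below. The
      Python is hypothetical: it uses each sink's reported maximum latency as
      a stand-in for its hop count, and the max_spread_us threshold is an
      assumed operator setting rather than anything defined in this
      document.</t>
      <figure><artwork><![CDATA[
```python
# Hypothetical Controller policy: drop sinks whose reported maximum
# latency exceeds that of the nearest sink by more than a chosen
# spread, since the nearest sink would otherwise have to buffer
# the whole difference.
def select_sinks(max_latency_us, max_spread_us):
    nearest_us = min(max_latency_us.values())
    return sorted(sink for sink, lat in max_latency_us.items()
                  if lat - nearest_us <= max_spread_us)


reported = {"stage": 300, "balcony": 900, "annex": 5000}
print(select_sinks(reported, max_spread_us=2000))
# ['balcony', 'stage']
```
]]></artwork></figure>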
      <t>Additionally, sinks that are closer to the source can inform the
      network that they can accept more latency than the network is currently
      offering, since they will be buffering packets to match the play-out time
      of farther-away sinks. The network can use this acceptable latency to
      move a reservation from a short path to a longer path in order to free up
      bandwidth for other critical streams on that short path. See slides 3-5
      of <xref target="SRP_LATENCY"/>.</t>
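      <t>The use of acceptable latency to move a reservation onto a longer
      path can be sketched as a selection among candidate paths. The Python
      below is a hypothetical policy, not a defined DetNet algorithm: it
      assumes the latency of each candidate path is known and picks the
      highest-latency path that still satisfies the sink.</t>
      <figure><artwork><![CDATA[
```python
# Hypothetical path choice using a sink's acceptable latency:
# among candidate paths that still meet the latency the sink can
# accept, take the one with the highest latency (typically the
# longer path), leaving the short path for streams with tighter
# bounds.
def pick_path(candidates_us, acceptable_latency_us):
    """candidates_us maps path name -> path latency (us)."""
    fitting = {path: lat for path, lat in candidates_us.items()
               if lat <= acceptable_latency_us}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


paths = {"short": 400, "medium": 900, "long": 1800}
print(pick_path(paths, acceptable_latency_us=1000))  # medium
print(pick_path(paths, acceptable_latency_us=200))   # None
```
]]></artwork></figure>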
      </section>
      
      <section title="Latency Per Sink">
      <t>As previously mentioned, a single stream may be sent to multiple sinks.
      This use case introduces the concept of more stringent latency
      requirements for some sinks, whereas other sinks have more flexible
      latency requirements. A live outdoor concert has stringent requirements
      for delivering the audio to the speaker systems, yet can have very
      flexible requirements for that same audio content that is delivered to a
      mobile recording studio that is set up nearby. See slide 7 of
      <xref target="SRP_LATENCY"/>.
      </t>
      </section>
      
      <section title="Layer 3 Interconnecting Layer 2 Islands">
      <t>The DetNet solution for Layer 3 networks should support Layer 3
      segments that interconnect Layer 2 networks (islands) that do not
      themselves support Layer 3 protocols.</t>
      </section>
      
      <section title="Link Aggregation">
      <t>If any type of link aggregation is proposed as part of the DetNet
      solution, there must be a technique that can determine the maximum
      latency that a packet may experience when flowing across any link in
      that aggregation.</t>
      <t>Alternatively, the maximum latency of a single link within the link
      aggregation could be reported, and the stream could then be restricted
      to use only that link when the path is established.</t>
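      <t>The two options above differ only in which latency figure is
      reported for the aggregation. The Python sketch below uses hypothetical
      names and illustrative per-link latencies in microseconds.</t>
      <figure><artwork><![CDATA[
```python
# Hypothetical sketch of the two options above for a link
# aggregation group: report the worst case over all member links,
# or pin the stream to one member and report only its latency.
def lag_max_latency_us(member_latency_us):
    """Worst-case latency across every member link."""
    return max(member_latency_us.values())


def pinned_latency_us(member_latency_us, member):
    """Latency to report if the stream is pinned to one member."""
    return member_latency_us[member]


members = {"port1": 12, "port2": 30}
print(lag_max_latency_us(members))          # 30
print(pinned_latency_us(members, "port1"))  # 12
```
]]></artwork></figure>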
      </section>

      <section title="Layer 3 Multicast">
      <t>Because Layer 2 bridges forward frames based on MAC addresses, it is
      important that a multicast MAC address is only associated with one stream.
      This prevents reservations from forwarding packets from one stream
      down a path that has no interested sinks simply because there is another
      stream on that same path that shares the same multicast MAC address.</t>
      <t>Since each multicast MAC address can represent 32 different IPv4
      multicast addresses, a process must be put in place to make sure this
      does not occur. Alternatively, DetNet could recommend the use of IPv6,
      although the impact of such a decision upon existing IPv4 installations
      should be discussed.</t>
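      <t>The figure of 32 follows from the mapping defined in RFC 1112: an
      IPv4 multicast address is mapped to an Ethernet multicast address by
      placing its low-order 23 bits after the prefix 01-00-5E, so 5 of the 28
      group-identifier bits are lost and 2^5 = 32 group addresses share each
      MAC address. The Python sketch below, with hypothetical function names,
      computes that mapping and flags streams whose group addresses collide;
      it is one possible form of the check process called for above, not a
      defined DetNet mechanism.</t>
      <figure><artwork><![CDATA[
```python
import ipaddress


def ipv4_multicast_mac(addr):
    """Map an IPv4 multicast address to its Ethernet MAC address:
    the prefix 01:00:5e followed by the low 23 bits of the IP."""
    ip = ipaddress.IPv4Address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not an IPv4 multicast address")
    mac = (0x01005E << 24) | (int(ip) & 0x7FFFFF)
    return ":".join(f"{(mac >> shift) & 0xFF:02x}"
                    for shift in range(40, -8, -8))


def mac_collisions(stream_groups):
    """Return MAC addresses shared by more than one stream."""
    by_mac = {}
    for addr in stream_groups:
        by_mac.setdefault(ipv4_multicast_mac(addr), []).append(addr)
    return {mac: gs for mac, gs in by_mac.items() if len(gs) > 1}


print(ipv4_multicast_mac("239.1.1.1"))  # 01:00:5e:01:01:01
print(mac_collisions(["239.1.1.1", "224.129.1.1", "239.1.1.2"]))
# {'01:00:5e:01:01:01': ['239.1.1.1', '224.129.1.1']}
```
]]></artwork></figure>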
      </section>
      
      <section title="Segregate Traffic">
      <t>Sink devices may have limited processing power. To avoid overwhelming
      the CPUs in these devices, it is important to limit the amount of
      traffic that these devices must process. Packet forwarding rules should
      keep extraneous streaming traffic from reaching these devices; however,
      other types of broadcast traffic should also be eliminated where
      possible. This is often done using VLANs or IP subnets.</t>
      </section>

      <section title="Elapsed Time to Build a Reservation">
      <t>During a venue change in a show, various modifications to reservations
      may be required. Some existing reservations may be torn down and other
      reservations may be established. On the Pro-A side, this may be a simple
      reconfiguration of the speakers so the sound field can be created in a
      different way, or the inclusion or exclusion of certain areas in the
      physical environment.</t>
      <t>When video is added to the mix, this may mean switching from one
      camera to another. Currently, video systems use expensive switching
      hardware to switch inputs at the head-end of the final feed. The
      Broadcast industry has expressed interest to the IEEE AVB group in
      using the network as the video switch (see <xref target="STUDIO_IP"/>).</t>
      <t>There is also the issue of the time between power-on and establishment
      of the first set of reservations. In many situations the appropriate thing
      to do is simply to reestablish, as quickly as possible, all paths and
      bandwidth reservations that were in place when the power was turned off.
      This is particularly true when recovering from a power failure or from
      accidental removal of an Ethernet cable or power cord.</t>
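      <t>Reestablishing the previous reservations after power-on implies that
      a device remembers them across a power cycle. The Python sketch below
      assumes the reservation table can be stored as JSON in a local file;
      the file format, the dictionary keys, and the request callback that
      would actually ask the network for each reservation are all
      hypothetical.</t>
      <figure><artwork><![CDATA[
```python
import json
import tempfile
from pathlib import Path


def save_reservations(path, reservations):
    """Persist the reservation table (a list of dicts) so it
    survives a power cycle."""
    Path(path).write_text(json.dumps(reservations))


def restore_reservations(path, request):
    """At power-on, re-request every saved reservation.

    request is a hypothetical callback that asks the network to
    establish one reservation and returns True on success. The
    reservations that could not be re-established are returned.
    """
    p = Path(path)
    if not p.exists():
        return []
    saved = json.loads(p.read_text())
    return [r for r in saved if not request(r)]


# Example: one saved reservation, and a stand-in request callback
# that always succeeds.
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "reservations.json"
    save_reservations(table, [{"stream": "mains", "mbps": 10}])
    print(restore_reservations(table, lambda r: True))  # []
```
]]></artwork></figure>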
      </section>
    </section>
    
    <section title="Use Cases">
      <section title="Singularity of IT and AV Networks">
      <t>A recent large installation of a Pro-A network based on IEEE 802.1 AVB
      technology encompassed a 194,000 sq ft, $125 million facility. The network
      is capable of handling 46 Tbps of throughput with 60,000 simultaneous
      signals. Inside the facility are 1,100 miles of fiber feeding four audio
      control rooms. Phase I of this project was for audio; the next phase will
      include video as well. One of the future goals of this project is to
      integrate IT infrastructure with the audio streaming technology. Details
      of this installation can be found in <xref target="ESPN_DC2"/>.</t>
      </section>
      
      <section title="Combining Local and Remote Content">
      <t>One advantage of a guaranteed reservation with a small bounded latency
      is the reduced buffering requirements on sink devices. As mentioned
      earlier there are large theme parks, megachurches, and other venues that
      wish to broadcast a live event from one physical location to another
      physical location. These may be across town or across the globe, and the
      content would be delivered via a Layer 3 protocol. Depending on the
      technology available, latency bounds and jitter caused by Internet
      delivery of content can have a huge impact on the buffering requirements
      at the receiving site.</t>
      <t>In these situations it is acceptable at the local location for content
      from the live remote site to be delayed (buffered) by a reasonable amount
      to allow for a statistically acceptable amount of latency and to absorb
      jitter. However, once the content begins playing in the local location,
      any audio artifacts caused by the local network are unacceptable,
      especially in those situations where a live local performer is "mixed"
      into the feed from the remote location.</t>
      <t>In these scenarios, a single gateway device at the local network that
      is receiving the feed from the remote site would provide the expensive
      buffering required to mask the latency and jitter issues associated with
      long distance delivery. Sink devices in the local location would have no
      additional buffering requirements, and thus no additional costs, beyond
      those required for delivery of local content. The sink device would be
      receiving the identical packets as those sent by the source and would be
      unaware that there were any latency or jitter issues along the path.</t>
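      <t>How much the gateway buffers is the statistical trade-off described
      above. The Python sketch below assumes the gateway can measure the
      one-way delay of packets on the remote feed (for example, from
      timestamps) and picks a nearest-rank percentile of those delays as its
      playout delay; the function name and the percentile are illustrative
      choices, not requirements.</t>
      <figure><artwork><![CDATA[
```python
import math


def playout_delay_ms(observed_delays_ms, percentile=99.0):
    """Choose the gateway's buffering delay as a high percentile of
    the delays observed on the remote feed (nearest-rank method),
    so that only rare late packets would miss their play-out time.
    """
    ordered = sorted(observed_delays_ms)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Illustrative one-way delays (ms) seen by the gateway.
delays = [80, 82, 85, 90, 150]
print(playout_delay_ms(delays, percentile=80))   # 90
print(playout_delay_ms(delays, percentile=100))  # 150
```
]]></artwork></figure>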
      </section>
      
      <section title="Lots of Small Devices">
      <t>Consumers expect more and more from their theater experiences. One
      example is the use of individual theater seat speakers and effects
      systems. To be cost-effective, these systems must be inexpensive per
      seat, since the quantities in a single theater can reach hundreds or
      thousands of seats.</t>
      <t>Discovery protocols alone in a one-thousand-seat theater can generate
      a lot of broadcast traffic that can put an unnecessary load on a
      low-powered CPU. An installation like this will require some type of
      traffic segregation that can create groups of seats to reduce the
      traffic within each group. All seats in the theater must still be able
      to communicate with a central controller.</t>
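      <t>Grouping seats could be done with one VLAN per block of seats, in
      line with the VLAN or subnet segregation mentioned earlier. The Python
      sketch below is hypothetical: the group size and starting VLAN ID are
      arbitrary choices, and keeping the central controller reachable from
      every group (for example, through a router or a port that is a member
      of every seat VLAN) is assumed rather than shown.</t>
      <figure><artwork><![CDATA[
```python
def seat_vlans(num_seats, seats_per_group, first_vid=100):
    """Assign each seat to a group with its own VLAN ID so that
    broadcast traffic such as discovery stays within the group.
    first_vid and the grouping rule are hypothetical choices."""
    vids = {}
    for seat in range(num_seats):
        vid = first_vid + seat // seats_per_group
        if vid > 4094:  # highest usable IEEE 802.1Q VLAN ID
            raise ValueError("not enough VLAN IDs for this layout")
        vids[seat] = vid
    return vids


plan = seat_vlans(1000, seats_per_group=50)
print(len(set(plan.values())))  # 20
print(plan[0], plan[49], plan[50], plan[999])  # 100 100 101 119
```
]]></artwork></figure>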
      </section>

    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The editor would like to acknowledge the help of the following
      individuals and the companies they represent:</t>
      <t>Jeff Koftinoff, Meyer Sound</t>
      <t>Jouni Korhonen, Associate Technical Director, Broadcom</t>
      <t>Pascal Thubert, CTAO, Cisco</t>
      <t>Kieran Tyrrell, Sienda New Media Technologies GmbH</t>
    </section>

    <!-- Possibly a 'Contributors' section ... -->

    <section anchor="IANA" title="IANA Considerations">
      <t>This memo includes no request to IANA.</t>
    </section>

    <section anchor="Security" title="Security Considerations">

      <section title="Content Protection">
      <t>As mentioned earlier, any solutions that would be recommended for the
      Professional A/V space must support DRM.</t>
      </section>

      <section title="Denial of Service">
      <t>Many industries that are moving from the analog wire world to the
      digital network world have little understanding of the pitfalls that they
      can create for themselves through an improperly installed system. DetNet
      should consider ways to provide security against denial-of-service (DoS)
      attacks in solutions directed at these markets.</t>
      <t>One example known to the author involved the use of technology that
      allows a presenter to "throw" the content from their tablet or smart phone
      onto the A/V system that is then viewed by all those in attendance.
      The facility introducing this technology was quite excited to offer such
      modern flexibility to those who came to speak. What they had not
      realized was that, since no security was put in place around this
      technology, it left a hole in the system that allowed other attendees to
      "throw" their own content onto the A/V system.</t>
      </section>

      <section title="Control Protocols">
      <t>Pro-A systems can include amplifiers that are capable of generating
      several hundred or several thousand watts of audio power. If used
      incorrectly, these systems can cause hearing damage to those in the
      vicinity of the speaker arrays. The traffic that controls these devices
      must be protected, and that is mostly a concern of those providing that
      service. However, the configuration protocols that create the network
      paths used by the Pro-A traffic should be protected as well so that
      high-volume content cannot be sent to areas that are not meant to
      receive it.</t>
      </section>

    </section>
  </middle>

  <!--  *****BACK MATTER ***** -->

  <back>
    <!-- References split into informative and normative -->

    <references title="Normative References">
      <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?-->
      &RFC2119;
    </references>

    <references title="Informative References">
      <reference anchor="ISO7240-16"
                 target="http://www.iso.org/iso/catalogue_detail.htm?csnumber=42978">
        <front>
          <title>ISO 7240-16:2007 Fire detection and alarm systems -- Part 16:
          Sound system control and indicating equipment</title>

          <author>
            <organization>ISO</organization>
          </author>

          <date year="2007"/>
        </front>
      </reference>

      <reference anchor="CONTENT_PROTECTION"
                 target="http://grouper.ieee.org/groups/1722/contributions/2012/avtp_dolsen_1722a_content_protection.pdf">
        <front>
          <title>1722a Content Protection</title>

          <author initials="D" surname="Olsen">
            <organization>Harman</organization>
          </author>

          <date year="2012"/>
        </front>
      </reference>
      
      <reference anchor="ESPN_DC2"
                 target="http://sportsvideo.org/main/blog/2014/06/espns-dc2-scales-avb-large">
        <front>
          <title>ESPN's DC2 Scales AVB Large</title>

          <author initials="D" surname="Daley">
            <organization>Sports Video Group</organization>
          </author>

          <date year="2014"/>
        </front>
      </reference>

      <reference anchor="SRP_LATENCY"
                 target="http://www.ieee802.org/1/files/public/docs2014/cc-cgunther-acceptable-latency-0314-v01.pdf">
        <front>
          <title>Specifying SRP Latency</title>

          <author initials="C" surname="Gunther">
            <organization>Harman International</organization>
          </author>

          <date year="2014"/>
        </front>
      </reference>

      <reference anchor="STUDIO_IP"
                 target="http://www.ieee802.org/1/files/public/docs2007/avb-mace-ip-networked-studio-infrastructure-0107.pdf">
        <front>
          <title>IP Networked Studio Infrastructure for Synchronized &amp; Real-Time Multimedia Transmissions</title>

          <author initials="G" surname="Mace">
            <organization>CR / CP&amp;M Lab (Rennes / France)</organization>
          </author>

          <date year="2007"/>
        </front>
      </reference>
    </references>

    <!-- Change Log

v00 2015-03-02  CAG   Initial version
     -->
  </back>
</rfc>

