One document matched: draft-ietf-clue-telepresence-requirements-07.xml


<?xml version="1.0" encoding="US-ASCII"?>

<!DOCTYPE rfc SYSTEM "http://xml.resource.org/authoring/rfc2629.dtd"  [

<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC3261 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3261.xml">
<!ENTITY RFC4353 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4353.xml">
<!ENTITY RFC4579 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4579.xml">
<!ENTITY RFC5117 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5117.xml">
<!ENTITY RFC5239 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5239.xml">
<!ENTITY RFC6503 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6503.xml">

]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc iprnotified="no" ?>
<?rfc strict="no" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no"?>
<?rfc sortrefs="no" ?>

<rfc category="info" docName="draft-ietf-clue-telepresence-requirements-07.txt" ipr="trust200902">

  <front>
     <title abbrev="CLUE Telepresence Requirements"> Requirements for Telepresence Multi-Streams</title>

    <author fullname="Allyn Romanow" initials="A." surname="Romanow">
      
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street></street>

          <city>San Jose</city>

          <region>CA</region>

          <code>95134</code>

          <country>USA</country>
        </postal>

        <email>allyn@cisco.com</email>
      </address>

    </author>    

    <author fullname="Stephen Botzko" initials="S." surname="Botzko">
    
      <organization>Polycom</organization>

      <address>
        <postal>
          <street></street>

          <city>Andover</city>

          <region>MA</region>

          <code>01810</code>

          <country>US</country>
        </postal>

        <email>stephen.botzko@polycom.com</email>
      </address>
    </author>   
    
    <author fullname="Mary Barnes" initials="M." surname="Barnes">
    
      <organization>Polycom</organization>
      <address>
        <email>mary.ietf.barnes@gmail.com</email>
      </address>
    </author>        
         
    <date />
    <area>RAI</area>

    <workgroup>CLUE WG</workgroup>


    <abstract>
      <t>This memo discusses the requirements for specifications, that enable
      telepresence interoperability by describing behaviors and protocols for 
      Controlling Multiple Streams for Telepresence (CLUE).  
      In addition, the problem statement
      and related definitions are also covered herein.  
        </t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
<t>
Telepresence systems greatly improve collaboration. In a telepresence
   conference (as used herein), the goal is 
   to create an environment that gives the users a feeling of 
   (co-located) presence - the feeling that a local user is in the
   same room with other local users and the remote parties. Currently,
   systems from different vendors often do not interoperate 
   because they do the same tasks differently, as discussed in the
   Problem Statement section below. 
</t>
<t>
The approach taken in this memo is to set requirements for a future
   specification(s) that, when fulfilled by an implementation of the
   specification(s), provide for  interoperability between IETF protocol
   based telepresence systems.  It is anticipated that a solution for
   the requirements set out in this memo likely involves the exchange
   of adequate information about participating sites; information that
   is currently not standardized by the IETF.  
</t>
<t>
 The purpose of this document is to describe the requirements for a
   specification that enables interworking between different SIP-based
   <xref target="RFC3261"/> telepresence systems, by exchanging and
   negotiating appropriate information. 
   In the context of the requirements in this document and 
   related solution documents,  this includes both point to point SIP sessions 
   as well as SIP based 
   conferences as described in the
   SIP conferencing framework <xref target="RFC4353"/> and 
   the SIP based conference control <xref target="RFC4579"/> specifications.    
   Non IETF protocol based
   systems, such as those based on ITU-T Rec. H.323, are out of scope.
   These requirements are for the specification, they are not
   requirements on the telepresence systems implementing the
   solution/protocol that will be specified.
</t>
<t>
Telepresence systems of different vendors, today, can follow radically
different architectural approaches while offering a similar user
experience.  CLUE will not dictate telepresence
architectural and implementation choices; however it will describe
a protocol architecture for CLUE and how it relates to other 
protocols.   CLUE enables
interoperability between telepresence systems by exchanging
information about the systems' characteristics.  Systems can use this
information to control their behavior to allow for interoperability
between those systems.
</t>
<t>
A telepresence session requires at least one sending and one
receiving endpoint.   Multiparty
telepresence sessions include more than two endpoints, and centralized
infrastructure such as Multipoint Control Units (MCUs) or equivalent.
CLUE specifies the syntax, semantics, and control flow of information
to enable the best possible user experience at those endpoints.  
</t>
<t>
Sending endpoints, or MCUs, are not mandated to use any of the CLUE
specifications that describe their capabilities, attributes, or
behavior.  Similarly, it is not envisioned that endpoints or MCUs must
ever take into account information received.  However, by making
available as much information as possible, and by taking into account
as much information as has been received or exchanged, MCUs and
endpoints are expected to select operation modes that enable the best
possible user experience under their constraints.
</t>
<t>
The document structure is as follows: Definitions are set out,
followed by a description of the problem of telepresence
interoperability that led to this work.  Then the requirements to a
specification addressing the current shortcomings are enumerated 
and discussed. 
</t>
</section>

<section anchor="terms" title="Terminology">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119">RFC 2119</xref>.</t>
 
</section>


<section anchor="Definitions" title="Definitions">
<t> 
The following terms are used throughout this document and serve as reference for other documents. 
</t>
<list style='hanging'hangIndent=''>
<t> 
Audio Mixing: refers to the accumulation of scaled audio signals to produce a single
audio stream. See RTP Topologies, <xref target="RFC5117"/>.  
</t>

<t> 
Conference: used as defined in <xref target="RFC4353"/>, A Framework
for Conferencing within the Session Initiation Protocol (SIP).
</t>

<t>         
Endpoint: The logical point of final termination through
receiving, decoding and rendering, and/or initiation through
capturing, encoding, and sending of media streams.  An endpoint
consists of one or more physical devices which source and sink
media streams, and exactly one <xref target="RFC4353"/> Participant
(which, in turn, includes exactly one SIP User Agent).
In contrast to an endpoint, an MCU may also send and receive media
streams, but it is 
not the initiator nor the final terminator in the sense that Media
is Captured or Rendered.  Endpoints can be
anything from multiscreen/multicamera rooms to handheld devices.
</t>

<t>
Endpoint Characteristics: include placement of Capture and Rendering 
Devices, capture/render angle, resolution of cameras and screens, spatial 
location and mixing parameters of microphones.  Endpoint characteristics 
are not specific to individual media streams sent by the endpoint.
</t>
<t>         
Layout: How rendered media streams are spatially arranged with respect
to each other on a single screen/mono audio telepresence
endpoint, and how rendered media streams are arranged with respect to
each other on a multiple screen/speaker telepresence endpoint.
Note that audio as well as video is encompassed by the term
layout--in other words, included is the placement of audio
streams on speakers as well as video streams on video screens.
</t>

<t>
Local: Sender and/or receiver physically co-located ("local") in the
context of the discussion. 
</t>

<t>
MCU: Multipoint Control Unit (MCU) - a device that
connects two or more endpoints together into one single
multimedia conference <xref target="RFC5117"/>.  An MCU may include
a Mixer  <xref target="RFC4353"/>.
</t>

<t> 
Media: Any data that, after suitable encoding, can be conveyed
over RTP, including audio, video or timed text.
</t>

<t>
Model: a set of assumptions a telepresence system of a given vendor 
adheres to and expects the remote telepresence system(s) also to adhere to. 
</t>

<t> 
Remote: Sender and/or receiver on the other side of the communication
channel (depending on context); not Local. A remote can be an Endpoint
or an MCU.           
</t>

<t> 
Render: the process of generating a representation from a media, such 
as displayed motion video or sound emitted from loudspeakers.
</t>

<t>
Telepresence: an environment that gives non co-located users or 
user groups a feeling of (co-located) presence - the feeling that a 
Local user is in the same room with other Local users and the Remote 
parties.  The inclusion of Remote parties is achieved through 
multimedia communication including at least audio and video signals of 
high fidelity.
</t>

</list>


</section>
 

<section anchor="problem statement" title="Problem Statement">
<t>
   In order to create a "being there" experience characteristic of
   telepresence, media inputs need to be transported, received, and
   coordinated between participating systems.  Different telepresence
   systems take diverse approaches in crafting a solution, or, they
   implement similar solutions quite differently.
</t>
<t>
   They use disparate techniques, and they describe,
   control and negotiate media in dissimilar fashions.  Such diversity
   creates an interoperability problem.  The same issues are solved in
   different ways by different systems, so that they are not directly
   interoperable.  This makes interworking difficult at best and
   sometimes impossible.</t>

<t>Worse, many telepresence systems use proprietary protocol extensions to
solve telepresence-related problems, even if those extensions are
based on common standards such as SIP.
</t>
<t>
   Some degree of interworking between systems from different vendors
   is possible through transcoding and translation.  This requires
   additional devices, which are expensive, often not entirely
   automatic, and they sometimes introduce unwelcome side effects,
   such as additional delay or degraded performance.  Specialized
   knowledge is currently required to operate a telepresence
   conference with endpoints from different vendors, for example to
   configure transcoding and translating devices.  Often such
   conferences do not start as planned, or are interrupted by
   difficulties that arise.</t>
<t>
   The general problem that needs to be solved can be described as
   follows.  Today, each endpoint sends audio and video captures based
   upon an implicitly assumed model for rendering a realistic
   depiction based on this information.  If all endpoints are
   manufactured by the same vendor, they work with the same model and
   render the information according to the model implicitly assumed by
   the vendor.  However, if the devices are from different vendors,
   the models they each use for rendering presence can and usually do
   differ.  The result can be that the telepresence systems actually
   connect, but the user experience suffers, for example because one
   system assumes that the first video stream is captured from the
   right camera, whereas the other assumes the first video stream is
   captured from the left camera.</t>
<t>
   If Alice and Bob are at different sites,  Alice needs to
   tell Bob about the camera and sound equipment arrangement 
   at her site so that Bob's receiver can create an accurate rendering
   of her site.
   Alice and Bob
   need to agree on what the salient characteristics are as well as
   how to represent and communicate them. Characteristics may include
   number, placement, capture/render angle, resolution of cameras and
   screens, spatial location and audio mixing parameters of microphones.  
</t>
<t>
   The telepresence multi-stream work seeks to describe the sender situation in a way
   that allows the receiver to render it realistically even though it may
   have a different rendering model than the sender.
</t>
</section>


<section anchor="Requirements" title="Requirements">

<t> 
Although some aspects of these requirements can be met by existing
technology, such as SDP,  they are stated here to
have a complete record of what the requirements for CLUE are,
whether new work is needed or they can be met by
existing technology.   Figuring this out will be part of the solution
development, rather than part of the requirements. 
Note, the term "solution" is used in these requirements to mean the 
protocol specifications, including extensions to existing protocols 
as well as any new protocols, developed to support the use cases.
The solution can
introduce additional functionality that isn't mapped directly to these
requirements - e.g., the detailed information carried in the signaling
protocol(s).   
In cases 
where the requirements are directly related to a specific use case, 
a reference to the use case is provided. 
</t>

<list style='format REQMT-%d:'>
<t>  
The solution MUST support a description of the spatial
arrangement of source video images sent in video streams which enables
a satisfactory reproduction at the receiver of the original
scene. This applies to each site in a point to point or a multipoint
meeting and refers to the spatial ordering within a site, not to the
ordering of images between sites. 
<vspace blankLines='1' /> 
Use case point to point symmetric, and all other use cases.

<list style='format REQMT-1%c:'>
<t>
The solution MUST support a means of allowing the preservation of the order of
images in the captured scene. For example, if John is to Susan's right in the image
capture, John is also to Susan's right in the rendered image. 
</t>
<t>
The solution MUST support a means of allowing the preservation of
order of images in the scene in two dimensions - horizontal and vertical.
</t>
<t>
The solution MUST support a means to identify the point of capture of
individual video captures in three dimensions.
</t>
<t>
The solution MUST support a means to identify the area of coverage of individual
video captures in three dimensions.
</t>
</list>
</t>

<t>
The solution MUST support a description of the spatial
arrangement of captured source audio sent in audio streams which
enables a satisfactory reproduction at the receiver in a spatially
correct manner.   This applies to each site in a point to point or a
multipoint meeting and refers to the spatial ordering within a site,
not the ordering of channels between sites. 
<vspace blankLines='1' />
Use case point to point symmetric, and all use cases, 
especially heterogeneous.


<list style='format REQMT-2%c:'>

<t>
The solution MUST support a means of preserving the spatial order of audio in
the captured scene. For example, if John sounds as if he is at Susan's
right in the captured audio, John voice is also placed at Susan's
right in the rendered image.  
</t>
<t>
The solution MUST support a means to identify the number and spatial arrangement of audio channels including monaural, stereophonic
(2.0), and 3.0 (left, center, right) audio channels. 
</t>
<t>
The solution MUST support a means to identify the point of capture of
individual audio captures in three dimensions. 
</t><t>
The solution MUST support a means to identify the area of coverage of individual
audio captures in three dimensions. 
</t>
</list>
</t>

<t>
The solution MUST enable individual audio streams to be associated
with one or more video image captures, and individual video image
captures to be associated with one or more audio captures, for the
purpose of rendering proper position.
<vspace blankLines='1' />
Use case is point to point symmetric, and all use cases.
</t>

<t>
The solution MUST enable interoperability between endpoints
that have a different number of similar devices. For example, one
endpoint may have 1 screen, 1 speaker, 1 camera, 1 mic, and another
endpoint may have 3 screens, 2 speakers, 3 cameras and 2 microphones. Or, in
a multi-point conference, one endpoint may have one screen, another
may have 2 screens and a third may have 3 screens.  This
includes endpoints where the number of devices of a given type is zero. 
<vspace blankLines='1' />  
Use case is asymmetric point to point and  multipoint.
</t> 

<t>
The solution MUST support means of enabling interoperability between
telepresence endpoints where cameras are of different picture aspect
ratios.  
</t>

<t>
The solution MUST provide scaling information which enables rendering
of a video image at the actual size of the captured scene.  
</t>
<t>
The solution MUST support means of enabling interoperability between
telepresence endpoints where displays are of different resolutions. 
</t>
<t>
The solution MUST support methods for handling different bit rates in
the same conference. 
</t>
<t>
The solution MUST support means of enabling interoperability between
endpoints that send and receive different numbers of media streams.
<vspace blankLines='1' />
Use case heterogeneous and multipoint.
</t>

<t>
The solution MUST ensure that endpoints that support telepresence 
extensions can establish a session with a SIP endpoint that does not 
support the telepresence extensions.  For example, in the case of a SIP 
endpoint that supports a single audio and a single video stream, 
an endpoint that supports the telepresence extensions would setup 
a session with a single audio and single video stream using existing 
SIP and SDP mechanisms.   
</t>

<t>
The solution MUST support a mechanism for determining whether or not
an endpoint or MCU is capable of telepresence extensions. 
</t>

<t>
The solution MUST support a means to enable more than two endpoints to
participate in a teleconference. 
<vspace blankLines='1' />
Use case multipoint.
</t>

<t>
The solution MUST support both transcoding and switching approaches to
providing multipoint conferences.
</t>

<t>
The solution MUST support mechanisms to allow media from one 
source endpoint or/and multiple source endpoints to be sent 
to a remote endpoint at a particular point in time. Which media is 
sent at a point in time may be based on local policy.
</t>

<t>
The solution MUST provide mechanisms to support the following:  
<list style='symbols'>
<t>
	Presentations with different media sources 
</t>
<t>
	Presentations for which the media streams are visible to all endpoints 
</t>
<t>
	Multiple, simultaneous presentation media streams, including 
	presentation media streams that are spatially related to each other.</t>
</list>

<list style="hanging">  
<t> 	
Use case is presentation.   
</t>
</list>
</t>

<t>
The specification of any new protocols for the solution MUST provide extensibility mechanisms. </t>

<t anchor='change-info'>
The solution MUST support a mechanism for allowing information about
media captures to change during a conference. 
</t>

<t anchor='sec-req'>
The solution MUST provide
a mechanism for the secure exchange of information about 
the media captures.  
</t>

</list> <!REQMTS>

</section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      
<t> This draft has benefitted from all the comments on the mailing list 
   and a number of discussions. So many people contributed that it is
   not possible to list them all.  However, the comments provided by Roberta Presta,
   Christian Groves and Paul Coverdale during WGLC were particularly helpful in
   completing the WG document. 
</t>
    </section>


    <section anchor="IANA" title="IANA Considerations">
      <t>There are no IANA considerations associated with this specification.</t>

    </section>

    <section anchor="Security" title="Security Considerations">
        <t> Requirement REQMT-18 identifies the need to securely 
        transport the information about media captures.  It is important to note that 
        session setup for a telepresence session will use SIP for basic session setup
        and either SIP or CCMP for a multi-party telepresence session. 
        Information carried in the SIP signaling can be secured by the SIP 
        security mechanisms as defined in <xref target="RFC3261"/>. In the case of 
        conference control using CCMP, the security model and mechanisms 
        as defined in the XCON Framework <xref target="RFC5239"/> and CCMP <xref target="RFC6503"/>
        documents would meet the
        requirement.  Any additional
        signaling mechanism used to transport the information about media captures 
        would need to define the mechanisms by the which the information
        is secure.
        The details for the mechanisms needs to be defined 
        and described in the CLUE framework document and related 
        solution document(s).</t>
               
   </section>
    
  </middle>



  <back>
    <references title="Informative References">
     &RFC2119;
     &RFC3261;
     &RFC4353;
     &RFC4579;
     &RFC5117;
     &RFC5239;
     &RFC6503;
   
    </references>
   



<section title='Changes From Earlier Versions'>

<t>Note to the RFC-Editor: please remove this section prior to publication
as an RFC.</t>

<section title= 'Changes from draft -06'>
<t>  Addressing IETF LC comments/editorial nits resulting in the following changes:</t>
<t>
<list style='symbols'> 
<t> Included expansion of CLUE in the abstract. </t>
<t> 
Deleted definitions for "Left" and "Right". </t>
<t> Section 5 - clarified that solution = protocol specifications to support requirements. </t>
<t> REQMT-1d, REQMT-2d: Changed term "extent" to "area of coverage" 
</t>
<t> REQMT-10 - clarified requirement with regards to interworking with non-CLUE endpoints </t> 
<t> REQMT-15 - reworded to be more specific and normative </t>
<t> REQMT-16 - expanded on what is meant by "extensibility" </t> 

</list>
</t>
</section>

<section title= 'Changes from draft -05'>
<t>  Addressing WGLC comments resulting in the following changes:</t>
<t>
<list style='symbols'> 
<t> REQMT-12: Changed term "site" to "endpoint" 
</t>
<t> Intro: clarified that SIP based conferencing also is relevant to CLUE.</t>
<t> Intro: clarified that while CLUE doesn't dictate implementation choices, it does describe a framework 
for the protocol solution.</t>
<t> Clarified that mapping to use cases isn't comprehensive (i.e., only done when there is a direct correlation).</t>
<t> Added text that the requirements do not reflect all those required for the solution - i.e., the solution 
can provide more functionality as needed.   </t>
<t> Editorial nits and clarifications - changed lc "must" to UC (REQMT-17). </t>
</list>
</t>
</section>
<section title= 'Changes from draft -04'>
<t> 
<list style='symbols'>
<t> Removed REQMT-2c, related to issue #37 in the tracker. </t>
<t> Deleted REQMT-3b. Condensed REQMT-3 to subsume REQMT-3a. This is related to Issue #38 in the tracker. 
</t>
<t> Updated REQMT-14 based on (mailing list) resolution of Issue #39.</t>
<t> Deleted OPEN issue section as those were transferred to the ID tracker and have been resolved either by changes to this document or to earlier versions of the document </t>
</list>
</t>
</section>

<section title= 'Changes from draft -03'>
<t> <list style='symbols'>
<t> Added a tad more text to the security section <xref target='sec-req'/>.
</t>
</list>
</t>
</section>


<section title= 'Changes from draft -02'>
<t> <list style='symbols'>
<t> Updated IANA section - i.e., no IANA registrations required.</t>
<t> Added security requirement <xref target='sec-req'/>.
</t>
<t> Added some initial text to the security section. </t>
</list>
</t>
</section>

<section title= 'Changes from draft -01'>
<t> <list style='symbols'>
<t> Cleaned up the Problem Statement section, re-worded.
</t><t>
Added Requirement <xref target='change-info'/> in response to WG Issue #4 to make a requirement for 
dynamically changing information. Approved by WG
</t><t>
Added requirements #1.c and #1.d. Approved by WG
</t><t>
Added requirements #2.d and #2.e. Approved by WG
</t>
</list>
</t>
</section>

<section title='Changes From Draft -00'>

<list style='symbols'>
<t>
Requirement #2, The solution MUST support a means to identify
monaural, stereophonic (2.0), and 3.0 (left, center, right) audio channels.

<figure>                <artwork><![CDATA[  
    changed to

]]></artwork>        <postamble />            </figure> 

The solution MUST support a means to identify the number and spatial
arrangement of audio channels including monaural, stereophonic 
(2.0), and 3.0 (left, center, right) audio channels. 
</t><t>
Added  back references to the Use case document.
</t> <list style='symbols'>
<t>
 Requirement #1          Use case point to point symmetric, and all
 other use cases. 
</t><t>
 Requirement #2          Use case point to point symmetric, and all
 use cases, especially heterogeneous. 
</t><t>
 Requirement #3           Use case point to point symmetric, and all use cases. 
</t><t>
Requirement #4          Use case is asymmetric point to point, and multipoint. 
</t><t>
Requirement #9         Use case heterogeneous and multipoint.
</t><t>
Requirement #12         Use case multipoint.
</t>
 
</list>
</list>

</section>

</section>


  </back>
</rfc>

PAFTECH AB 2003-20262026-04-24 03:02:47