<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
	<!ENTITY rfc0793 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.0793.xml'>
	<!ENTITY rfc2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
	<!ENTITY rfc2976 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2976.xml'>
	<!ENTITY rfc3261 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3261.xml'>
	<!ENTITY rfc3262 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3262.xml'>
	<!ENTITY rfc3263 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3263.xml'>
	<!ENTITY rfc3264 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml'>
	<!ENTITY rfc3550 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml'>
	<!ENTITY rfc3725 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3725.xml'>
	<!ENTITY rfc3840 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3840.xml'>
	<!ENTITY rfc4145 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4145.xml'>
	<!ENTITY rfc4240 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4240.xml'>
	<!ENTITY rfc4346 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4346.xml'>
	<!ENTITY rfc4353 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4353.xml'>
	<!ENTITY rfc4474 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4474.xml'>
	<!ENTITY rfc4566 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml'>
	<!ENTITY rfc4575 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4575.xml'>
	<!ENTITY rfc4579 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4579.xml'>
	<!ENTITY rfc4582 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4582.xml'>
	<!ENTITY rfc4583 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4583.xml'>
	<!ENTITY rfc4585 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4585.xml'>
	<!ENTITY rfc4960 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4960.xml'>
	<!ENTITY rfc5167 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5167.xml'>
	<!ENTITY xcon-dm PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-xcon-common-data-model.xml'>
	<!ENTITY xcon-frmk PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-xcon-framework.xml'>
	<!ENTITY sip-ctrl-fw PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-mediactrl-sip-control-framework.xml'>
	<!ENTITY w3c-vxml PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml4/reference.W3C.REC-voicexml20-20040316.xml'>
	<!ENTITY w3c-xml PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml4/reference.W3C.REC-xml-20060816.xml'>
 	
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>

<rfc docName="draft-ietf-mediactrl-architecture-03" ipr="full3978" category="info">
	<front>

		<title abbrev="Mediactrl Architecture">An Architectural Framework for Media 
		Server Control</title>
		
		<author fullname="Tim Melanchuk" initials="T." surname="Melanchuk"
			role="editor">
			<organization>Rain Willow Communications</organization>
			<address>
				<email>tim.melanchuk@gmail.com</email>
			</address>
		</author>

		<date year="2008"/>
		<workgroup>MediaCtrl</workgroup>
		<abstract>

			<t>This document describes an Architectural Framework for Media Server 
			Control. Its primary focus is to define the logical entities that 
			exist within the context of Media Server control, together with the
			appropriate naming conventions and the interactions between them.
			</t>

		</abstract>
		<!-- Abstract -->
	</front>
	<middle>

	  <section title="Introduction">

	    <t>Application Servers host
	    one or more instances of a communications application. Media Servers
	    provide real-time media processing functions. This document presents 
	    the core architectural framework that allows
	    Application Servers to control Media Servers.
	    An overview of the architecture describing the core logical entities and
	    their interactions is presented in <xref target="sec:arch-overview"/>.
	    The requirements for media server control are defined in 
	    <xref target="RFC5167"/>.</t>
	    
	    <t>SIP is used as the session establishment protocol within this 
	    architecture. Application Servers use it both to
	    terminate media streams on Media Servers and to create and
	    manage control channels for media server control between themselves 
	    and Media Servers. The detailed model for media server control
	    together with a description of SIP usage is presented in
	    <xref target="sec:SIP-usage"/>.</t>
	    
	    <t>Several services are described using the framework defined in this document.
	    Use cases for IVR services are described in <xref target="sec:ivr"/>
	    and conferencing use cases are described in <xref target="sec:conferencing"/>.
	    </t>
        
	  </section>
		


<!-- Introduction -->
		

<section anchor="Terminology" title="Terminology">
			
<t>The following additional terms are defined for use in this document  
in the context of Media Server control:
			
<list style="hanging">

<t hangText="Application Server (AS):">A functional entity that hosts one
or more instances of a communications application.</t>
<t hangText="Media Functions:">Functions available on a Media Server 
that are used to supply media services to the AS. Some examples are  
Dual-Tone Multi-Frequency (DTMF) detection, mixing, 
transcoding, playing announcements, recording, etc.</t>
<t hangText="Media Resource Broker (MRB):">A functional entity that
assigns specific Media Server resources to incoming calls at the request
of service applications (i.e., an AS); this assignment happens in real
time as calls come into the network. An MRB may acquire knowledge of
Media Server resource utilization, which it can use to help decide which
MS resources to assign to resource requests from applications, and it
employs methods/algorithms to determine MS resource assignment.</t>
<t hangText="Media Server (MS):">A functional entity whose 
main task is to supply real-time, media-related functions to communication 
applications. In the architecture for the 3GPP IP Multimedia 
Subsystem (IMS), a Media Server is referred
to as a Media Resource Function (MRF).</t>
<t hangText="Media Services:">Application services requiring media functions, 
such as Interactive Voice Response (IVR) or media conferencing.</t>
<t hangText="Media Session:">From the Session Description Protocol
(SDP) specification <xref target="RFC4566"/>: 
"A multimedia session is a set of 
multimedia senders and receivers and the data streams flowing from senders 
to receivers. A multimedia conference is an example of a multimedia 
session."</t>
<t hangText="MS Control Channel:">A reliable transport connection between the 
AS and MS used to exchange MS Control PDUs. Implementations must support the
Transmission Control Protocol (TCP) <xref target="RFC0793"/> 
and may support the Stream Control Transmission Protocol (SCTP)
<xref target="RFC4960"/>.
Implementations must support
TLS <xref target="RFC4346"/> as a transport-level security mechanism
although its use in deployments is optional.</t>
<t hangText="MS Control Dialog:">A SIP dialog that is used for establishing
a control channel between the AS and the MS.</t>
<t hangText="MS Control Protocol:">The protocol used by an AS to control
an MS. The MS Control Protocol assumes a reliable underlying transport
protocol for the MS Control Channel.</t>
<t hangText="MS Media Dialog:">A SIP dialog between the AS and Media Server
that is used for establishing media sessions between a user device
such as a SIP phone and the Media Server.</t>
</list>
</t>

</section>

<!-- Terminology -->

<section anchor="sec:arch-overview" title="Architecture Overview">

	<t>A Media Server (MS) is a network device that processes media streams.
	Examples of media processing functionality may include:
	<list style="symbols">
		<t>Control of Real-time Transport Protocol (RTP)
		<xref target="RFC3550"/> streams, such as 
		video fast update and flow control using RTP Control Protocol
		(RTCP) feedback <xref target="RFC4585"/>.</t>
		<t>Mixing of incoming media streams.</t>
		<t>Media stream source (for multimedia announcements).</t>
		<t>Media stream processing (e.g., transcoding, DTMF detection).</t>
		<t>Media stream sink (for multimedia recordings).</t>
	</list>
	An MS supplies one or more media processing functions,
	which may include functions other than those listed above, to an
    Application Server (AS).
    An AS is able to send a particular call to a
    suitable MS, either through discovery of the capabilities that a
    specific MS provides or through the use of a Media Resource Broker.
</t>
	
	<t>The type of processing that a Media Server performs on media 
	streams is specified and controlled by an Application Server.
	Application Servers are logical entities that are
	capable of running one or more instances of a communications application.
	Examples of Application Servers that may interact with a Media Server 
	are an AS acting as a Conference 'Focus' as defined in
	<xref target="RFC4353"/>
	or an IVR application using a Media Server to play announcements 
	and detect DTMF key presses. </t>
	
	<t>Application Servers use SIP to
	establish control channels between themselves and MSs.
	An MS Control Channel uses a reliable transport protocol
	and is used to carry the MS Control Protocol. 
	A SIP dialog used to establish a control channel
	is referred to as an MS Control Dialog.</t>
	
	<t>Application Servers terminate SIP <xref target="RFC3261"/>
	signaling from SIP User Agents
	and may terminate other signaling outside the scope of this document.
	They use SIP Third Party Call Control 
	<xref target="RFC3725"/> (3PCC) to establish,
	maintain, and tear down media streams from those SIP UAs
	to a Media Server. A SIP dialog used by an AS to establish a media
	session on an MS is referred to as an MS Media Dialog.</t>
	
	<t>Media streams go directly between SIP User Agents and
	Media Servers. Media Servers support multiple types of media. Common 
	supported media types include audio and video but others such as text and
	the Binary Floor Control Protocol
	(BFCP) <xref target="RFC4583"/> are also possible.
	This basic architecture, showing session establishment signaling
	between a single AS and a single MS, is illustrated in <xref target="fig:basic-arch"/> below.</t>
	
	<figure anchor="fig:basic-arch" title="Basic Signalling Architecture">
				<artwork><![CDATA[

             +-------------+                         +--------------+
             |             | SIP (MS Control Dialog) |              |
             | Application |<----------------------->|     Media    |
             |   Server    |                         |    Server    |
             |             |<----------------------->|              |
             +-------------+ SIP (MS Media Dialog)   +--------------+
                         ^                               ^
                          \                              | RTP/SRTP
                           \                             | (audio/
                            \                            | video/etc)
                             \                           |
                              \                          v
                               \                 +--------------+
                                \     SIP        |              |
                                 +-------------->|      SIP     | 
                                                 |  User Agent  |  
                                                 |              |  
                                                 +--------------+   
]]></artwork>
			</figure>	

	
	<t>The architecture must support a many-to-many relationship 
	between Application Servers and Media Servers.
	In real-world deployments, an Application 
	Server may interact with multiple Media Servers and/or a Media Server 
	may be controlled by more than one Application Server.</t>
	
	<t>Application Servers can use the SIP URI as described in 
	<xref target="RFC4240"/> to
	request basic functions from Media Servers. Basic functions are
	characterized as requiring no mid-call interactions between the 
	AS and MS. Examples of these functions are simple announcement 
	playing or basic conference mixing where the AS does not 
	need to explicitly control the mixing.</t>
	
	<t>Most services, however, involve interactions between the AS and MS
	during a call or conference. These interactions can be generalized
	as follows:
	<list style="symbols">
		<t>commands from an AS to an MS to request the application or
		configuration of a function. The request may apply to a 
		single media stream, multiple media streams associated
		with multiple SIP dialogs, or to properties of a conference mix.</t>
		<t>responses from an MS to an AS reporting on the status
		of a particular command.</t>
		<t>notifications from an MS to an AS that report results
		from commands or notify changes to subscribed status.</t>		
	</list>
	</t>
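	<t>The three interaction types can be illustrated with a minimal Python
	sketch in which responses and notifications are correlated with the
	command that caused them through a transaction identifier. The class
	names, message fields, and the numeric transaction identifier are
	hypothetical; the actual message syntax is defined by the MS Control
	Protocol and is not shown here.</t>
	<figure>
	<artwork><![CDATA[
# Illustrative sketch only: names, fields, and the numeric
# transaction id are hypothetical, not the MS Control
# Protocol's actual encoding.
import itertools
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    COMMAND = "command"            # AS -> MS
    RESPONSE = "response"          # MS -> AS: status of a command
    NOTIFICATION = "notification"  # MS -> AS: results or events


@dataclass
class Message:
    kind: Kind
    transaction_id: int
    body: dict = field(default_factory=dict)


class ApplicationServerSide:
    """Remembers sent commands so replies can be matched."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.sent = {}

    def command(self, body):
        msg = Message(Kind.COMMAND, next(self._ids), body)
        self.sent[msg.transaction_id] = msg
        return msg

    def correlate(self, msg):
        """Return the command an MS reply refers to."""
        if msg.kind is Kind.COMMAND:
            raise ValueError("commands flow from AS to MS only")
        return self.sent[msg.transaction_id]


if __name__ == "__main__":
    as_side = ApplicationServerSide()
    play = as_side.command({"op": "play", "prompt": "welcome"})
    # Hypothetical MS replies: an immediate status response,
    # then a notification once the announcement completes.
    ack = Message(Kind.RESPONSE, play.transaction_id,
                  {"status": "accepted"})
    done = Message(Kind.NOTIFICATION, play.transaction_id,
                   {"event": "play-complete"})
    assert as_side.correlate(ack) is play
    assert as_side.correlate(done) is play
]]></artwork>
	</figure>
	<t>Keeping one identifier per command lets a single control channel
	carry many concurrent requests, such as announcements on different
	media streams, without ambiguity about which reply belongs to which
	request.</t>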
	
	<t>Commands, responses, and notifications are transported using
	one or more dedicated control channels between the Application
	Server and the Media Server. Dedicated control channels provide
	reliable, sequenced, peer-to-peer transport for media server control
	interactions.
	Implementations must support the
    Transmission Control Protocol (TCP) <xref target="RFC0793"/> 
    and may support the Stream Control Transmission Protocol (SCTP)
    <xref target="RFC4960"/>.
    Implementations must support
    TLS <xref target="RFC4346"/> as a transport-level security mechanism,
    although its use in deployments is optional.
	A dedicated control channel is shown
	in <xref target="fig:ctrl-arch"/> below.</t>
	
	<figure anchor="fig:ctrl-arch" title="Media Server Control Architecture">
				<artwork><![CDATA[

          +-------------+                     +--------------+
          |             |                     |              |
          | Application |   MS ctrl channel   |     Media    |
          |   Server    |<------------------->|    Server    |
          |             |                     |              |
          +-------------+                     +--------------+
                                                      ^ ^ ^
                                             RTP/SRTP | | |
                                             (audio/  | | | 
                                           video/etc) | | |
                                                      | | v
                                                  +---|-v-------+
                                                +-|---v-------+ |
                                              +-|-----------+ | |
                                              |             | | |
                                              |     SIP     | | | 
                                              | User Agent  | |-+ 
                                              |             |-+    
                                              +-------------+    


]]></artwork>
			</figure>	
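	<t>A dedicated control channel carried over TCP is a reliable, ordered
	byte stream rather than a sequence of discrete messages, so the
	control protocol needs some way to delimit messages. The Python sketch
	below uses a 4-byte length prefix purely as an assumption for
	illustration; it is not the framing defined by the SIP Control
	Framework, and a local socket pair stands in for a TCP or TLS
	connection between an AS and an MS.</t>
	<figure>
	<artwork><![CDATA[
# Sketch: length-prefixed framing over a reliable, ordered
# byte stream. The 4-byte length header is a hypothetical
# framing for illustration only, not the framing defined by
# the SIP Control Framework.
import socket
import struct


def send_frame(sock, payload):
    """Send <4-byte big-endian length><payload bytes>."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes; a stream may split writes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("channel closed mid-frame")
        buf += chunk
    return bytes(buf)


def recv_frame(sock):
    """Receive one complete frame and return its payload."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)


if __name__ == "__main__":
    # A connected local socket pair stands in for the AS and
    # MS ends of a control channel; a deployment would use a
    # TCP connection, optionally protected with TLS.
    as_end, ms_end = socket.socketpair()
    with as_end, ms_end:
        send_frame(as_end, b"command 1")
        send_frame(as_end, b"command 2")
        # Frames arrive complete and in the order sent.
        print(recv_frame(ms_end), recv_frame(ms_end))
]]></artwork>
	</figure>
	<t>Because the transport already guarantees delivery and ordering, the
	receiver only has to reassemble whole messages; it never needs the
	retransmission logic that an unreliable transport would require.</t>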


	
	<t>Both Application Servers and Media Servers may interact with
	other servers for specific purposes beyond the scope of this
	document. For example, Application Servers will often communicate
	with other infrastructure components, such as back-office data
	stores and applications, depending on deployment requirements.
	Media Servers will often retrieve announcements
	from external file servers. Also, many Media Servers support
	IVR dialog services using VoiceXML
	<xref target="W3C.REC-voicexml20-20040316"/>. In this case the MS interacts
	with other servers using HTTP during standard VoiceXML processing.
	VoiceXML Media Servers may also interact with speech engines,
	for example using MRCPv2,
	for speech recognition and generation purposes.</t>
	
	<t>Some specific types of interactions between Application and
	Media Servers are also out of scope for this document. MS resource
	reservation is one such interaction. Any interactions between
	Application Servers, or between Media Servers, are likewise out of
	scope.</t>
	
</section>
<!-- Overview -->

<section anchor="sec:SIP-usage" title="SIP Usage">			

<t>The Session Initiation Protocol (SIP) <xref target="RFC3261"/> was 
developed by the IETF for the purposes of initiating, managing, and 
terminating multimedia sessions. SIP has grown dramatically in popularity
since its inception and is now the primary Voice 
over IP (VoIP) protocol. It has been selected as the 
basis for architectures such as the IP Multimedia Subsystem 
(IMS) in 3GPP and was used in many of the early live deployments 
of VoIP-related systems. Media servers are not a new concept in 
IP telephony networks, and numerous 
signaling protocols and techniques have been proposed for their control.  
The most popular techniques to date have used a combination of SIP and 
various markup languages to convey media service 
requests and responses.</t>

<t>As discussed in <xref target="sec:arch-overview"/> and 
illustrated in <xref target="fig:basic-arch"/>, the logical 
architecture described by this document involves interactions between an 
Application Server (AS) and a Media Server (MS). The SIP
interactions can be divided into MS Media Dialogs, which are used between
an AS and an MS to establish media sessions between an endpoint
and a Media Server, and MS Control Dialogs, which are used to
establish and maintain MS Control Channels.</t>

<t>SIP is the primary signaling 
protocol for session signaling and is used for 
all media sessions directed towards a Media Server as described in this document.   
Media Servers may support other signaling
protocols but this type of interaction is not considered here.
Application Servers may terminate non-SIP signaling protocols
but must gateway those requests to SIP when interacting with
a Media Server.
</t>

<t>SIP is also used for the creation, management, and 
termination of the dedicated MS control channel(s).
A control channel provides reliable 
delivery of MS Control Protocol messages. The Application
and Media Servers use the SDP attributes defined in 
<xref target="RFC4145"/>
to allow SIP negotiation of a transport connection.
Further details and example flows are provided in the SIP Control 
Framework <xref target="I-D.ietf-mediactrl-sip-control-framework"/>.
The SIP Control Framework also includes basic control message
semantics corresponding to the types of interactions identified
in <xref target="sec:arch-overview"/>. It uses
the concept of "packages" to allow domain-specific protocols
to be defined using the Extensible Markup Language (XML) 
<xref target="W3C.REC-xml-20060816"/> format. The MS Control
Protocol is made up of one or more packages for 
the SIP Control Framework.
</t>
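<t>As an illustration of how the RFC 4145 attributes can be used in an
MS Control Dialog, the following Python sketch builds an SDP offer for a
TCP control channel. The TCP protocol identifier and the a=setup and
a=connection attributes come from <xref target="RFC4145"/>; the "cfw"
format token, the addresses, and the function name are placeholders
assumed for illustration, and the SIP Control Framework defines the
normative SDP usage.</t>
<figure>
<artwork><![CDATA[
# Sketch of an SDP offer an AS might place in an MS Control
# Dialog to negotiate a TCP control channel. The TCP proto,
# "a=setup" and "a=connection" are from RFC 4145; the "cfw"
# format token and the addresses are placeholders.

SETUP_ROLES = {"active", "passive", "actpass", "holdconn"}


def control_channel_offer(addr, port, setup="active",
                          fmt="cfw", session_id=0):
    if setup not in SETUP_ROLES:
        raise ValueError(f"unknown RFC 4145 setup role: {setup}")
    lines = [
        "v=0",
        f"o=- {session_id} {session_id} IN IP4 {addr}",
        "s=-",
        f"c=IN IP4 {addr}",
        "t=0 0",
        f"m=application {port} TCP {fmt}",
        # Which endpoint opens the TCP connection.
        f"a=setup:{setup}",
        # Ask for a new connection rather than reusing one.
        "a=connection:new",
    ]
    # SDP lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(control_channel_offer("192.0.2.10", 7575))
]]></artwork>
</figure>
<t>The MS answers with the complementary role (for example, passive when
the AS offers active), after which the active side opens the TCP
connection that becomes the control channel.</t>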

<t>Using SIP for both media and control dialogs provides a 
number of inherent benefits over other potential techniques.
These include:
    <list style="numbers">
    
    <t>The use of SIP location and rendezvous capabilities, 
    as defined in <xref target="RFC3263"/>.  This provides 
    core mechanisms for routing a SIP request based on 
    techniques such as DNS SRV and NAPTR records. The SIP infrastructure 
    makes heavy use of such techniques.</t>
    
    <t>The security and identity properties of SIP, for example,
    the use of TLS for reliably and securely connecting to another 
    SIP-based entity. SIP has a number of identity 
    mechanisms that can be used: <xref target="RFC3261"/> provides an 
    intra-domain digest-based mechanism, and  
    <xref target="RFC4474"/> defines a certificate-based
    inter-domain identity mechanism.
    SIP with S/MIME provides the ability to secure payloads by 
    encrypting and signing them using certificates.</t>

    <t>SIP has extremely powerful and dynamic media negotiation 
    properties as defined in <xref target="RFC3261"/> and 
    <xref target="RFC3264"/>.
    </t>

    <t>The ability to select an appropriate SIP entity based 
    on capability sets as discussed in <xref target="RFC3840"/>.  
    This provides a powerful function that allows media servers to convey 
    a specific capability set.  An AS is then free to select an 
    appropriate MS based on its requirements.</t>
   
    <t>Using SIP also provides consistency with IETF protocols 
    and usages.  SIP was intended to be used for the creation 
    and management of media sessions and this provides a correct 
    usage of the protocol.</t>    
    
    </list>
</t>
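<t>Capability-based selection of a Media Server can be sketched in Python
as follows. This is a deliberate simplification: RFC 3840 matching
evaluates feature predicates and preferences, whereas the sketch assumes
that each MS advertises a plain set of feature tags and that the AS
simply picks the first MS whose set covers its needs. The host names are
example values.</t>
<figure>
<artwork><![CDATA[
# Simplified sketch of capability-based MS selection in the
# spirit of RFC 3840. RFC 3840 matching actually evaluates
# feature predicates and preferences; the plain subset test
# below is an assumption made to keep the example short.


def select_media_server(advertised, required):
    """Return the first MS URI whose features cover required.

    advertised: dict mapping MS URI -> set of feature tags
    required:   set of feature tags the AS needs
    """
    for ms_uri, features in advertised.items():
        if required <= features:
            return ms_uri
    return None


if __name__ == "__main__":
    servers = {
        "sip:ms1.example.net": {"audio"},
        "sip:ms2.example.net": {"audio", "video"},
    }
    print(select_media_server(servers, {"audio", "video"}))
]]></artwork>
</figure>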

<t>As mentioned previously in this section, Media services 
using SIP are fairly well understood.  Some previous proposals
suggested using the SIP INFO <xref target="RFC2976"/> method
as the transport vehicle between the AS and MS.  
Using SIP INFO in this way is not advised for a 
number of reasons which include:

    <list style="symbols">
    
    <t>INFO is an opaque request with no specific semantics. A SIP endpoint
    that receives an INFO request does not know what to do with it based on
    SIP signaling.</t>
    
    <t>SIP INFO was not created to carry generic session control 
    information along the signaling path, and it should be used only
    for optional application information, e.g., 
    carrying mid-call PSTN signaling messages between 
    PSTN gateways.</t>
    
    <t>SIP INFO traverses the signaling path, which is inefficient
    for control messages that can be routed directly between the AS
    and MS.</t>

    <t><xref target="RFC3261"/> contains rules for using an unreliable 
    transport protocol such as UDP: when a request comes within 200 bytes
    of the path Maximum Transmission Unit (MTU), or exceeds 1300 bytes
    when the path MTU is unknown, it must be sent over a
    congestion-controlled transport such as TCP. This type of operation
    is not ideal when constantly dealing with large payloads such as
    XML-formatted MS control messages.</t>

    </list>
</t>
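<t>The RFC 3261 transport rule mentioned in the last item can be
expressed as a small Python check. The constants are taken from
RFC 3261, Section 18.1.1; the function name is illustrative, and the
sketch assumes that "within 200 bytes of the path MTU" means any size
strictly greater than the path MTU minus 200.</t>
<figure>
<artwork><![CDATA[
# Transport rule from RFC 3261, Section 18.1.1. This sketch
# treats "within 200 bytes of the path MTU" as any size
# strictly greater than (path MTU - 200).

UNKNOWN_MTU_LIMIT = 1300  # bytes, used when path MTU unknown
MTU_MARGIN = 200          # bytes


def must_use_congestion_controlled(request_size, path_mtu=None):
    """True if the request must not be sent over UDP."""
    if path_mtu is None:
        return request_size > UNKNOWN_MTU_LIMIT
    return request_size > path_mtu - MTU_MARGIN
]]></artwork>
</figure>
<t>For example, with a path MTU of 1500 bytes, any request larger than
1300 bytes, a size that XML control messages carried in INFO can easily
exceed, would have to move from UDP to a congestion-controlled
transport.</t>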

</section>
<!-- SIP Usage -->

<section anchor="sec:ivr" title="Media Control for IVR Services">			

<t>One of the functions of a Media Server is to assist an
Application Server implementing IVR services by performing
media processing functions on media streams.
Although IVR is a somewhat generic term, the scope 
of media functions provided by an MS addresses the needs
of user interaction dialogs. These functions include media transcoding, 
basic announcements, user input detection (via DTMF or speech) and
media recording.
</t>

<t>A particular IVR or user dialog application typically requires 
the use of several specific media functions, as described above. 
The range and complexity of IVR dialogs can vary significantly, 
from a simple single announcement playback to complex voice mail 
applications.</t>

<t>As previously discussed, an AS uses SIP <xref target="RFC3261"/> 
and SDP <xref target="RFC4566"/> to establish and configure media 
sessions to a media server. An AS uses the MS control channel,
established using SIP, to invoke IVR requests and to receive 
responses and notifications. This topology is shown in
<xref target="fig:ivr-arch"/> below.</t>

<figure anchor="fig:ivr-arch" title="IVR Topology">
				<artwork><![CDATA[
				
   +-------------+             SIP              +-------------+
   | Application |<---------------------------->|   Media     |
   |    Server   | (media & MS Control dialogs) |   Server    |
   |             |                              |             |
   |             |  MS Control Protocol (IVR)   |             |
   |             |<---------------------------->| (IVR media  |
   | (App logic) |       (CtrlChannel)          | functions)  |
   +-------------+                              +-------------+
          ^                                            ^^
           \                                           ||  R
            \                                          ||  T
             \                                         ||  P
              \                                        ||  /
               \                                       ||  S 
                \                                      ||  R 
                 \                                     ||  T
                  \                                    ||  P
                   \                                   vv
                    \    call signaling           +-----------+
                     ---------------------------->|     UE    |
                          (e.g. SIP)              +-----------+


]]></artwork>
			</figure>	

<t>The variety in complexity of Application Server IVR services
requires support for different levels of media 
functions from the Media Server as described in the following
sub-sections.</t>

<section anchor="sec:ivr-basic" title="Basic IVR Services">			

<t>For simple basic announcement requests, the MS control channel, as
depicted in <xref target="fig:ivr-arch"/> above, is not required.  
Simple announcement requests may be invoked on the Media Server 
using the SIP URI mechanism defined in <xref target="RFC4240"/>.  
This interface provides neither user input (digit) detection and
collection nor mid-call dialog control. However, many applications
require only basic media services, and in such cases the media server
does not need to bear the processing burden of supporting more complex
interactions with the AS.</t>
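<t>A basic announcement request of this kind is just a SIP request URI.
The Python sketch below builds an RFC 4240 announcement URI using the
"annc" user part and the "play" and "repeat" parameters defined there;
the function name and host names are illustrative, and the escaping rule
is a simplified reading of the SIP URI parameter grammar.</t>
<figure>
<artwork><![CDATA[
# Sketch: building an RFC 4240 announcement request URI.
# "annc", "play", and "repeat" come from RFC 4240; the
# function name and host names are illustrative.
from urllib.parse import quote


def annc_uri(ms_host, prompt_url, repeat=None):
    # Escape characters that may not appear unescaped in a
    # SIP URI parameter value (such as ";", "?" and "="),
    # while keeping ":" and "/", which are allowed there.
    params = ["play=" + quote(prompt_url, safe=":/")]
    if repeat is not None:
        params.append(f"repeat={repeat}")
    return f"sip:annc@{ms_host};" + ";".join(params)


if __name__ == "__main__":
    print(annc_uri("ms.example.net",
                   "http://audio.example.net/allcircuitsbusy.g711"))
]]></artwork>
</figure>
<t>The AS places such a URI in the Request-URI of an INVITE sent to the
MS; the announcement plays when the media session is established, and
no control channel is involved.</t>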

</section>
<!-- Basic IVR Services -->

<section anchor="sec:ivr-mid-call" title="IVR Services with Mid-call Controls">			

<t>For more complex IVR dialogs which require mid-call 
interaction and control between the Application Server and the 
Media Server, the MS control channel (as shown in
<xref target="fig:ivr-arch"/> above) is used to invoke specific 
media functions on the Media Server. These functions include, 
but are not limited to, complex announcements with barge-in 
facility, user input detection and reporting (e.g. DTMF) to an
Application Server, DTMF and speech activity controlled 
recordings, etc. Composite services, such as play-collect 
and play-record, are also addressed by this model.</t>

<t>Mid-call control also allows Application Servers to subscribe
to IVR related events and for the Media Server to notify these
events when they occur. Examples of such events are
announcement completion events, record completion events, and
reporting of collected DTMF digits.</t>
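<t>The subscription model can be sketched from the MS side: the MS
records which IVR events an AS has subscribed to and sends a
notification over the control channel only for those events. The class
name, event names, and message fields below are hypothetical; a real IVR
control package defines its own event vocabulary.</t>
<figure>
<artwork><![CDATA[
# Hypothetical MS-side view of IVR event subscriptions: the
# MS notifies the AS only of events it has subscribed to.
# Event names and message fields are invented examples.


class EventReporter:
    def __init__(self, send):
        self._send = send        # delivers a message to the AS
        self._subscribed = set()

    def subscribe(self, event_type):
        self._subscribed.add(event_type)

    def unsubscribe(self, event_type):
        self._subscribed.discard(event_type)

    def occurred(self, event_type, **details):
        """Report an event; True if a notification was sent."""
        if event_type not in self._subscribed:
            return False
        self._send({"event": event_type, **details})
        return True


if __name__ == "__main__":
    sent = []
    ms = EventReporter(sent.append)
    ms.subscribe("dtmf-collected")
    ms.occurred("dtmf-collected", digits="1234#")
    ms.occurred("play-complete")   # not subscribed: dropped
    print(sent)
]]></artwork>
</figure>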

</section>
<!-- IVR Services with Mid-call Controls -->

<section anchor="sec:ivr-vxml" title="Advanced IVR Services">			

<t>Although the IVR Services with Mid-call Controls model described
above provides a comprehensive set of the media functions expected from
a Media Server, the Advanced IVR Services model allows a higher level
of abstraction for describing application logic, as provided by
VoiceXML, to be executed on the Media Server.  
Invocation of VoiceXML IVR dialogs may be via the "Prompt and Collect"
mechanism of <xref target="RFC4240"/>. 
Additionally, VoiceXML dialog services may be invoked 
over the MS control channel, as shown in <xref target="fig:ivr-arch"/> 
above. VoiceXML IVR services invoked on the Media Server require an 
HTTP interface between the Media Server and one or more back-end
servers that host or generate VoiceXML documents. These server(s)
may or may not be physically separate from 
the Application Server.</t>

</section>
<!-- Advanced Media Services -->


</section>
<!-- IVR -->

<section anchor="sec:conferencing" title="Media Control for Conferencing Services">			

<t><xref target="RFC4353"/> describes the overall 
architecture and protocol components needed for multipoint 
conferencing using SIP. The framework
for centralized conferencing 
<xref target="I-D.ietf-xcon-framework"/> extends
the framework to include a protocol between the user and the conferencing
server. <xref target="RFC4353"/> describes the conferencing server decomposition 
but leaves the specifics open.</t>

<t>This section describes the decomposition and discusses the 
functionality of the decomposed functional units. The conferencing 
factory and the conference focus are part of the Application Server 
described in this document.</t>

<t>An Application Server uses SIP Third Party Call Control 
<xref target="RFC3725"/> to
establish media sessions from SIP user agents to a Media Server. As
described in this section, the Application Server uses the same
mechanism to add participants to and remove them from a conference,
and to handle the associated media streams set up on a per-user basis.
Since the XCON framework is agnostic about the call signaling protocol
that users employ to join a conference, an XCON-compliant Application
Server has to gateway non-SIP signaling negotiations in order to set up
valid SIP media sessions between itself and the Media Server, while
keeping the interaction with the non-SIP user transparent.</t>
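<t>One way to realize the 3PCC setup described above is "Flow I" of
<xref target="RFC3725"/>, which is simple and is best suited to cases
where the second party, such as a Media Server, answers immediately. The
Python sketch below merely lists the resulting message sequence as
(sender, receiver, message) tuples; the party labels are assumptions
for illustration and the SDP bodies are abbreviated.</t>
<figure>
<artwork><![CDATA[
# Sketch of the message sequence of RFC 3725 "Flow I" when
# an AS (the controller) connects a user agent to an MS.
# Party labels are illustrative; SDP bodies are abbreviated.


def flow_i(ua="UA", ms="MS", ctrl="AS"):
    """Return (sender, receiver, message) tuples in order."""
    return [
        (ctrl, ua, "INVITE (no SDP)"),
        (ua, ctrl, "200 OK (offer1)"),
        (ctrl, ms, "INVITE (offer1)"),
        (ms, ctrl, "200 OK (answer1)"),
        (ctrl, ms, "ACK"),
        (ctrl, ua, "ACK (answer1)"),
    ]


if __name__ == "__main__":
    for sender, receiver, msg in flow_i():
        print(f"{sender:>2} -> {receiver:<2} {msg}")
]]></artwork>
</figure>
<t>Once both ACKs are sent, RTP flows directly between the user agent
and the MS, while the AS remains in the signaling path of both
dialogs.</t>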

<figure anchor="fig:conf-topology" title="Conference Topology">
	<artwork><![CDATA[

             +------------+             +------------+
             |            | SIP (2m+1c) |            |
             | Application|-------------|   Media    |
             |   Server   |             |   Server   |
             |  (Focus)   |-------------|  (Mixer)   |
             |            | CtrlChannel |            |
             +------------+             +------------+
                 |      \                    .. .
                 |       \\            RTP...   .
                 |         \\           ..      .
                 |     H.323  \\      ...       .
             SIP |             \\ ...           .RTP
                 |              ..\             .
                 |           ...   \\           .
                 |        ...        \\         .
                 |      ..             \\       .
                 |   ...                 \\     .
                 | ..                      \    .
            +-----------+              +-----------+
            |Participant|              |Participant|
            +-----------+              +-----------+

]]></artwork>
			</figure>	

<t>To complement the functionality provided by 3PCC and by the XCON
control protocol, the Application Server makes use of a dedicated media
server control channel in order to set up and manage media conferences
on the Media Server. <xref target="fig:conf-topology"/> shows the 
signaling and media paths for a two-participant  
conference. The three SIP dialogs between the AS and MS establish
two media sessions (2m), one per participant (one originally signaled
using H.323 and then gatewayed into SIP, the other signaled directly
in SIP), and one control session (1c).</t>

<t>As a conference focus, the Application Server is responsible for setting
up and managing a media conference on the Media Servers, in order to make 
sure that all media streams provided in a conference are available 
to its participants. This is achieved by using the services of 
one or more mixer entities, as described in <xref target="RFC4353"/>, whose role as 
part of the Media Server is described in this section. Services 
required by the Application Server include, but are not limited to, 
means to set up, handle, and destroy a new media conference;
adding and removing participants from a conference; managing media streams
in a conference; controlling the layout and the mixing configuration for each
involved medium; allowing per-user custom media profiles; and so on.</t>

<t>In such a multimedia conferencing scenario, the Media Server acts as
a mixer entity: it receives a set of media streams of the same type
(after transcoding, if needed), combines them in a type-specific
manner, and redistributes the result to each authorized participant.
How each media stream is combined, as well as the media-related
policies, is configured and handled by the Application Server by
means of a dedicated MS control channel.</t>
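<t>As an illustration of type-specific combining for audio, the Python
sketch below assumes a common mixing approach that this document does
not mandate: each participant receives the sum of all the other
participants' samples (an "N-1" or mix-minus mix), clipped to the
signed 16-bit range. Representing a frame as a list of 16-bit PCM
samples and the mix_minus function name are hypothetical.</t>

<figure>
<artwork><![CDATA[
def mix_minus(frames):
    """Return an N-1 audio mix for each participant (hypothetical).

    frames maps a participant id to one frame of signed 16-bit
    PCM samples; all frames are assumed to have the same length,
    i.e. they were already transcoded to a common format.
    """
    lo, hi = -32768, 32767
    length = len(next(iter(frames.values()), []))
    # Full mix of every contributing participant, sample by sample.
    total = [sum(f[i] for f in frames.values())
             for i in range(length)]
    out = {}
    for pid, own in frames.items():
        # Remove the participant's own contribution, then clip.
        out[pid] = [max(lo, min(hi, total[i] - own[i]))
                    for i in range(length)]
    return out
]]></artwork>
</figure>

<t>Subtracting each participant's own contribution from a single full
mix keeps the per-frame cost linear in the number of participants,
instead of building every N-1 mix from scratch.</t>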

<t>To summarize, the AS needs to be able to manage Media Servers at both
the conference and the participant level.</t>

<section anchor="sec:conf-create" title="Creating a New Conference">			

<t>When a new conference is created, either as a result of earlier
conference scheduling or of the first participant dialing in to a
specified URI, the Application Server must create a corresponding
media conference on the Media Server. It does so by sending an explicit
request to the Media Server, for example by means of an MS control
channel message. This request may contain detailed information about
the desired settings and policies for the conference (e.g., the media
involved, their mixing configuration, relevant identifiers). The Media
Server validates the request and allocates the resources needed to set
up the media conference.
</t>

<t>Alternatively, the conference can be created using SIP-based
mechanisms such as <xref target="RFC4240"/> or <xref target="RFC4579"/>
with pre-defined conference profiles, with the MS control channel then
being used to control the conference afterwards, if needed.
</t>

<t>Once the request has been processed, the MS informs the Application
Server about its result. Each conference is referred to by a specific
identifier, which both the Application Server and the Media Server
include in subsequent transactions related to the same conference
(e.g., to modify the settings of an existing conference).</t>
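<t>The create-then-refer-by-identifier pattern described above can be
sketched as a minimal in-memory model of the MS side. Everything here
is hypothetical and not part of any MS control protocol: the class and
method names, the settings keys, and the use of random UUIDs as
conference identifiers.</t>

<figure>
<artwork><![CDATA[
import uuid


class MediaServerConferences:
    """Hypothetical MS-side handling of conference requests."""

    SUPPORTED_MEDIA = {"audio", "video"}

    def __init__(self):
        self._conferences = {}

    def create(self, settings):
        # Validate the requested settings before allocating anything.
        media = set(settings.get("media", []))
        if not media or not media <= self.SUPPORTED_MEDIA:
            return {"status": "error", "reason": "unsupported media"}
        conf_id = uuid.uuid4().hex
        self._conferences[conf_id] = dict(settings)
        # The identifier goes back to the AS for later transactions.
        return {"status": "ok", "conf_id": conf_id}

    def modify(self, conf_id, changes):
        # Every later transaction must carry a known identifier.
        if conf_id not in self._conferences:
            return {"status": "error", "reason": "unknown conf_id"}
        self._conferences[conf_id].update(changes)
        return {"status": "ok", "conf_id": conf_id}
]]></artwork>
</figure>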

</section>
<!-- Conference Creation -->

<section anchor="sec:conf-adding" title="Adding a Participant To a Conference">			

<t>As stated before, an Application Server uses SIP 3PCC
to establish media sessions from SIP user agents to a Media Server.
The URI that the AS uses in the INVITE to the MS may be one associated
with the conference on the MS. More likely, however, the media sessions
are first established to the media server using a URI for the media
server itself and are subsequently joined to the conference using the
MS Control Protocol. This allows IVR dialogs to be performed before the
participant joins the conference.
</t>

<t>Acting as a 3PCC controller, the AS correlates the media session
negotiation between the UA and the MS in order to establish all the
media streams needed according to the conference policies.
</t>
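<t>The two-step approach described above (connect the participant's
media session to the MS first, optionally run an IVR dialog, then join
it to the conference over the MS control channel) can be modelled as a
small state machine. The sketch is hypothetical: the state names, the
ParticipantLeg class, and the rule that a join is only valid for a
connected leg outside any IVR dialog are illustrative assumptions, not
defined by this document.</t>

<figure>
<artwork><![CDATA[
from enum import Enum, auto


class LegState(Enum):
    CONNECTED = auto()  # 3PCC media session to the MS URI is up
    IN_IVR = auto()     # an IVR dialog is running on the leg
    JOINED = auto()     # leg has been joined to a conference


class ParticipantLeg:
    """Hypothetical AS-side view of one participant's MS leg."""

    def __init__(self, leg_id):
        self.leg_id = leg_id
        self.state = LegState.CONNECTED
        self.conf_id = None

    def start_ivr(self):
        if self.state is not LegState.CONNECTED:
            raise RuntimeError("IVR only runs before joining")
        self.state = LegState.IN_IVR

    def ivr_done(self):
        if self.state is not LegState.IN_IVR:
            raise RuntimeError("no IVR dialog in progress")
        self.state = LegState.CONNECTED

    def join(self, conf_id):
        # Stands in for a join request on the MS control channel.
        if self.state is not LegState.CONNECTED:
            raise RuntimeError(f"cannot join from {self.state.name}")
        self.state = LegState.JOINED
        self.conf_id = conf_id
]]></artwork>
</figure>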

</section>
<!-- Adding a Participant -->

<section anchor="sec:conf-media-ctrls" title="Media Controls">			

<t>The XCON Common Data Model
<xref target="I-D.ietf-xcon-common-data-model"/>
currently defines some basic media-related controls, which
conference-aware participants can take advantage of in several
ways, e.g., by means of an XCON conference control protocol
or IVR dialogs. These controls allow participants to modify their
own audio volume in the conference, configure the desired layout for
incoming video streams, mute and unmute themselves, and pause and
unpause their own video stream. Conference-aware participants exercise
these controls by sending dedicated conference control protocol
requests to the Application Server. The Application Server validates
such requests and translates them into the Media Server Control
Protocol before forwarding them over the MS Control Channel to the MS.
The Media Server then manipulates the involved media streams according
to the directives provided by the Application Server.</t>

<figure anchor="fig:conf-unmute-example" title="Conferencing Example: Unmuting A Participant">
				<artwork><![CDATA[

               +------------+                  +------------+
               |            | 'Include audio   |            |
               | Application|  sent by user X  |   Media    |
               |   Server   |  in conf Y mix'  |   Server   |
               |  (Focus)   |----------------->|  (Mixer)   |
               |            |   (MS CtrlChn)   |            |
               +------^-----+                  +------------+
                      |                          ..
                      |                       ...
                      | 'Unmute me'        ... RTP
                      |   (XCON)        ...
                      |              ...
                      |           ...
               +-----------+   ...
               |Participant|...
               +-----------+

]]></artwork>
			</figure>	
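<t>The AS-side validation and translation step can be sketched as
follows. The request and directive dictionaries, the action names, the
0 to 100 volume range, and the rule that a participant may only control
their own media are illustrative assumptions; neither the XCON data
model nor any MS control protocol encoding is reproduced here. The
'unmute' mapping corresponds to the 'Include audio sent by user X in
conf Y mix' directive shown in the figure above.</t>

<figure>
<artwork><![CDATA[
# Hypothetical mapping from participant-facing controls to
# MS control channel directives: action -> (operation, parameter).
ACTIONS = {
    "mute": ("exclude-audio", None),
    "unmute": ("include-audio", None),
    "pause-video": ("exclude-video", None),
    "unpause-video": ("include-video", None),
    "set-volume": ("set-gain", "level"),
}


def translate(request, sender, conf_id):
    """Validate a participant request; return an MS directive."""
    # Participants may only control their own media streams.
    if request.get("target", sender) != sender:
        raise PermissionError("participants control only themselves")
    action = request.get("action")
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    op, param = ACTIONS[action]
    directive = {"op": op, "conf": conf_id, "user": sender}
    if param is not None:
        value = int(request[param])
        if not 0 <= value <= 100:
            raise ValueError("level must be between 0 and 100")
        directive[param] = value
    return directive
]]></artwork>
</figure>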

<t>The media server may also need to notify the AS of events, such as
in-band DTMF tones detected during the conference.</t>
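<t>One way an AS might consume such notifications is sketched below:
the MS reports each detected digit as an event, and the AS accumulates
digits per participant until a terminating key arrives. The event
dictionary format, the DtmfCollector name, and the use of '#' as the
terminator are hypothetical assumptions.</t>

<figure>
<artwork><![CDATA[
from collections import defaultdict


class DtmfCollector:
    """Hypothetical AS-side accumulator for MS DTMF events."""

    VALID = set("0123456789*#ABCD")

    def __init__(self, terminator="#"):
        self.terminator = terminator
        self._buffers = defaultdict(str)

    def on_event(self, event):
        """Handle one MS event; return a completed string or None."""
        if event.get("type") != "dtmf":
            return None
        digit, user = event["digit"], event["user"]
        if digit not in self.VALID:
            return None
        if digit == self.terminator:
            # Hand the collected digits to the application logic.
            return self._buffers.pop(user, "")
        self._buffers[user] += digit
        return None
]]></artwork>
</figure>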

</section>
<!-- Media Controls -->

<section anchor="sec:conf-floor-ctrl" title="Floor Control">			

<t>The XCON framework introduces "floor control" functionality
as an enhancement upon <xref target="RFC4575"/>.
Floor control is a means to manage joint or exclusive access to
shared resources in a (multiparty) conferencing environment.
Floor control is not a mandatory mechanism for a conferencing system
implementation, but it provides advanced media input control features
for conference-aware users. The mechanism allows coordinated and
moderated access to any set of resources provided by the conferencing
system. To do so, a so-called floor is associated with a set of
resources and represents, for users, the right to access and
manipulate those resources. To take advantage of floor control, a
specific protocol, the Binary Floor Control Protocol (BFCP), has been
specified <xref target="RFC4582"/>. <xref target="RFC4583"/>
provides a way for SIP UAs to set up a BFCP connection towards the
Floor Control Server and use floor control by means of a COMEDIA
<xref target="RFC4145"/> negotiation.</t>
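<t>To make the BFCP messages used in the following call flows more
concrete, the sketch below packs the 12-octet BFCP common header as we
read its layout in <xref target="RFC4582"/>: a 3-bit version (1), 5
reserved bits, the primitive, the payload length in 4-octet units
excluding the header, the conference ID, the transaction ID, and the
user ID. Only the primitive and request status values used later in
this section are listed. Attribute encoding is omitted, and the helper
name is hypothetical.</t>

<figure>
<artwork><![CDATA[
import struct

# BFCP primitive values from RFC 4582 (subset used here).
FLOOR_REQUEST = 1
FLOOR_REQUEST_STATUS = 4
CHAIR_ACTION = 9
CHAIR_ACTION_ACK = 10

# RequestStatus values from RFC 4582 (subset used here).
PENDING, ACCEPTED, GRANTED = 1, 2, 3

BFCP_VERSION = 1


def pack_bfcp_message(primitive, conf_id, trans_id, user_id,
                      payload=b""):
    """Build a BFCP message: common header plus raw attributes."""
    if len(payload) % 4:
        raise ValueError("payload must be padded to 4 octets")
    first = BFCP_VERSION << 5  # Ver (3 bits), Reserved (5 bits)
    header = struct.pack("!BBHIHH", first, primitive,
                         len(payload) // 4, conf_id,
                         trans_id, user_id)
    return header + payload
]]></artwork>
</figure>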

<t>In the context of the AS-MS interaction, floor control
constitutes a further means to control users' media streams. A
typical example is a floor associated with the right to access
the shared audio channel in a conference.
A user who is granted such a floor is granted the right to talk by
the conferencing system, which means that the MS includes the user's
audio frames in the overall audio conference mix. Conversely, when the
floor is revoked, the user is muted in the conference and the user's
audio is excluded from the final mix.</t>
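<t>The effect of the audio floor described above can be summarized as a
membership rule: a participant's audio contributes to the conference
mix only while that participant holds the floor. The sketch below
models this on the MS side; the class and method names are
hypothetical, and it deliberately ignores every other reason a
participant might be excluded from the mix.</t>

<figure>
<artwork><![CDATA[
class FloorGatedMix:
    """Hypothetical MS view: who contributes to the audio mix."""

    def __init__(self):
        self._holders = set()

    def grant(self, user):
        # Floor granted: the user's audio frames enter the mix.
        self._holders.add(user)

    def revoke(self, user):
        # Floor revoked: the user is muted in the conference.
        self._holders.discard(user)

    def contributors(self, frames):
        """Keep only frames from users currently holding the floor."""
        return {u: f for u, f in frames.items()
                if u in self._holders}
]]></artwork>
</figure>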

<t>BFCP defines a Floor Control Server (FCS) and a floor chair. The
floor chair, which makes decisions about floor requests, is clearly
part of the application logic. Consequently, when the floor chair role
in a conference is automated, it will normally be part of the AS.</t>

<t>The example above shows that a direct or indirect interaction
between the Floor Control Server and the Media Server is needed in
order to bind each floor correctly to its related set of media
resources. A similar interaction is also needed between the Floor
Control Server and the Application Server, since the latter must be
aware of all the associations between floors and resources in order to
orchestrate the related bindings with the element responsible for
those resources (e.g., the Media Server for audio and/or video
streams) and the operations on them (e.g., muting or unmuting a user
in a conference). For this reason, the Floor Control Server can be
co-located with either the Media Server or the Application Server, as
long as both elements can interact with the Floor Control Server by
means of some protocol.</t>
<t>Both approaches are described below, to better explain the
interactions between the involved components in each topology.</t>


<t>When the AS and the FCS are co-located, the scenario is quite
straightforward. In fact, it can be considered a variation of the
case depicted in <xref target="fig:conf-unmute-example"/>. The only
relevant difference is that here the action the AS commands on the
control channel is triggered by a change in the floor control status
rather than by a specific control requested by the participant. The
sequence diagram in <xref target="fig:conf-bfcp-as"/>
describes the interaction between the involved parties in a typical
scenario. It assumes that a BFCP connection between the UA and the
FCS (co-located with the AS) has already been negotiated and
established, and that the UA has been made aware of all the relevant
identifiers and floor-resource associations (e.g., by means of
<xref target="RFC4583"/>). It also assumes that the AS has previously
configured the media mixing on the MS using the MS control channel.
Any frames the UA sends on the related media stream are currently
dropped by the MS, since the UA is not yet authorized to use the
resource. For a SIP UA, this state could result from a 'sendonly'
attribute associated with the media stream in a re-INVITE originated
by the MS. Note that the AS has to make sure that no user-provided
control mechanism, e.g., the conference control protocol (CCP) mixing
controls, can override floor control when it is in use.</t>

<figure anchor="fig:conf-bfcp-as" title="Conferencing Example: Floor Control Call Flow (FCS Co-located with the AS)">
	<artwork><![CDATA[
  UA                                   AS                         MS
  (Floor Participant)                 (FCS)
  |                                     |                          |
  |<===================== One-way RTP stream ======================|
  |                                     |                          |
  | FloorRequest(BFCP)                  |                          |
  |------------------------------------>|                          |
  |                                     |                          |
  |   FloorRequestStatus[PENDING](BFCP) |                          |
  |<------------------------------------|                          |
  |                                     |--+ apply                 |
  |                                     |  | policies              |
  |                                     |<-+ to request            |
  |                                     |                          |
  |  FloorRequestStatus[ACCEPTED](BFCP) |                          |
  |<------------------------------------|                          |
  |                                     |                          |
  .                                     .                          .
  .                                     .                          .
  |                                     |                          |
  |   FloorRequestStatus[GRANTED](BFCP) |                          |
  |<------------------------------------|                          |
  |                                     | 'Unmute UA' (CtrlChn)    |
  |                                     |------------------------->|
  |                                     |                          |
  |<==================== Bidirectional RTP stream ================>|
  |                                     |                          |
  .                                     .                          .
  .                                     .                          .
  ]]></artwork>
</figure>

<t>A UA, which also acts as a floor participant, sends a
'FloorRequest' to the Floor Control Server (FCS, co-located with the
AS), stating its wish to be granted the floor associated with the audio
stream in the conference. The AS answers the UA with a
'FloorRequestStatus' message with a PENDING status, meaning that a
decision on the request has not been taken yet. The AS then takes a
decision on the request according to the BFCP policies for this
conference; in this example, it accepts it. Note that this decision
might instead be delegated to another participant if one has
previously been assigned as chair of the floor. Having accepted the
request, the AS notifies the UA about the decision with a new
'FloorRequestStatus', this time with an ACCEPTED status. The ACCEPTED
status only means that the request has been accepted; it does not mean
that the floor has been granted yet. Once the queue management in the
FCS, according to the specified scheduling algorithms, determines that
the floor request previously made by the UA can be granted, the AS
sends a new 'FloorRequestStatus' to the UA with a GRANTED status and
unmutes the user in the conference by sending a directive to the MS
through the control channel. Once the UA receives the notification
that its request has been granted, it can start sending media, knowing
that its media stream will no longer be dropped by the MS. If the
session was previously updated with a 'sendonly' attribute associated
with the media stream, the MS must originate a further re-INVITE
stating that the media stream flow is now bidirectional
('sendrecv').</t>
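<t>The AS-side logic of this co-located flow can be sketched as below.
The sketch assumes a single audio floor, a first-come-first-served
queue as a stand-in for the unspecified scheduling algorithm, and an
automated chair policy passed in as a function. The send_bfcp and
send_ctrl callables stand in for the BFCP connection and the MS control
channel and, like every other name here, are hypothetical. It also
shows one way of enforcing the rule that user-provided mixing controls
cannot override floor control.</t>

<figure>
<artwork><![CDATA[
from collections import deque


class CoLocatedFloorControl:
    """Hypothetical AS acting as FCS and automated floor chair."""

    def __init__(self, policy, send_bfcp, send_ctrl):
        self.policy = policy        # user -> bool: accept request?
        self.send_bfcp = send_bfcp  # (user, status): BFCP to the UA
        self.send_ctrl = send_ctrl  # directive: MS control channel
        self.queue = deque()        # accepted, not yet granted
        self.holder = None          # user currently holding floor

    def on_floor_request(self, user):
        self.send_bfcp(user, "PENDING")
        if not self.policy(user):
            self.send_bfcp(user, "DENIED")
            return
        self.send_bfcp(user, "ACCEPTED")
        self.queue.append(user)
        self._schedule()

    def on_floor_release(self, user):
        if self.holder == user:
            self.holder = None
            self.send_bfcp(user, "RELEASED")
            self.send_ctrl({"op": "mute", "user": user})
            self._schedule()

    def allow_user_unmute(self, user):
        # Floor control overrides user-provided mixing controls:
        # only the current floor holder may be unmuted.
        return user == self.holder

    def _schedule(self):
        # First-come-first-served stand-in for the FCS scheduler.
        if self.holder is None and self.queue:
            self.holder = self.queue.popleft()
            self.send_bfcp(self.holder, "GRANTED")
            self.send_ctrl({"op": "unmute", "user": self.holder})
]]></artwork>
</figure>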

<t>As mentioned before, this scenario assumes an automated floor chair
role, in which the AS makes decisions on floor requests according to
some policies. The case of a chair role performed by a human is the
same, except that the incoming request is not handled directly by the
AS according to its policies but is instead forwarded to the BFCP
client that the chair's UA is using. The chair UA then communicates its
decision on the request to the AS-FCS by means of a ChairAction
message.</t>


<t>The rest of this section explores the other scenario, in which the
FCS is co-located with the MS and the interaction between the AS and
the FCS takes place through the MS control channel. This scenario is
compliant with ITU-T H.248.19, which relates to conferencing in 3GPP.
<xref target="fig:conf-bfcp"/> describes the interaction between the
involved parties in the same use case that was explored for the
previous topology, and it makes the same assumptions: a BFCP
connection between the UA and the FCS has already been negotiated and
established, and the UA has been made aware of all the relevant
identifiers and floor-resource associations. It also assumes that the
AS has previously configured the media mixing on the MS using the MS
control channel. This time, since the FCS is no longer co-located with
the AS, that configuration also includes identifying the
BFCP-moderated resources, establishing basic policies and instructions
about chair identifiers for each resource, and subscribing to events
of interest. Additionally, a BFCP session has been established between
the AS (which in this scenario acts as a floor chair) and the FCS (the
MS). Any frames the UA sends on the related media stream are currently
dropped by the MS, since the UA is not yet authorized to use the
resource. For a SIP UA, this state could result from a 'sendonly'
attribute associated with the media stream in a re-INVITE originated
by the MS. Again, the AS has to make sure that no user-provided control
mechanism, e.g., the CCP mixing controls, can override floor control
when it is in use.</t>

<figure anchor="fig:conf-bfcp" title="Conferencing Example: Floor Control Call Flow (FCS Co-located with the MS)">
				<artwork><![CDATA[

  UA                          AS                                  MS
  (Floor Participant)   (Floor Chair)                          (FCS)
  |                           |                                    |
  |<===================== One-way RTP stream ======================|
  |                           |                                    |
  | FloorRequest(BFCP)        |                                    |
  |--------------------------------------------------------------->|
  |                           |                                    |
  |                           |  FloorRequestStatus[PENDING](BFCP) |
  |<---------------------------------------------------------------|
  |                           |  FloorRequestStatus[PENDING](BFCP) |
  |                           |<-----------------------------------|
  |                           |                                    |
  |                           | ChairAction[ACCEPTED] (BFCP)       |
  |                           |----------------------------------->|
  |                           |       ChairActionAck (BFCP)        |
  |                           |<-----------------------------------|
  |                           |                                    |
  |                           | FloorRequestStatus[ACCEPTED](BFCP) |
  |<---------------------------------------------------------------|
  |                           |                                    |
  .                           .                                    .
  .                           .                                    .
  |                           |                                    |
  |                           |  FloorRequestStatus[GRANTED](BFCP) |
  |<---------------------------------------------------------------|
  |                           | 'Floor has been granted' (CtrlChn) |
  |                           |<-----------------------------------|
  |                           |                                    |
  |<==================== Bidirectional RTP stream ================>|
  |                           |                                    |
  .                           .                                    .
  .                           .                                    .

]]></artwork>
			</figure>	

<t>A UA, which also acts as a floor participant, sends a
'FloorRequest' to the Floor Control Server (FCS, co-located with the
MS), stating its wish to be granted the floor associated with the audio
stream in the conference. The MS answers the UA with a
'FloorRequestStatus' message with a PENDING status, meaning that a
decision on the request has not been taken yet. It then notifies the
AS, which in this example acts as floor chair, about the new request
(in the figure, by means of a 'FloorRequestStatus' message with a
PENDING status). The AS takes a decision on the request according to
the BFCP policies for this conference; in this example, it accepts it.
It informs the MS about its decision through a BFCP 'ChairAction'
message. The MS acknowledges the 'ChairAction' message with a
'ChairActionAck' and then notifies the UA about the decision with a
new 'FloorRequestStatus', this time with an ACCEPTED status. The
ACCEPTED status only means that the request has been accepted; it does
not mean that the floor has been granted yet. Once the queue
management in the MS, according to the specified scheduling
algorithms, determines that the floor request previously made by the
UA can be granted, the MS sends a new 'FloorRequestStatus' to the UA
with a GRANTED status, unmutes the user in the conference, and informs
the AS through the control channel that the floor has been granted.
Once the UA receives the notification that its request has been
granted, it can start sending media, knowing that its media stream will
no longer be dropped by the MS. If the session was previously updated
with a 'sendonly' attribute associated with the media stream, the MS
must originate a further re-INVITE stating that the media stream flow
is now bidirectional ('sendrecv').</t>

<t>This scenario also assumes an automated floor chair role, in which
the AS makes decisions on floor requests according to some policies.
Again, the case of a chair role performed by a human is the same,
except that the incoming request is forwarded not to the AS but to the
BFCP client that the chair's UA is using. The decision on the request
is communicated by means of a ChairAction message in the same way.</t>

<t>Another typical scenario is a BFCP-moderated conference
with no chair managing floor requests. In such a scenario,
the MS has to handle incoming requests according to some
predefined policies, e.g., always accepting new requests.
In this case, no decisions are required from external entities,
since everything is decided immediately by means of policies in the
MS.</t>
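<t>On the MS side, the FCS behaviour described for this topology, with
either a chair (the AS or a human) or predefined policies deciding on
requests, can be sketched as follows. All names are hypothetical. The
chair is modelled as an optional callable notified of pending requests,
whose decision later arrives as a ChairAction; in the chairless case a
policy function decides immediately and, by default, always accepts,
which is the example policy given above. A first-come-first-served
queue stands in for the unspecified scheduling algorithm, and the
'Floor has been granted' notification to the AS is modelled as a
callback.</t>

<figure>
<artwork><![CDATA[
from collections import deque


class MediaServerFcs:
    """Hypothetical FCS co-located with the MS."""

    def __init__(self, notify_ua, notify_chair=None, policy=None,
                 notify_as=None):
        self.notify_ua = notify_ua        # (user, status) to the UA
        self.notify_chair = notify_chair  # (user, status) or None
        # Chairless case: the default policy accepts every request.
        self.policy = policy or (lambda user: True)
        self.notify_as = notify_as        # control channel event
        self.pending = set()              # awaiting a ChairAction
        self.queue = deque()              # accepted, not granted
        self.holder = None
        self.mix = set()                  # users in the audio mix

    def on_floor_request(self, user):
        self.notify_ua(user, "PENDING")
        if self.notify_chair is not None:
            self.pending.add(user)
            self.notify_chair(user, "PENDING")
        else:
            self._decide(user, self.policy(user))

    def on_chair_action(self, user, accept):
        """Apply a ChairAction; True stands in for ChairActionAck."""
        if user not in self.pending:
            return False
        self.pending.discard(user)
        self._decide(user, accept)
        return True

    def _decide(self, user, accept):
        self.notify_ua(user, "ACCEPTED" if accept else "DENIED")
        if accept:
            self.queue.append(user)
            self._schedule()

    def _schedule(self):
        # First-come-first-served stand-in for the FCS scheduler.
        if self.holder is None and self.queue:
            self.holder = self.queue.popleft()
            self.mix.add(self.holder)     # unmute in the conference
            self.notify_ua(self.holder, "GRANTED")
            if self.notify_as is not None:
                self.notify_as({"event": "floor-granted",
                                "user": self.holder})
]]></artwork>
</figure>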

<t>As stated before, the case of the FCS co-located with the AS
is much simpler to understand and use. When the AS has full
control over the FCS, including its queue management, the AS
directly instructs the MS according to floor status changes,
e.g., by instructing the MS through the control channel to unmute
a user who has been granted the floor associated with the audio
media stream.</t>

</section>
<!-- Floor Control -->

</section>
<!-- Conferencing -->

<section title="Acknowledgments">

	<t>The authors would like to thank Spencer Dawkins for detailed
	reviews and comments, Gary Munson for suggestions, and Xiao Wang
	for review and feedback.</t>	

</section>
<!-- Acknowledgments -->

<section title="IANA Considerations">
	<t>This document has no actions for IANA.</t>

</section>
<!-- IANA Considerations -->

<section title="Security Considerations">
			
    <t>This document describes the architectural framework to be used for 
    media server control and focuses on the interactions between Application
    Servers and Media Servers. Interactions between end users and these
    servers are outside the scope of this document.</t>
    
    <t>Media Servers are valuable network resources and need to be
    protected against unauthorized access.
    Application Servers use SIP and related standards both to establish
    control channels to Media Servers and to establish media sessions
    between an MS and end users. Media Servers use the security
    mechanisms of SIP to authenticate requests from Application Servers
    and to ensure the integrity of those requests. Leveraging the
    security mechanisms of SIP ensures that only authorized Application
    Servers are allowed to establish sessions to an MS and to access
    MS resources through those sessions.</t>
    
    <t>Control channels between an AS and an MS carry the MS control
    protocol, which affects both the service seen by end users and the
    resources used on a media server. TLS <xref target="RFC4346"/> must be
    implemented as the transport-level security mechanism for control
    channels to guarantee the integrity of MS control interactions.</t>
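<t>As a minimal illustration of the TLS requirement, the sketch below
builds TLS contexts for both ends of a control channel with Python's
standard ssl module. Requiring a client certificate on the MS side, so
that the MS can also authenticate the AS at the transport level, is an
assumption of this sketch and not a requirement stated here. Loading
certificates and keys is omitted because it depends on local
files.</t>

<figure>
<artwork><![CDATA[
import ssl


def as_client_context():
    """Context for an AS opening a control channel to an MS."""
    # By default this verifies the MS certificate chain and the
    # server hostname against the system trust store.
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH)


def ms_server_context():
    """Context for an MS accepting control channel connections."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Assumption: the AS must present a certificate as well.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
]]></artwork>
</figure>

<t>In a deployment, the MS would additionally call load_cert_chain() on
its context with its own certificate and key before wrapping the
listening socket with wrap_socket(..., server_side=True).</t>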
    
    <t>The resources of an MS can be shared by more than one AS. Media
    Servers must prevent one AS from accessing and manipulating the
    resources that have been assigned to another AS. This may be achieved
    by an MS associating ownership of a resource with the AS that
    originally allocates it, and then ensuring that future requests
    involving that resource can be correlated with the AS that owns and
    is responsible for it.</t>
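<t>The ownership-based isolation suggested above can be sketched as
follows. How the MS identifies the requesting AS (here an opaque
as_id, which in practice might be derived from the authenticated SIP
or TLS identity) and the ResourceRegistry name are assumptions of this
sketch.</t>

<figure>
<artwork><![CDATA[
class ResourceRegistry:
    """Hypothetical MS-side record of which AS owns a resource."""

    def __init__(self):
        self._owner = {}

    def allocate(self, as_id, resource_id):
        # The allocating AS becomes the owner of the resource.
        if resource_id in self._owner:
            raise ValueError(f"{resource_id} already allocated")
        self._owner[resource_id] = as_id

    def check(self, as_id, resource_id):
        """Raise unless as_id owns resource_id."""
        owner = self._owner.get(resource_id)
        if owner is None:
            raise KeyError(resource_id)
        if owner != as_id:
            raise PermissionError(
                f"{as_id} does not own {resource_id}")

    def release(self, as_id, resource_id):
        # Only the owner may release the resource.
        self.check(as_id, resource_id)
        del self._owner[resource_id]
]]></artwork>
</figure>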
	   
</section>
<!-- Security Consideration -->


<section title="Contributors">
			
    <t>This document is a product of the Media Control Architecture Design 
    Team. In addition to the editor, the following individuals comprised the 
    design team and made substantial textual contributions to this document:
	<list style="empty">
		<t>Chris Boulton: cboulton@ubiquity.net</t>
		<t>Martin Dolly: mdolly@att.com</t>
		<t>Roni Even: roni.even@polycom.co.il</t>
		<t>Lorenzo Miniero: lorenzo.miniero@unina.it</t>
		<t>Adnan Saleem: Adnan.Saleem@radisys.com</t>
    </list>
    </t>
    		
</section>
<!-- Contributors -->
	
</middle>
<!-- Middle -->

<back>
		
    <references title="Normative References">
        &rfc0793;
        &rfc3261;
        &rfc3264;        
        &rfc3550;
        &rfc3725;
        &rfc4145;
        &rfc4346;
        &rfc4566;
	&rfc5167;
	
    </references>	
    <!-- Normative References -->

    <references title="Informative References">

        &rfc4960;
        &rfc4585;
        &rfc4353;
        &rfc4583;
        &rfc4240;
        &w3c-vxml;
        &sip-ctrl-fw;        
        &w3c-xml;
        &rfc3263;
        &rfc3840;        
        &rfc2976;        
        &rfc4474;
        &rfc4575;
        &xcon-frmk;
        &rfc4579;
        &xcon-dm;
        &rfc4582;
        
    </references>	
    <!-- Informative References -->

  	</back>

	<!-- Back -->

</rfc>
