<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc0793 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.0793.xml'>
<!ENTITY rfc2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
<!ENTITY rfc2976 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2976.xml'>
<!ENTITY rfc3261 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3261.xml'>
<!ENTITY rfc3262 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3262.xml'>
<!ENTITY rfc3263 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3263.xml'>
<!ENTITY rfc3264 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml'>
<!ENTITY rfc3550 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml'>
<!ENTITY rfc3725 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3725.xml'>
<!ENTITY rfc3840 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3840.xml'>
<!ENTITY rfc4145 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4145.xml'>
<!ENTITY rfc4240 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4240.xml'>
<!ENTITY rfc4346 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4346.xml'>
<!ENTITY rfc4353 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4353.xml'>
<!ENTITY rfc4474 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4474.xml'>
<!ENTITY rfc4566 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml'>
<!ENTITY rfc4575 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4575.xml'>
<!ENTITY rfc4579 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4579.xml'>
<!ENTITY rfc4582 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4582.xml'>
<!ENTITY rfc4583 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4583.xml'>
<!ENTITY rfc4585 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4585.xml'>
<!ENTITY rfc4960 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4960.xml'>
<!ENTITY rfc5167 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5167.xml'>
<!ENTITY xcon-dm PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-xcon-common-data-model.xml'>
<!ENTITY xcon-frmk PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-xcon-framework.xml'>
<!ENTITY sip-ctrl-fw PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-mediactrl-sip-control-framework.xml'>
<!ENTITY w3c-vxml PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml4/reference.W3C.REC-voicexml20-20040316.xml'>
<!ENTITY w3c-xml PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml4/reference.W3C.REC-xml-20060816.xml'>
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>
<rfc docName="draft-ietf-mediactrl-architecture-03" ipr="full3978" category="info">
<front>
<title abbrev="Mediactrl Architecture">An Architectural Framework for Media
Server Control</title>
<author fullname="Tim Melanchuk" initials="T." surname="Melanchuk"
role="editor">
<organization>Rain Willow Communications</organization>
<address>
<email>tim.melanchuk@gmail.com</email>
</address>
</author>
<date year="2008"/>
<workgroup>MediaCtrl</workgroup>
<abstract>
<t>This document describes an Architectural Framework for Media Server
Control. The primary focus will be to define logical entities that
exist within the context of Media Server control,
and define the appropriate naming conventions and
interactions between them.
</t>
</abstract>
<!-- Abstract -->
</front>
<middle>
<section title="Introduction">
<t>Application Servers host
one or more instances of a communications application. Media Servers
provide real-time media processing functions. This document presents
the core architectural framework that allows
Application Servers to control Media Servers.
An overview of the architecture describing the core logical entities and
their interactions is presented in <xref target="sec:arch-overview"/>.
The requirements for media server control are defined in
<xref target="RFC5167"/>.</t>
<t>SIP is used as the session establishment protocol within this
architecture. Application Servers use it both to
terminate media streams on Media Servers and to create and
manage control channels for media server control between themselves
and Media Servers. The detailed model for media server control
together with a description of SIP usage is presented in
<xref target="sec:SIP-usage"/>.</t>
<t>Several services are described using the framework defined in this document.
Use cases for IVR services are described in <xref target="sec:ivr"/>
and conferencing use cases are described in <xref target="sec:conferencing"/>.
</t>
</section>
<!-- Introduction -->
<section anchor="Terminology" title="Terminology">
<t>The following additional terms are defined for use in this document
in the context of Media Server control:
<list style="hanging">
<t hangText="Application Server (AS):">A functional entity that hosts one
or more instances of a communications application.</t>
<t hangText="Media Functions:">Functions available on a Media Server
that are used to supply media services to the AS. Some examples are
Dual-Tone Multi-Frequency (DTMF) detection, mixing,
transcoding, playing announcements, recording, etc.</t>
<t hangText="Media Resource Broker (MRB):">A logical entity that assigns
specific Media Server resources to incoming calls at the request
of service applications (i.e., an AS). Assignment happens in real time as
calls come into the network. The MRB may acquire knowledge of Media Server
resource utilization, which it can use to
help decide which MS resources to assign to resource requests from
applications, and it employs methods/algorithms to determine
MS resource assignment.</t>
<t hangText="Media Server (MS):">A functional entity whose
main task is to supply real-time media-related functions to communication
applications. In the architecture for the 3GPP IP Multimedia
Subsystem (IMS) a Media Server is referred
to as a Media Resource Function (MRF).</t>
<t hangText="Media Services:">Application service requiring media functions
such as Interactive Voice Response (IVR) or Media conferencing.</t>
<t hangText="Media Session:">From the Session Description Protocol
(SDP) specification <xref target="RFC4566"/>:
"A multimedia session is a set of
multimedia senders and receivers and the data streams flowing from senders
to receivers. A multimedia conference is an example of a multimedia
session."</t>
<t hangText="MS Control Channel:">A reliable transport connection between the
AS and MS used to exchange MS Control PDUs. Implementations must support the
Transmission Control Protocol (TCP) <xref target="RFC0793"/>
and may support the Stream Control Transmission Protocol (SCTP)
<xref target="RFC4960"/>.
Implementations must support
TLS <xref target="RFC4346"/> as a transport-level security mechanism
although its use in deployments is optional.</t>
<t hangText="MS Control Dialog:">A SIP dialog that is used for establishing
a control channel between the UA and the MS.</t>
<t hangText="MS Control Protocol:">The protocol used by an AS to control
an MS. The MS Control Protocol assumes a reliable underlying transport
protocol for the MS Control Channel.</t>
<t hangText="MS Media Dialog:">A SIP dialog between the AS and Media Server
that is used for establishing media sessions between a user device
such as a SIP phone and the Media Server.</t>
</list>
</t>
</section>
<!-- Terminology -->
<section anchor="sec:arch-overview" title="Architecture Overview">
<t>A Media Server (MS) is a network device that processes media streams.
Examples of media processing functionality may include:
<list style="symbols">
<t>Control of Real-time Transport Protocol (RTP)
<xref target="RFC3550"/> streams, such as
video fast update and flow control using RTP Control Protocol
(RTCP) feedback <xref target="RFC4585"/>.</t>
<t>Mixing of incoming media streams.</t>
<t>Media stream source (for multimedia announcements).</t>
<t>Media stream processing (e.g. transcoding, DTMF detection).</t>
<t>Media stream sink (for multimedia recordings).</t>
</list>
An MS supplies one or more media processing functions,
which may include others beyond those listed above, to an
Application Server (AS).
An AS is able to send a particular call to a
suitable MS, either through discovery of the capabilities that a
specific MS provides or through the use of a Media Resource Broker.
</t>
<t>The type of processing that a Media Server performs on media
streams is specified and controlled by an Application Server.
Application Servers are logical entities that are
capable of running one or more instances of a communications application.
Examples of Application Servers that may interact with a Media Server
are an AS acting as a Conference 'Focus' as defined in
<xref target="RFC4353"/>
or an IVR application using a Media Server to play announcements
and detect DTMF key presses. </t>
<t>Application servers use SIP to
establish control channels between themselves and MSs.
An MS Control Channel uses a reliable transport protocol
to carry the MS Control Protocol.
A SIP dialog used to establish a control channel
is referred to as a MS Control Dialog.</t>
<t>Application Servers terminate SIP <xref target="RFC3261"/>
signaling from SIP User Agents
and may terminate other signaling outside the scope of this document.
They use SIP Third Party Call Control
<xref target="RFC3725"/> (3PCC) to establish,
maintain, and tear down media streams from those SIP UAs
to a Media Server. A SIP dialog used by an AS to establish a media
session on an MS is referred to as a MS Media Dialog.</t>
<t>Media streams go directly between SIP User Agents and
Media Servers. Media Servers support multiple types of media. Common
supported media types include audio and video but others such as text and
the Binary Floor Control Protocol
(BFCP) <xref target="RFC4583"/> are also possible.
This basic architecture, with session establishment signaling
between a single AS and MS, is shown in <xref target="fig:basic-arch"/> below.</t>
<figure anchor="fig:basic-arch" title="Basic Signalling Architecture">
<artwork><![CDATA[
+-------------+                         +--------------+
|             | SIP (MS Control Dialog) |              |
| Application |<----------------------->|    Media     |
|   Server    |                         |    Server    |
|             |<----------------------->|              |
+-------------+  SIP (MS Media Dialog)  +--------------+
          ^                                    ^
           \                                   |  RTP/SRTP
            \                                  | (audio/
             \                                 |  video/etc)
              \                                |
               \                               v
                \                       +--------------+
                 \         SIP          |              |
                  +-------------------->|     SIP      |
                                        |  User Agent  |
                                        |              |
                                        +--------------+
]]></artwork>
</figure>
<t>The architecture must support a many-to-many relationship
between Application Servers and Media Servers.
In real world deployments, an Application
Server may interact with multiple Media Servers and/or a Media Server
may be controlled by more than one Application Server.</t>
<t>Application Servers can use the SIP URI as described in
<xref target="RFC4240"/> to
request basic functions from Media Servers. Basic functions are
characterized as requiring no mid-call interactions between the
AS and MS. Examples of these functions are simple announcement
playing or basic conference mixing where the AS does not
need to explicitly control the mixing.</t>
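<t>As a rough illustration of the URI convention referenced above, the
following Python sketch builds Request-URIs for the announcement and
conference services. The "annc" user part, the "play" parameter, and the
"conf=" prefix follow <xref target="RFC4240"/>; the host name, prompt URL,
conference identifier, and helper function names are hypothetical and
chosen only for this example.</t>
<figure>
<artwork><![CDATA[
```python
from urllib.parse import quote

# Hypothetical Media Server host, used only for illustration.
MS_HOST = "ms.example.net"


def announcement_uri(prompt_url, host=MS_HOST):
    """Request-URI for the RFC 4240 announcement service."""
    # Percent-encode characters such as ";", "?" and "=" so they are
    # not misread as SIP URI delimiters inside the parameter value.
    escaped = quote(prompt_url, safe="/:.")
    return f"sip:annc@{host};play={escaped}"


def conference_uri(conf_id, host=MS_HOST):
    """Request-URI for the RFC 4240 conference service."""
    return f"sip:conf={quote(conf_id, safe='')}@{host}"


print(announcement_uri("http://audio.example.net/busy.g711"))
print(conference_uri("weekly-sync"))
```
]]></artwork>
</figure>
<t>Because the whole request is carried in the INVITE Request-URI, no
control channel is needed for these basic functions, which is why
they suit services without mid-call interaction.</t>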
<t>Most services, however, involve interactions between the AS and MS
during a call or conference. The types of interaction can be generalized
as follows:
<list style="symbols">
<t>commands from an AS to an MS to request the application or
configuration of a function. The request may apply to a
single media stream, multiple media streams associated
with multiple SIP dialogs, or to properties of a conference mix.</t>
<t>responses from an MS to an AS reporting on the status
of a particular command.</t>
<t>notifications from an MS to an AS that report results
from commands or notify changes to subscribed status.</t>
</list>
</t>
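<t>The three interaction types listed above can be pictured with the
following Python sketch. It is a toy model only: the class names, field
names, status strings, and the "-done" event suffix are all invented for
illustration and are not taken from the MS Control Protocol or any
control package.</t>
<figure>
<artwork><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Command:           # AS -> MS: apply or configure a function
    transaction_id: str
    function: str        # e.g. "play-announcement" (made-up name)
    targets: List[str]   # media streams or a conference mix


@dataclass
class Response:          # MS -> AS: status of one command
    transaction_id: str
    status: str


@dataclass
class Notification:      # MS -> AS: result or subscribed event
    transaction_id: str
    event: str


@dataclass
class ToyMediaServer:
    pending: List[Notification] = field(default_factory=list)

    def handle(self, cmd: Command) -> Response:
        if not cmd.targets:
            return Response(cmd.transaction_id, "rejected")
        # Accept now; the outcome is reported later as a notification.
        done = Notification(cmd.transaction_id, cmd.function + "-done")
        self.pending.append(done)
        return Response(cmd.transaction_id, "accepted")
```
]]></artwork>
</figure>
<t>In this sketch the transaction identifier is what lets the AS match a
response, and any later notification, to the command that caused it.</t>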
<t>Commands, responses, and notifications are transported using
one or more dedicated control channels between the Application
Server and the Media Server. Dedicated control channels provide
reliable, sequenced, peer-to-peer transport for media server control
interactions.
Implementations must support the
Transmission Control Protocol (TCP) <xref target="RFC0793"/>
and may support the Stream Control Transmission Protocol (SCTP)
<xref target="RFC4960"/>.
Implementations must support
TLS <xref target="RFC4346"/> as a transport-level security mechanism
although its use in deployments is optional.
A dedicated control channel is shown
in <xref target="fig:ctrl-arch"/> below.</t>
<figure anchor="fig:ctrl-arch" title="Media Server Control Architecture">
<artwork><![CDATA[
+-------------+                     +--------------+
|             |                     |              |
| Application |   MS ctrl channel   |    Media     |
|   Server    |<------------------->|    Server    |
|             |                     |              |
+-------------+                     +--------------+
                                        ^ ^   ^
                            RTP/SRTP    | |   |
                            (audio/     | |   |
                             video/etc) | |   |
                                        | |   v
                                    +---|-v-------+
                                  +-|---v-------+ |
                                +-|-----------+ | |
                                |             | | |
                                |     SIP     | | |
                                | User Agent  | |-+
                                |             |-+
                                +-------------+
]]></artwork>
</figure>
<t>Both Application Servers and Media Servers may interact with
other servers for specific purposes beyond the scope of this
document. For example, Application Servers will often communicate
with other infrastructure components, such as back-office
data stores and applications, as dictated by
deployment requirements. Media Servers will often retrieve announcements
from external file servers. Also, many Media Servers support
IVR dialog services using VoiceXML
<xref target="W3C.REC-voicexml20-20040316"/>. In this case the MS interacts
with other servers using HTTP during standard VoiceXML processing.
VoiceXML Media Servers may also interact with speech engines,
for example using MRCPv2,
for speech recognition and generation purposes.</t>
<t>Some specific types of interactions between Application Servers and
Media Servers are also out of scope of this document. MS resource
reservation is one such interaction. Interactions between
Application Servers, or between Media Servers, are likewise out of
scope.</t>
</section>
<!-- Overview -->
<section anchor="sec:SIP-usage" title="SIP Usage">
<t>The Session Initiation Protocol (SIP) <xref target="RFC3261"/> was
developed by the IETF for the purposes of initiating, managing and
terminating multimedia sessions. SIP's popularity has grown
dramatically since its inception, and it is now the primary Voice
over IP (VoIP) protocol. It has been selected as the
basis for architectures such as the IP Multimedia Subsystem
(IMS) in 3GPP and is included in many of the early live deployments
of VoIP-related systems. Media servers are not a new concept in
IP telephony networks and there have been numerous
signaling protocols and techniques proposed for their control.
The most popular techniques to date have used a combination of SIP and
various markup languages to convey media service
requests and responses.</t>
<t>As discussed in <xref target="sec:arch-overview"/> and
illustrated in <xref target="fig:basic-arch"/>, the logical
architecture described by this document involves interactions between an
Application Server (AS) and a Media Server (MS). The SIP
interactions can be broken into ‘MS media dialogs’ – used between
an AS and a MS to establish media sessions between an endpoint
and a Media Server, and ‘MS control dialogs’ – which are used to
establish and maintain MS control channels.</t>
<t>SIP is the primary session signaling
protocol and is used for
all media sessions directed towards a Media Server as described in this document.
Media Servers may support other signaling
protocols but this type of interaction is not considered here.
Application Servers may terminate non-SIP signaling protocols
but must gateway those requests to SIP when interacting with
a Media Server.
</t>
<t>SIP will also be used for the creation, management and
termination of the dedicated MS control channel(s).
A control channel provides reliable
delivery of MS Control Protocol messages. The Application
and Media Servers use the SDP attributes defined in
<xref target="RFC4145"/>
to allow SIP negotiation of a transport connection.
Further details and example flows are provided in the SIP Control
Framework <xref target="I-D.ietf-mediactrl-sip-control-framework"/>.
The SIP Control Framework also includes basic control message
semantics corresponding to the types of interactions identified
in <xref target="sec:arch-overview"/>. It uses
the concept of "packages" to allow domain specific protocols
to be defined using the Extensible Markup Language (XML)
<xref target="W3C.REC-xml-20060816"/> format. The MS Control
Protocol is made up of one or more packages for
the SIP Control Framework.
</t>
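<t>The transport negotiation mentioned above relies on the "setup" and
"connection" SDP attributes of <xref target="RFC4145"/>. The following
Python sketch shows, under stated assumptions, how the offered and
answered "setup" values determine which side opens the TCP connection.
The function names are hypothetical, and the media ("m=") line is
deliberately omitted because it is defined by the SIP Control Framework
rather than by this document.</t>
<figure>
<artwork><![CDATA[
```python
# Allowed answer values for each offered "a=setup:" value (RFC 4145).
ALLOWED_ANSWERS = {
    "active": {"passive", "holdconn"},
    "passive": {"active", "holdconn"},
    "actpass": {"active", "passive", "holdconn"},
    "holdconn": {"holdconn"},
}


def connection_initiator(offer_setup, answer_setup):
    """Return "offerer", "answerer", or None (connection on hold)."""
    if answer_setup not in ALLOWED_ANSWERS.get(offer_setup, set()):
        raise ValueError(
            f"bad answer {answer_setup!r} for {offer_setup!r}"
        )
    if answer_setup == "holdconn":
        return None
    # The side that ends up "active" opens the TCP connection.
    return "answerer" if answer_setup == "active" else "offerer"


def control_attributes(setup, connection="new"):
    """Render the two RFC 4145 attribute lines for a media section."""
    return [f"a=setup:{setup}", f"a=connection:{connection}"]


print(control_attributes("active"))
print(connection_initiator("active", "passive"))
```
]]></artwork>
</figure>
<t>An AS that wants to open the control channel itself would, in this
sketch, offer "active" and expect the MS to answer "passive".</t>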
<t>Using SIP for both media and control dialogs provides a
number of inherent benefits over other potential techniques.
These include:
<list style="numbers">
<t>The use of SIP location and rendezvous capabilities,
as defined in <xref target="RFC3263"/>. This provides
core mechanisms for routing a SIP request based on
techniques such as DNS SRV and NAPTR records. The SIP infrastructure
makes heavy use of such techniques.</t>
<t>The security and identity properties of SIP. For example,
TLS can be used to connect reliably and securely to another
SIP-based entity. SIP also has a number of identity
mechanisms that can be used. <xref target="RFC3261"/> provides an
intra-domain digest-based mechanism and
<xref target="RFC4474"/> defines a certificate
based inter-domain identity mechanism.
SIP with S/MIME provides the ability to secure payloads using
encrypted and signed certificate techniques.</t>
<t>SIP has extremely powerful and dynamic media negotiation
properties as defined in <xref target="RFC3261"/> and
<xref target="RFC3264"/>.
</t>
<t>The ability to select an appropriate SIP entity based
on capability sets as discussed in <xref target="RFC3840"/>.
This provides a powerful function that allows media servers to convey
a specific capability set. An AS is then free to select an
appropriate MS based on its requirements.</t>
<t>Using SIP also provides consistency with IETF protocols
and usages. SIP was intended to be used for the creation
and management of media sessions and this provides a correct
usage of the protocol.</t>
</list>
</t>
<t>As mentioned previously in this section, Media services
using SIP are fairly well understood. Some previous proposals
suggested using the SIP INFO <xref target="RFC2976"/> method
as the transport vehicle between the AS and MS.
Using SIP INFO in this way is not advised for a
number of reasons which include:
<list style="symbols">
<t>INFO is an opaque request with no specific semantics. A SIP endpoint
that receives an INFO request does not know what to do with it based on
SIP signaling.</t>
<t>SIP INFO was not created to carry generic session control
information along the signaling path and it should only
really be used for optional application information e.g.
carrying mid-call PSTN signaling messages between
PSTN gateways.</t>
<t>SIP INFO traverses the signaling path which is an inefficient use
for control messages which can be routed directly between the AS
and MS.</t>
<t><xref target="RFC3261"/> contains rules for requests sent over an unreliable
transport such as UDP. When a request comes within 200 bytes of
the path Maximum Transmission Unit (MTU), or exceeds 1300 bytes when the
path MTU is unknown, it has to be sent over a congestion-controlled
transport such as TCP. This type of operation is not ideal when
constantly dealing with large payloads such as XML-formatted
MS control messages.</t>
</list>
</t>
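<t>The size rule in the last item above can be sketched as follows. This
is a minimal Python illustration of the thresholds in Section 18.1.1 of
<xref target="RFC3261"/> (200 bytes below a known path MTU, or 1300 bytes
when the path MTU is unknown). The function name is hypothetical, and
treating "within 200 bytes" as a strict comparison is an assumption made
so that both thresholds agree for a 1500-byte MTU.</t>
<figure>
<artwork><![CDATA[
```python
from typing import Optional


def must_avoid_udp(size: int, path_mtu: Optional[int] = None) -> bool:
    """True if a SIP request of `size` bytes should not use UDP."""
    if path_mtu is not None:
        # Known path MTU: switch when within 200 bytes of it.
        return size > path_mtu - 200
    # Unknown path MTU: the fixed threshold is 1300 bytes.
    return size > 1300


for size in (900, 1400, 4000):
    print(size, must_avoid_udp(size))
```
]]></artwork>
</figure>
<t>An XML-formatted control message of a few kilobytes exceeds the
threshold on every request, which is why carrying such payloads in INFO
over UDP would trigger the transport switch constantly.</t>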
</section>
<!-- SIP Usage -->
<section anchor="sec:ivr" title="Media Control for IVR Services">
<t>One of the functions of a Media Server is to assist an
Application Server implementing IVR services by performing
media processing functions on media streams.
Although IVR is somewhat generic terminology, the scope
of media functions provided by a MS addresses the needs
for user interaction dialogs. These functions include media transcoding,
basic announcements, user input detection (via DTMF or speech) and
media recording.
</t>
<t>A particular IVR or user dialog application typically requires
the use of several specific media functions, as described above.
The range and complexity of IVR dialogs can vary significantly,
from a simple single announcement play-back to complex voice mail
applications.</t>
<t>As previously discussed, an AS uses SIP <xref target="RFC3261"/>
and SDP <xref target="RFC4566"/> to establish and configure media
sessions to a media server. An AS uses the MS control channel,
established using SIP, to invoke IVR requests and to receive
responses and notifications. This topology is shown in
<xref target="fig:ivr-arch"/> below.</t>
<figure anchor="fig:ivr-arch" title="IVR Topology">
<artwork><![CDATA[
+-------------+             SIP              +-------------+
| Application |<---------------------------->|    Media    |
|   Server    | (media & MS Control dialogs) |   Server    |
|             |                              |             |
|             |  MS Control Protocol (IVR)   |             |
|             |<---------------------------->| (IVR media  |
| (App logic) |        (CtrlChannel)         | functions)  |
+-------------+                              +-------------+
      ^                                            ^^
       \                                           || R
        \                                          || T
         \                                         || P
          \                                        || /
           \                                       || S
            \                                      || R
             \                                     || T
              \                                    || P
               \                                   vv
                \       call signaling        +-----------+
                 ---------------------------->|    UE     |
                          (e.g. SIP)          +-----------+
]]></artwork>
</figure>
<t>The variety in complexity of Application Server IVR services
requires support for different levels of media
functions from the Media Server as described in the following
sub-sections.</t>
<section anchor="sec:ivr-basic" title="Basic IVR Services">
<t>For simple announcement requests the MS control channel, as
depicted in <xref target="fig:ivr-arch"/> above, is not required.
Simple announcement requests may be invoked on the Media Server
using the SIP URI mechanism defined in <xref target="RFC4240"/>.
This interface provides no user input
digit detection or collection and no mid-call dialog
control. However, many applications only require basic media
services, and in that case the Media Server is spared the processing
burden of supporting
more complex interactions with the AS.</t>
</section>
<!-- Basic IVR Services -->
<section anchor="sec:ivr-mid-call" title="IVR Services with Mid-call Controls">
<t>For more complex IVR dialogs which require mid-call
interaction and control between the Application Server and the
Media Server, the MS control channel (as shown in
<xref target="fig:ivr-arch"/> above) is used to invoke specific
media functions on the Media Server. These functions include,
but are not limited to, complex announcements with barge-in
facility, user input detection and reporting (e.g. DTMF) to an
Application Server, DTMF and speech activity controlled
recordings, etc. Composite services, such as play-collect
and play-record, are also addressed by this model.</t>
<t>Mid-call control also allows Application Servers to subscribe
to IVR related events and for the Media Server to notify these
events when they occur. Examples of such events are
announcement completion events, record completion events, and
reporting of collected DTMF digits.</t>
</section>
<!-- IVR Services with Mid-call Controls -->
<section anchor="sec:ivr-vxml" title="Advanced IVR Services">
<t>Although IVR Services with Mid-call Control,
as described above, provides a comprehensive set of media
functions expected from a Media Server, the Advanced IVR
Services model allows a higher level of abstraction describing application logic,
as provided by VoiceXML, to be executed on the Media Server.
Invocation of VoiceXML IVR dialogs may be via the ‘Prompt and Collect’
mechanism of <xref target="RFC4240"/>.
Additionally, VoiceXML dialog services may be invoked
over the MS control channel, as shown in <xref target="fig:ivr-arch"/>
above. VoiceXML IVR services invoked on the Media Server require an
HTTP interface between the Media Server and one or more back-end
servers that host or generate VoiceXML documents. These server(s)
may or may not be physically separate from
the Application Server.</t>
</section>
<!-- Advanced Media Services -->
</section>
<!-- IVR -->
<section anchor="sec:conferencing" title="Media Control for Conferencing Services">
<t><xref target="RFC4353"/> describes the overall
architecture and protocol components needed for multipoint
conferencing using SIP. The framework
for centralized conferencing
<xref target="I-D.ietf-xcon-framework"/>
extends
that framework to include a protocol between the user and the conferencing
server. <xref target="RFC4353"/> describes the conferencing server decomposition
but leaves the specifics open.</t>
<t>This section describes the decomposition and discusses the
functionality of the decomposed functional units. The conferencing
factory and the conference focus are part of the Application Server
described in this document.</t>
<t>An Application Server uses SIP Third Party Call Control
<xref target="RFC3725"/> to
establish media sessions from SIP user agents to a Media Server. The
same mechanism is used by the Application Server as described in
this section to add/remove participants to/from a conference,
as well as to handle the involved media streams set up on a per-user basis.
Since the XCON framework has been conceived as protocol-agnostic when
talking about the Call Signaling Protocol used by users to join a
conference, an XCON-compliant Application Server will have to take
care of gatewaying non-SIP signaling negotiations
in order to set up valid SIP media sessions between
itself and the Media Server, while keeping the non-SIP
interaction with the user transparent.</t>
<figure anchor="fig:conf-topology" title="Conference Topology">
<artwork><![CDATA[
+------------+             +------------+
|            | SIP (2m+1c) |            |
| Application|-------------|   Media    |
|   Server   |             |   Server   |
|  (Focus)   |-------------|  (Mixer)   |
|            | CtrlChannel |            |
+------------+             +------------+
    |      \                    ..   .
    |       \\            RTP...     .
    |         \\           ..        .
    |    H.323  \\      ...          .
SIP |             \\ ...             .RTP
    |              ..\               .
    |           ...   \\             .
    |        ...        \\           .
    |      ..             \\         .
    |   ...                 \\       .
    | ..                      \      .
+-----------+               +-----------+
|Participant|               |Participant|
+-----------+               +-----------+
]]></artwork>
</figure>
<t>To complement the functionality provided by 3PCC and by the XCON control
protocol, the Application Server makes use of a dedicated media server
control channel in order to set up and manage media conferences on the
media server. <xref target="fig:conf-topology"/> shows the
signaling and media paths for a two-participant
conference. The three SIP dialogs between the AS and MS establish
two media sessions (2m) from participants, one originally signaled using
H.323 and then gatewayed into SIP and one signaled directly in SIP,
and one control session (1c).</t>
<t>As a conference focus, the Application Server is responsible for setting
up and managing a media conference on the media servers, in order to make
sure that all the media streams provided in a conference are available
to its participants. This is achieved by using the services of
one or more mixer entities, as described in <xref target="RFC4353"/>, whose role as
part of the Media Server is described in this section. Services
required by the Application Server include, but are not limited to,
means to set up, handle and destroy a new media conference,
adding and removing participants from a conference, managing media streams
in a conference, controlling the layout and the mixing configuration for each
involved media, allowing per-user custom media profiles and so on.</t>
<t>As a mixer entity, in such a multimedia conferencing scenario the Media
Server receives a set of media streams of the same type
(after transcoding if needed) and then takes
care of combining the received media in a type-specific manner,
redistributing the result to each authorized participant. The way
each media stream is combined, as well as the media-related policies,
is properly configured and handled by the Application Server by
means of a dedicated MS control channel.</t>
<t>To summarize, the AS needs to be able to manage Media Servers at both the
conference and the participant level.</t>
<section anchor="sec:conf-create" title="Creating a New Conference">
<t>When a new conference is created, either as a result of previous
conference scheduling or of the first participant dialing in
to a specified URI, the Application Server must take care of
appropriately creating a media conference on the Media Server.
It does so by sending an explicit request to the Media Server.
This can be done by means of an MS control channel
message. This request may contain detailed information about the desired
settings and policies for the conference (e.g. the media to involve,
the mixing configuration for them, relevant identifiers, etc.). The Media
Server validates such a request and takes care of allocating the needed
resources to set up the media conference.
</t>
<t>Alternatively, the conference can be created using SIP-based mechanisms such as
<xref target="RFC4240"/> or
<xref target="RFC4579"/> with pre-defined conference profiles, and
the MS control channel can be used afterwards to control the conference if needed.
</t>
<t>Once done, the MS informs the Application Server about the result of the
request. Each conference will be referred to by a specific identifier,
which both the Application Server and the Media Server will include in
subsequent transactions related to the same conference (e.g. to modify
the settings of an extant conference).</t>
</section>
<!-- Conference Creation -->
<section anchor="sec:conf-adding" title="Adding a Participant To a Conference">
<t>As stated before, an Application Server uses SIP 3PCC
to establish media sessions from SIP user
agents to a Media Server. The URI that the AS uses in the INVITE
to the MS may be one associated with the conference on the MS.
More likely however, the media sessions are first established
to the media server using a URI for the media server and then
subsequently joined to the conference using the MS Control
Protocol. This allows IVR dialogs to be performed prior to
joining the conference.
</t>
<t>The AS, acting as the 3PCC controller, correlates the media session negotiation
between the UA and the MS, in order to
appropriately establish all the needed media streams based on the
conference policies.
</t>
</section>
<!-- Adding a Participant -->
<section anchor="sec:conf-media-ctrls" title="Media Controls">
<t>The XCON Common Data Model
<xref target="I-D.ietf-xcon-common-data-model"/>
currently defines some basic media-related controls, which
conference-aware participants can take advantage of in several
ways, e.g. by means of an XCON conference control protocol
or IVR dialogs. These controls
include the possibility to modify the participants' own volume for
audio in the conference, configure the desired layout for incoming
video streams, mute/unmute oneself and pause/unpause one's own video
stream. Such controls are exploited by conference-aware participants
through the use of dedicated conference control protocol requests
to the Application Server. The Application Server takes care of
validating such requests and translates them into the Media
Server Control Protocol, before forwarding them over the MS Control
Channel to the MS. The Media Server then manipulates the involved
media streams according to the directives provided by the Application
Server.</t>
<figure anchor="fig:conf-unmute-example" title="Conferencing Example: Unmuting A Participant">
<artwork><![CDATA[
+------------+                  +------------+
|            | 'Include audio   |            |
| Application|  sent by user X  |   Media    |
|   Server   |  in conf Y mix'  |   Server   |
|  (Focus)   |----------------->|  (Mixer)   |
|            |   (MS CtrlChn)   |            |
+------^-----+                  +------------+
       |                          ..
       |                       ...
       |  'Unmute me'       ...  RTP
       |  (XCON)         ...
       |              ...
       |           ...
+-----------+   ...
|Participant|...
+-----------+
]]></artwork>
</figure>
<t>The media server may need to inform the AS of events like in-band
DTMF tones during the conference.</t>
</section>
<!-- Media Controls -->
<section anchor="sec:conf-floor-ctrl" title="Floor Control">
<t>The XCON framework introduces "floor control" functionality
as an enhancement upon <xref target="RFC4575"/>.
Floor control is a means to manage joint or exclusive access to
shared resources in a (multiparty) conferencing environment.
It is not a mandatory mechanism for a conferencing system
implementation, but it provides advanced media input control
features for conference-aware users. Such a mechanism allows
coordinated and moderated access to any set of resources provided
by the conferencing system. To this end, a so-called floor is
associated with a set of resources and represents, for users, the
right to access and manipulate those resources. To take advantage
of floor control, a specific protocol, the Binary Floor Control
Protocol (BFCP), has been specified
<xref target="RFC4582"/>. <xref target="RFC4583"/>
provides a way for SIP UAs to set up a BFCP connection towards
the Floor Control Server by means of a COMEDIA
<xref target="RFC4145"/> negotiation, and thus to make use of
floor control.</t>
<t>In the context of the AS-MS interaction, floor control
constitutes a further means to control users' media streams. A
typical example is a floor associated with the right to access
the shared audio channel in a conference.
A user who is granted such a floor is given the right to talk
by the conferencing system, which means that the user's
audio frames are included by the MS in the overall audio
conference mix. Similarly, when the floor is revoked, the user
is muted in the conference and the user's audio is excluded from
the final mix.</t>
<t>BFCP defines a Floor Control Server (FCS) and a floor
chair. The floor chair, which makes decisions about
floor requests, is part of the application logic. This implies
that when the floor chair role in a conference is automated,
it will normally be part of the AS.</t>
<t>The example above shows that there can be a direct or
indirect interaction between the Floor Control Server and the
Media Server, in order to correctly bind each floor to its
related set of media resources. A similar interaction is also
needed between the Floor Control Server and the Application
Server, since the latter must be aware of all the
associations between floors and resources in order to
orchestrate the related bindings with the element responsible
for those resources (e.g., the Media Server for
audio and/or video streams) and the operations upon them
(e.g., muting or unmuting a user in a conference). For this reason,
the Floor Control Server can be co-located with either the
Media Server or the Application Server, as long as both
elements are able to interact with the Floor Control
Server by means of some protocol.</t>
<t>Both approaches are described below, in order to explain the
interactions between the involved components in each topology.</t>
<t>When the AS and the FCS are co-located, the scenario is quite
straightforward. It can be considered a variation of the
case depicted in <xref target="fig:conf-unmute-example"/>. The only
relevant difference is that, in this case, the action the AS
commands on the control channel is triggered by a change in the
floor control status rather than by a specific control request
from a participant. The sequence diagram
in <xref target="fig:conf-bfcp-as"/>
describes the interaction between the involved parties in a typical
scenario. It assumes that a BFCP connection between the UA and the
FCS (which is co-located with the AS) has already
been negotiated and established, and that the UA has been
made aware of all the relevant identifiers and
floor-resource associations (e.g., by means of <xref target="RFC4583"/>).
It also assumes that the AS has previously configured the media mixing
on the MS using the MS control channel. Every frame the UA might be
sending on the related media stream is currently being dropped
by the MS, since the UA is not yet authorized to use the
resource. For a SIP UA, this state could result from
a 'sendonly' attribute associated with the media stream in
a re-INVITE originated by the MS. Note
that the AS has to make sure that no user-provided control
mechanism, e.g., the CCP mixing controls, can override
floor control when it is in use.</t>
<figure anchor="fig:conf-bfcp-as" title="Conferencing Example: Floor Control Call Flow">
<artwork><![CDATA[
   UA                                    AS                         MS
(Floor Participant)                     (FCS)
    |                                     |                          |
    | FloorRequest(BFCP)                  |                          |
    |------------------------------------>|                          |
    |                                     |                          |
    | FloorRequestStatus[PENDING](BFCP)   |                          |
    |<------------------------------------|                          |
    |                                     |--+ apply                 |
    |                                     |  | policies              |
    |                                     |<-+ to request            |
    |                                     |                          |
    | FloorRequestStatus[ACCEPTED](BFCP)  |                          |
    |<------------------------------------|                          |
    |                                     |                          |
    .                                     .                          .
    .                                     .                          .
    |                                     |                          |
    | FloorRequestStatus[GRANTED](BFCP)   |                          |
    |<------------------------------------|                          |
    |                                     | 'Unmute UA' (CtrlChn)    |
    |                                     |------------------------->|
    |                                     |                          |
    |<==================== Bidirectional RTP stream ================>|
    |                                     |                          |
    .                                     .                          .
    .                                     .                          .
]]></artwork>
</figure>
<t>A UA, which also acts as a floor participant, sends a
'FloorRequest' to the floor control server (FCS, which is
co-located with the AS), asking to be granted the
floor associated with the audio stream in the conference. The
AS answers the UA with a 'FloorRequestStatus' message with a
PENDING status, meaning that a decision on the request has
not been taken yet. The AS, according to the BFCP policies for
this conference, takes a decision on the request; in this example,
it accepts the request. Note that this decision might be delegated to
another participant if that participant has previously been assigned
as chair of the floor. Assuming the request has been accepted,
the AS notifies the UA of the decision with a new
'FloorRequestStatus', this time with an ACCEPTED status.
The ACCEPTED status only means that the
request has been accepted; it does not mean that the floor has
been granted yet. Once the queue management in the FCS,
according to the specified scheduling algorithms, determines
that the floor request previously made by the UA can be granted,
the AS sends a new 'FloorRequestStatus' to the UA with a
GRANTED status, and unmutes the user in
the conference by sending a directive to the MS through the
control channel. Once the UA receives the notification that
its request has been granted, it can start sending its media,
knowing that its media stream will no longer be dropped
by the MS. If the session was previously updated
with a 'sendonly' attribute associated with the media stream, the MS must
originate a further re-INVITE stating that the media stream
is now bidirectional ('sendrecv').</t>
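<t>The call flow above can be approximated by the following Python
sketch of an AS with a co-located FCS. It is only a toy model under
several assumptions that are not part of this document: a single
exclusive floor, an automated chair policy that accepts every request,
a first-come first-served queue, and a hypothetical release() method
for giving the floor back. The to_ua and to_ms lists stand in for BFCP
notifications and MS control channel directives, respectively.</t>
<figure>
<artwork><![CDATA[
```python
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    ACCEPTED = "ACCEPTED"
    GRANTED = "GRANTED"


class CoLocatedFcs:
    """Toy AS+FCS managing one exclusive floor (an assumption)."""

    def __init__(self):
        self.queue = []    # accepted requests waiting for the floor
        self.holder = None  # UA currently holding the floor
        self.to_ua = []    # (ua, Status) pairs, stand-in for BFCP
        self.to_ms = []    # (action, ua) pairs, stand-in for CtrlChn

    def floor_request(self, ua):
        self.to_ua.append((ua, Status.PENDING))
        # Hypothetical automated chair policy: accept everything.
        self.queue.append(ua)
        self.to_ua.append((ua, Status.ACCEPTED))
        self._schedule()

    def release(self, ua):
        # Giving the floor back mutes the UA and frees the floor.
        if self.holder == ua:
            self.to_ms.append(("mute", ua))
            self.holder = None
            self._schedule()

    def _schedule(self):
        # Grant the floor to the oldest accepted request, if it is free.
        if self.holder is None and self.queue:
            self.holder = self.queue.pop(0)
            self.to_ua.append((self.holder, Status.GRANTED))
            self.to_ms.append(("unmute", self.holder))
```
]]></artwork>
</figure>
<t>In this model, the 'unmute' directive is sent to the MS only when
the GRANTED status is generated, which mirrors the ordering shown in the
call flow: an ACCEPTED request alone does not change the audio mix.</t>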
<t>As mentioned before, this scenario envisages an automated floor chair role,
in which the AS takes decisions on floor requests according to some
policies. The case of a chair role performed by a real person is
exactly the same, except that the incoming request is not handled
directly by the AS according to its policies, but is instead forwarded
to the floor control participant that the chair UA is using.
The decision on the request is then communicated by the chair
UA to the AS-FCS by means of a ChairAction message.</t>
<t>The rest of this section explores the other
scenario, which assumes that the interaction between the
AS and the FCS happens through the MS control channel. This scenario
is compliant with the H.248.19 document related to conferencing
in 3GPP. The following sequence diagram describes the interaction
between the involved parties in the same use case
explored for the previous topology, and it makes the same
assumptions: a BFCP connection between the UA and the FCS has already
been negotiated and established, and the UA has been
made aware of all the relevant identifiers and
floor-resource associations.
It also assumes that the AS has previously configured the media mixing
on the MS using the MS control channel. This time, since the FCS is
no longer co-located with the AS, the configuration includes
identifying the BFCP-moderated resources, establishing
basic policies and instructions about chair identifiers for
each resource, and subscribing to events of interest. Additionally,
a BFCP session has been established between the AS (which in this
scenario acts as a floor chair) and the FCS (MS).
Every frame the UA might be sending on the related media stream
is currently being dropped
by the MS, since the UA is not yet authorized to use the
resource. For a SIP UA, this state could result from
a 'sendonly' attribute associated with the media stream in
a re-INVITE originated by the MS. Note again
that the AS has to make sure that no user-provided control
mechanism, e.g., the CCP mixing controls, can override
floor control when it is in use.</t>
<figure anchor="fig:conf-bfcp" title="Conferencing Example: Floor Control Call Flow">
<artwork><![CDATA[
   UA                          AS                                   MS
(Floor Participant)       (Floor Chair)                            (FCS)
    |                           |                                    |
    | FloorRequest(BFCP)        |                                    |
    |--------------------------------------------------------------->|
    |                           |                                    |
    |                           | FloorRequestStatus[PENDING](BFCP)  |
    |<---------------------------------------------------------------|
    |                           | FloorRequestStatus[PENDING](BFCP)  |
    |                           |<-----------------------------------|
    |                           |                                    |
    |                           | ChairAction[ACCEPTED] (BFCP)       |
    |                           |----------------------------------->|
    |                           | ChairActionAck (BFCP)              |
    |                           |<-----------------------------------|
    |                           |                                    |
    |                           | FloorRequestStatus[ACCEPTED](BFCP) |
    |<---------------------------------------------------------------|
    |                           |                                    |
    .                           .                                    .
    .                           .                                    .
    |                           |                                    |
    |                           | FloorRequestStatus[GRANTED](BFCP)  |
    |<---------------------------------------------------------------|
    |                           | 'Floor has been granted' (CtrlChn) |
    |                           |<-----------------------------------|
    |                           |                                    |
    |<==================== Bidirectional RTP stream ================>|
    |                           |                                    |
    .                           .                                    .
    .                           .                                    .
]]></artwork>
</figure>
<t>A UA, which also acts as a floor participant, sends a
'FloorRequest' to the floor control server (FCS, which is
co-located with the MS), asking to be granted the
floor associated with the audio stream in the conference. The
MS answers the UA with a 'FloorRequestStatus' message with a
PENDING status, meaning that a decision on the request has
not been taken yet. The MS then notifies the AS, which in this
example handles the floor chair role, of the new
request by forwarding the received request to it. The AS,
according to the BFCP policies for this conference, takes a
decision on the request; in this example, it accepts the request.
It informs the MS of its decision through a BFCP 'ChairAction' message.
The MS acknowledges the 'ChairAction' message
and then notifies the UA of the decision with a new
'FloorRequestStatus', this time with an ACCEPTED status.
The ACCEPTED status only means that the
request has been accepted; it does not mean that the floor has
been granted yet. Once the queue management in the MS,
according to the specified scheduling algorithms, determines
that the floor request previously made by the UA can be granted,
the MS sends a new 'FloorRequestStatus' to the UA with a
GRANTED status, and unmutes the user in
the conference. Once the UA receives the notification that
its request has been granted, it can start sending its media,
knowing that its media stream will no longer be dropped
by the MS. If the session was previously updated
with a 'sendonly' attribute associated with the media stream, the MS must
originate a further re-INVITE stating that the media stream
is now bidirectional ('sendrecv').</t>
<t>This scenario envisages an automated floor chair role,
in which the AS takes decisions on floor requests according to some
policies. Again, the case of a chair role
performed by a real person is exactly the same, except that the
incoming request is forwarded not to the
AS but to the floor control participant that the chair UA is using.
The decision on the request is communicated by means of a
ChairAction message in the same way.</t>
<t>Another typical scenario is a BFCP-moderated conference
with no chair managing floor requests. In such a scenario,
the MS has to handle incoming requests according to some
predefined policies, e.g., always accepting new requests.
In this case, no decisions are required from external entities,
since everything is decided immediately by the policies in the MS.</t>
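<t>The difference between a chaired and a chairless conference, when the
FCS is co-located with the MS, can be sketched in Python as follows.
This is a simplified model and not a BFCP implementation: the MsFcs
class, the chair callable (standing in for the ChairAction exchange),
and the default_accept flag are all hypothetical names. The DENIED
outcome is included only to show the other branch of the decision.</t>
<figure>
<artwork><![CDATA[
```python
class MsFcs:
    """Toy FCS co-located with the MS (names are hypothetical)."""

    def __init__(self, chair=None, default_accept=True):
        # chair: optional callable(ua) -> bool, standing in for the
        # ChairAction exchange with the AS or with a chair UA.
        self.chair = chair
        # Predefined policy used when no chair manages the floor.
        self.default_accept = default_accept
        self.log = []  # (recipient, ua, status) notifications

    def floor_request(self, ua):
        # The requester is always told that the request is pending.
        self.log.append(("UA", ua, "PENDING"))
        if self.chair is not None:
            # Chaired case: notify the chair and wait for its decision.
            self.log.append(("chair", ua, "PENDING"))
            accepted = bool(self.chair(ua))
        else:
            # Chairless case: the MS decides immediately by policy.
            accepted = self.default_accept
        status = "ACCEPTED" if accepted else "DENIED"
        self.log.append(("UA", ua, status))
        return accepted
```
]]></artwork>
</figure>
<t>With no chair configured, the request never leaves the MS, which
corresponds to the chairless scenario in which no external entity is
involved in the decision.</t>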
<t>As stated before, the case of the FCS co-located with the AS
is much simpler to understand and use. When the AS has full
control over the FCS, including its queue management, the AS
directly instructs the MS according to floor status changes,
e.g., by instructing the MS through the control channel to unmute
a user who has been granted the floor associated with the audio
media stream.</t>
</section>
<!-- Floor Control -->
</section>
<!-- Conferencing -->
<section title="Acknowledgments">
<t>The authors would like to thank Spencer Dawkins for detailed
reviews and comments, Gary Munson for suggestions, and Xiao Wang
for review and feedback.</t>
</section>
<!-- Acknowledgments -->
<section title="IANA Considerations">
<t>This document has no actions for IANA.</t>
</section>
<!-- IANA Considerations -->
<section title="Security Considerations">
<t>This document describes the architectural framework to be used for
media server control and focuses on the interactions between Application
Servers and Media Servers. Interactions between end users and these
servers are outside the scope of this document.</t>
<t>Media Servers are valuable network resources and need to be
protected against unauthorized access.
Application Servers use SIP and related standards both to establish
control channels to Media Servers and to
establish media sessions between an MS and end users. Media Servers use
the security mechanisms of SIP to authenticate requests from Application Servers
and to ensure the integrity of those requests. Leveraging the security
mechanisms of SIP ensures that only authorized Application Servers are allowed to
establish sessions to an MS and to access
MS resources through those sessions.</t>
<t>Control channels between an AS and MS carry the MS control protocol
which affects both the service seen by end users and the resources used
on a media server. TLS <xref target="RFC4346"/> must be implemented as the
transport-level security mechanism for control
channels to guarantee the integrity of MS control interactions. </t>
<t>The resources of an MS can be shared by more than one AS. Media Servers
must prevent one AS from accessing and manipulating the resources that
have been assigned to another AS. This may be achieved by an MS associating
ownership of a resource with the AS that originally allocates it, and then
ensuring that future requests involving that resource correlate to
the AS that owns and is responsible for it.</t>
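<t>One possible way for an MS to track resource ownership is sketched
below in Python. The ResourceRegistry class and its method names are
hypothetical, and the sketch assumes that the AS identity passed in has
already been authenticated by other means (e.g., the SIP and TLS
mechanisms discussed above).</t>
<figure>
<artwork><![CDATA[
```python
class ResourceRegistry:
    """Toy ownership table kept by an MS (names are hypothetical)."""

    def __init__(self):
        self._owner = {}  # resource id -> id of the allocating AS

    def allocate(self, resource_id, as_id):
        # The AS that first allocates a resource becomes its owner.
        if resource_id in self._owner:
            raise ValueError(f"{resource_id} is already allocated")
        self._owner[resource_id] = as_id

    def authorize(self, resource_id, as_id):
        # Later requests are allowed only if they come from the owner.
        return (resource_id in self._owner
                and self._owner[resource_id] == as_id)

    def release(self, resource_id, as_id):
        # Only the owner may give a resource back.
        if not self.authorize(resource_id, as_id):
            raise PermissionError(f"{as_id} does not own {resource_id}")
        del self._owner[resource_id]
```
]]></artwork>
</figure>
<t>In such a design, every request received over a control channel
would be checked with authorize() before the MS acts on the resource it
names.</t>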
</section>
<!-- Security Consideration -->
<section title="Contributors">
<t>This document is a product of the Media Control Architecture Design
Team. In addition to the editor, the following individuals comprised the
design team and made substantial textual contributions to this document:
<list style="empty">
<t>Chris Boulton: cboulton@ubiquity.net</t>
<t>Martin Dolly: mdolly@att.com</t>
<t>Roni Even: roni.even@polycom.co.il</t>
<t>Lorenzo Miniero: lorenzo.miniero@unina.it</t>
<t>Adnan Saleem: Adnan.Saleem@radisys.com</t>
</list>
</t>
</section>
<!-- Contributors -->
</middle>
<!-- Middle -->
<back>
<references title="Normative References">
&rfc0793;
&rfc3261;
&rfc3264;
&rfc3550;
&rfc3725;
&rfc4145;
&rfc4346;
&rfc4566;
&rfc5167;
</references>
<!-- Normative References -->
<references title="Informative References">
&rfc4960;
&rfc4585;
&rfc4353;
&rfc4583;
&rfc4240;
&w3c-vxml;
&sip-ctrl-fw;
&w3c-xml;
&rfc3263;
&rfc3840;
&rfc2976;
&rfc4474;
&rfc4575;
&xcon-frmk;
&rfc4579;
&xcon-dm;
&rfc4582;
</references>
<!-- Informative References -->
</back>
<!-- Back -->
</rfc>