One document matched: draft-ietf-siprec-req-09.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
<!ENTITY rfc3261 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3261.xml">
<!ENTITY rfc4103 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4103.xml">
<!ENTITY rfc4733 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4733.xml">
]>
<rfc category="info"
ipr="trust200902"
docName="draft-ietf-siprec-req-09">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<?rfc strict="no" ?>
<front>
<title abbrev="Requirements for SIPREC">Use Cases and Requirements for SIP-based Media Recording (SIPREC)</title>
<author fullname="Ken Rehor" initials="K.R" role="editor" surname="Rehor">
<organization>Cisco Systems</organization>
<address>
<postal>
<street>170 West Tasman Dr.</street>
<street>Mail Stop SJC30/2/</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>USA</country>
</postal>
<email>krehor@cisco.com</email>
</address>
</author>
<author fullname="Leon Portman" initials="L.P" role="editor" surname="Portman">
<organization>NICE Systems</organization>
<address>
<postal>
<street>8 Hapnina</street>
<city>Ra'anana</city>
<code>43017</code>
<country>Israel</country>
</postal>
<email>leon.portman@nice.com</email>
</address>
</author>
<author fullname="Andrew Hutton" initials="A." surname="Hutton">
<organization>Siemens Enterprise Communications</organization>
<address>
<email>andrew.hutton@siemens-enterprise.com</email>
<uri>http://www.siemens-enterprise.com</uri>
</address>
</author>
<author fullname="Rajnish Jain" initials="R.J" surname="Jain">
<organization>IPC Systems</organization>
<address>
<postal>
<street>777 Commerce Drive</street>
<city>Fairfield</city>
<region>CT</region>
<code>06825</code>
<country>USA</country>
</postal>
<email>rajnish.jain@ipc.com</email>
</address>
</author>
<date month="March" day="31" year="2011"/>
<workgroup>SIPREC</workgroup>
<keyword>Internet-Draft</keyword>
<abstract>
<t>Session recording is a critical requirement in many business
communications environments such as call centers and financial
trading floors. In some of these environments, all calls must be
recorded for regulatory and compliance reasons. In others, calls may
be recorded for quality control or business analytics.</t>
<t>Recording is typically performed by sending a copy of the session media
to the recording devices. This document specifies requirements for
extensions to SIP that will manage delivery of RTP media to a recording device.
This is being referred to as SIP-based Media Recording.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>Session recording is a critical operational requirement in many
businesses, especially where voice is used as a medium for commerce
and customer support. A prime example where voice is used for trade
is the financial industry. The call recording requirements in this
industry are quite stringent. The recorded calls are used for
dispute resolution and compliance. Other businesses such as customer
support call centers typically employ call recording for quality
control or business analytics, with different requirements.</t>
<t>Depending on the country and its regulatory requirements, financial
trading floors typically must record all calls. In contrast, call centers typically only
record a subset of the calls, and calls must not fail regardless of
the availability of the recording device.</t>
<t>Respecting the privacy rights and wishes of users engaged in a call
is of paramount importance. In many jurisdictions participants have a right to
know that the session is being recorded or might be recorded, and have a
right to opt out, either by terminating the call or by demanding that the
call not be recorded. Therefore this document contains requirements for being
able to notify users that a call is being recorded and for users to be able
to request that a call not be recorded.
Use cases where users participating in a call are not informed that the
call is or might be recorded are outside the scope of this document.
In particular, lawful intercept is outside the scope of this document.
</t>
<t>Furthermore, the scale and cost burdens vary widely, in all markets,
where the different needs for solution capabilities such as media
injection, transcoding, and security-related needs do not conform
well to a one-size-fits-all model. If a standardized solution
supports all of the requirements from every recording market, but
doing so would be expensive for markets with lesser needs, then
proprietary solutions for those markets will continue to propagate.
Care must be taken, therefore, to make a standards-based solution
support optionality and flexibility.</t>
<t>This document specifies requirements for using SIP <xref target="RFC3261"/> between a
Session Recording Client and a Session Recording Server to control the
recording of media that has been transmitted in the context of a Communication Session.
A Communication Session is the "call" between participants.
The Session Recording Client is the source of the recorded media. The Session Recording Server is the sink of
recorded media. It should be noted that the requirements for the protocol between a
Session Recording Server and Session Recording Client have very similar requirements
(such as codec and transport negotiation, encryption key interchange,
firewall traversal) as compared to regular SIP media sessions. The
choice of SIP for session recording provides reuse of an existing
protocol. </t>
<t>The recorded sessions can be any RTP media sessions
including voice, DTMF (as defined by [RFC4733]), video, and text (as defined by [RFC4103]).</t>
<t>An archived session recording is typically comprised of the Communication Session media content and
the Communication Session Metadata. The Communication Session Metadata allows recording archives to be searched and
filtered at a later time and allows a session to be played back in a meaningful way,
e.g., with correct synchronization between the media.
The Communication Session Metadata needs to be conveyed from the Session Recording Client
to the Session Recording Server.</t>
<t>This document only considers active recording, where the Session Recording
Client purposefully streams media to a Session Recording Server. Passive
recording, where a recording device detects media directly from the
network, is outside the scope of this document.</t>
</section>
<section title="Requirements notation">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119] and indicate
requirement levels for compliant mechanisms.</t>
</section>
<section title="Definitions">
<t>Session Recording Server (SRS): A Session Recording Server (SRS) is a SIP User Agent (UA) that is a
specialized media server or collector that acts as the sink of the recorded media.
An SRS is typically implemented as a multi-port device that is capable
of receiving media from multiple sources simultaneously.
An SRS is the sink of the recorded session metadata.
</t>
<t>Session Recording Client (SRC): A Session Recording Client (SRC) is a SIP User Agent
(UA) that acts as the source of the recorded media, sending it to the SRS. An SRC is a logical
function. Its capabilities may be implemented across one or more
physical devices. In practice, an SRC could be a personal device
(such as a SIP phone), a SIP Media Gateway (MG), a Session Border
Controller (SBC) or a SIP Media Server (MS) integrated with an
Application Server (AS). This specification defines the term SRC such
that all such SIP entities can be generically addressed under one
definition. The SRC provides metadata to the SRS.</t>
<t>Communication Session (CS): A session created between two or more SIP User Agents (UAs)
that is the subject of recording.</t>
<t>Recording Session (RS): The SIP session created between an SRC and SRS for the
purpose of recording a Communication Session.</t>
<t>Figure 1 pictorially represents the relationship between a Recording Session and Communication Session.</t>
<figure anchor="recording_session">
<preamble></preamble>
<artwork><![CDATA[
+-------------+ +-----------+
| | Communication Session | |
| A |<------------------------------------>| B |
| | | |
+-------------+ +-----------+
..................................................................
. Session .
. Recording .
. Client .
..................................................................
|
| Recording
| Session
|
v
+------------+
| Session |
| Recording |
| Server |
+------------+
]]></artwork>
<postamble></postamble>
</figure>
<t>Metadata: Information that describes recorded media and the CS to which they relate.</t>
<t>Pause and Resume during a Communication Session:
Pause: The action of temporarily discontinuing the transmission and collection of RS media
Resume: The action of recommencing the transmission and collection of RS media
</t>
</section>
<section title="Use Cases">
<t>Use Case 1: Full-time Recording: One Recording Session for each Communication Session.</t>
<figure anchor="full_time">
<preamble>For example, the diagram below shows the lifecycle of Communication Sessions (CS)
and the relationship to the Recording Sessions (RS)</preamble>
<artwork><![CDATA[
CS |--- CS 1 ---| |--- CS 2 ---| |--- CS 3 ---|
RS |--- RS 1 ---| |--- RS 2 ---| |--- RS 3 ---|
]]></artwork>
<postamble></postamble>
</figure>
<t>Record every CS for specific extension/person.</t>
<t>The need to record all calls is typically due to business process purposes
(such as transaction confirmation or dispute resolution)
or to ensure compliance with governmental regulations.
Applications include enterprise, contact center, and financial trading floors.</t>
<t>Also commonly known as Total Recording.</t>
<t>Use Case 2: Selective Recording: Start a Recording Session when a
Communication Session to be recorded is established.</t>
<figure anchor="selective">
<preamble>In this example, Communication Sessions 1 and 3 are recorded but CS 2 is not.</preamble>
<artwork><![CDATA[
CS |--- CS 1 ---| |--- CS 2 ---| |--- CS 3 ---|
RS |--- RS 1----| |--- RS 2 ---|
]]></artwork>
<postamble></postamble>
</figure>
<t>Use Case 3: Start/Stop a Recording Session during a Communication Session.</t>
<t>The Recording Session starts during a Communication Session, either manually via a user-controlled mechanism
(e.g. button on user’s phone) or automatically via an application (e.g. a Contact Center customer service application)
or business event.
A Recording Session either ends during the Communication Session, or when the Communication Session ends.
One or more Recording Sessions may record each Communication Session.</t>
<figure anchor="dynamic">
<preamble></preamble>
<artwork><![CDATA[
CS |------------- Communication Session -----------|
RS |---- RS 1 ----| |---- RS 2 -----|
]]></artwork>
<postamble></postamble>
</figure>
<t>Use Case 4: Persistent Recording: A single Recording Session captures one or more Communication Sessions.
</t>
<figure anchor="persistent">
<preamble></preamble>
<artwork><![CDATA[
|--- CS 1 ---| |--- CS 2 ---| |--- CS 3 ---|
RS |---------------------- Recording Session ---------------------|
]]></artwork>
<postamble></postamble>
</figure>
<t>A Recording Session records continuously without interruption. Periods
when there is no CS in progress must be reproduced upon playback (e.g. by
recording silence during such periods or by not recording such periods but
marking them by means of metadata for utilization on playback, etc.).
Applications include financial trading desks and emergency (first-responder) service
bureaus. The length of a Persistent Recording Session is
independent from the length of the actual Communication Sessions.
Persistent Recording Sessions avoid issues such as media clipping
that can occur due to delays in Recording Session establishment.</t>
<t>The connection and attributes of media in the Recording
Session are not dynamically signaled for each Communication Session before
it can be recorded; however, codec re-negotiation is possible.</t>
<figure anchor="mixing">
<preamble>In some cases, more than one concurrent Communication Session
(on a single end-user apparatus, e.g. trading floor turret)
is mixed into one Recording Session:</preamble>
<artwork><![CDATA[
|-------- CS 1 -------|
|-------- CS 2 -------|
|-------- CS 3 -------|
RS |----------- Recording Session --------------|
]]></artwork>
<postamble></postamble>
</figure>
<t>Use Case 5: Real-time Recording Controls.</t>
<t>For an active Recording Session, privacy or security reasons may
demand not capturing a specific portion of a conversation.
An example is for PCI (payment card industry) compliance where credit
card info must be protected. One solution is to not record a caller
speaking their credit card information.</t>
<t>An example of a real-time controls is Pause/Resume.</t>
<t>Use Case 6: IVR / Voice Portal Recording.</t>
<t> Self-service Interactive Voice Response applications
may need to be recorded for application performance tuning or to meet
compliance requirements.</t>
<t>Metadata about an IVR session recording must include session information
and may include application context information (e.g. VoiceXML session variables, dialog names, etc.)</t>
<t>Use Case 7: Enterprise Mobility Recording.</t>
<t>Many agents and enterprise workers whose calls are to be recorded are not located on company premises.</t>
<t>Examples:</t>
<t>o Home-based agents or enterprise workers.</t>
<t>o Mobile phones of knowledge workers when they conduct work related
(and legally required recording) calls. e.g. insurance agents,
brokers, physicians.</t>
<t>Use Case 8: Geographically distributed or centralized recording.</t>
<t>Enterprises such as banks, insurance agencies, and retail stores may have many locations,
possibly up to thousands of small sites. Frequently only phones and network infrastructure are
installed in branches, without local recording services. In cases where calls inside or between
branches must be recorded, a centralized recording system in data centers together with
telephony infrastructure (e.g. PBX) me deployed.</t>
<t>Use Case 9: Record complex call scenarios.</t>
<t>The following is an example of a scenario where one call that is recorded
must be associated with a related call that also must be recorded.</t>
<t>o A Customer is in a conversation with a Customer Service Agent.</t>
<t>o Agent puts Customer on hold in order to consult with a Supervisor.</t>
<t>o Agent enters into a conversation with Supervisor.</t>
<t>o Agent disconnects from Supervisor, then reconnects with Customer.</t>
<t>o The Supervisor call must be associated with the original customer call.</t>
<t>Use case 10: High availability and continuous recording.</t>
<t>Specific deployment scenarios present different requirements for system availability,
error handling, etc. including:</t>
<t>o An SRS must always be available at call setup time.</t>
<t>o No loss of media recording, including during failure of an SRS.</t>
<t>o The Communication Session must be terminated (or suitable
notification given to parties) in the event of a recording failure.</t>
<t>Use Case 11: Record multi-channel, multi-media session.</t>
<t>Some applications require the recording of more than one media
stream, possibly of different types. Media are synchronized, either
at storage or at playback.</t>
<t>Speech analytics technologies (e.g. word spotting, emotion detection,
speaker identification) may require speaker-separated recordings for
optimum performance.</t>
<t>Multi-modal Contact Centers may include audio, video, IM or other
interaction modalities.</t>
<t>In trading floors environments, in order to minimize storage and recording system resources,
it may be preferable to mix multiple concurrent calls (Communication Sessions)
on different handsets/speakers on the same turret into single recording session.</t>
<t>Use Case 12: Real-time media processing.</t>
<t>It must be possible for an SRS to support real-time media processing, such as speech analytics
of trading floor interactions. Real-time analytics may be employed for automatic intervention
(stopping interaction or alerting) if for example, a trader is not following regulations.</t>
<t>Speaker separation is required in order to reliably detect who is saying specific phrases.</t>
</section>
<section title="Requirements">
<t>The following are requirements for SIP-based Media Recording:</t>
<t>o REQ-001 The mechanism MUST provide a means for using the SIP protocol for establishing,
maintaining and terminating Recording Sessions between a Session Recording Client and a
Session Recording Server.</t>
<t>o REQ-002 The mechanism MUST support the ability to record all CSs in their entirety.</t>
<t>o REQ-003 The mechanism MUST support the ability to record selected CSs in their entirety, according to policy.</t>
<t>o REQ-004 The mechanism MUST support the ability to record selected parts of selected CSs.</t>
<t>o REQ-005 The mechanism MUST support the ability to record a CS without loss of media
(for example, clipping media at the beginning of the CS) due to RS recording preparation and also,
without impacting the quality or timing of the CS (for example, delaying the start of the CS while
preparation for recording session). See Use Case 4 in Section 4 for more details.</t>
<t>o REQ-006 The mechanism MUST support the recording of IVR sessions.</t>
<t>o REQ-007 The mechanism MUST support the recording of RTP media types voice,
DTMF (as defined by [RFC4733]), video, and text (as defined by [RFC4103]).</t>
<t>o REQ-008 The mechanism MUST support the ability for an SRC to deliver mixed audio
streams from multiple Communication Sessions to an SRS.</t>
<t>Note: A mixed audio stream is where several related Communication Sessions
are carried in a single Recording Session. A mixed media stream is
typically produced by a mixer function.
The RS MAY be informed about the composition of the mixed streams through session metadata.</t>
<t>o REQ-009: The mechanism MUST support the ability for an SRC to deliver
mixed audio streams from different parties of a given Communication Session to an SRS.</t>
<t>o REQ-010 The mechanism MUST support the ability to deliver to the SRS multiple media streams for a given CS.</t>
<t>o REQ-011 The mechanism MUST support the ability to pause and resume the
transmission and collection of RS media.</t>
<t>o REQ-012 The mechanism MUST include a means for providing the SRS with metadata describing CSs
that are being recorded, including the media being used and the identities of
parties involved.</t>
<t>o REQ-013 The mechanism MUST include a means for the SRS to be able to correlate
RS media with CS participant media.</t>
<t>o REQ-014 Metadata format must be agnostic of the transport protocol.</t>
<t>o REQ-015: The mechanism MUST support a means to stop the recording.</t>
<t>o REQ-016: The mechanism MUST support a means for a recording-aware UA involved
in a CS to request at session establishment time that the CS should be recorded
or should not be recorded, the honoring of such a request being dependent on policy.</t>
<t>o REQ-017: The mechanism MUST support a means for a recording-aware UA involved
in a CS to request during a session that the recording of the CS should be started,
paused, resumed or stopped, the honoring of such a request being dependent on policy.</t>
<t>o REQ-018 The mechanism MUST NOT prevent the application of tones or announcements during
recording or at the start of a CS to support notification to participants that the call is
being recorded or may be recorded.</t>
<t>o REQ-019 The mechanism MUST provide a means of indicating to recording-aware UAs whether
recording is taking place, for appropriate rendering at the user interface.</t>
<t>o REQ-020 The mechanism MUST provide a way for metadata to be conveyed to the SRS incrementally during the CS.</t>
<t>o REQ-021 The mechanism MUST NOT prevent high availability deployments.</t>
<t>o REQ-022 The mechanism MUST include a means for the SRS to receive the starting wall clock time for
each RS media stream corresponding to the CS participant media.</t>
<t>o REQ-023 The mechanism MUST include a means for the SRS to receive the wall clock time when the
Recording Session is paused and resumed.</t>
<t>o REQ-024 The mechanism MUST support functionality such that if the CS is encrypted,
the RS may use the same or different encryption keys.</t>
<t>o REQ-025 The mechanism MUST provide means for an SRS to authenticate the SRC on RS initiation.</t>
<t>o REQ-026 The mechanism MUST provide means for an SRC to authenticate the SRS on RS initiation.</t>
<t>o REQ-027 The mechanism MUST include a means for ensuring that the integrity of the metadata sent from SRC to SRS is an accurate representation of the original CS metadata.</t>
<t>o REQ-028 The mechanism MUST include a means for ensuring that the integrity of the media sent from SRC to SRS is an accurate representation of the original CS media.</t>
<t>o REQ-029 The mechanism MUST include a means for ensuring the confidentiality of the Metadata sent from SRC to SRS.</t>
<t>o REQ-030 The mechanism MUST provide a means to support RS confidentiality.</t>
<t>o REQ-031 The mechanism MUST support the ability to deliver to the SRS multiple media streams
of the same media type (e.g. audio, video). For example in the case of delivering unmixed audio for
each participant in the CS.</t>
</section>
<section title="Privacy Considerations">
<t>Respecting the privacy rights and wishes of users engaged in a call
is of paramount importance. In many jurisdictions participants have a right to
know that the session is being recorded or might be recorded, and have a
right to opt out, either by terminating the call or by demanding that the
call not be recorded. Therefore this document contains requirements for being
able to notify users that a call is being recorded and for users to be able
to request that a call not be recorded.
Use cases where users participating in a call are not informed that the
call is or might be recorded are outside the scope of this document.
In particular, lawful intercept is outside the scope of this document.</t>
<t>
Requirements for participant notification of recording
vary widely by jurisdiction. In a given deployment, not all users will be
authorized to stop the recording
of a CS (although any user can terminate its participation in a CS). Typically users within the
domain that is carrying out the recording will be subject to policies of
that domain concerning whether CSs are recorded. For example, in a call centre,
agents will be subject to policies of the call centre and may or may not have
the right to prevent the recording of a CS or part of a CS. Users calling into
the call centre, on the other hand, will typically have to ask the agent not
to record the CS. If the agent is unable to prevent recording, or if the caller
does not trust the agent, the only option generally is to terminate the CS.</t>
<t>Privacy considerations also extend to what happens to a recording once it
has been created. Typical issues are who can access the recording (e.g.,
receive a copy of the recording, view the metadata, play back the media, etc.),
for what purpose the recording can be used (e.g., for non-repudiation, for
training purposes, for quality control purposes, etc.) and for how long the
recording is to be retained before deletion. These are typically policies of
the domain that makes the recording, rather than policies of individual users
involved in a recorded CS, whether those users be in the same domain or in a
different domain. Taking the call centre example again, agents might be made
aware of call centre policy regarding retention and use of recordings as part
of their employment contract, and callers from outside the call centre might
be given some information about policy when notified that a CS will be recorded
(e.g., through an announcement that says that calls may be recorded for quality purposes).</t>
<t>This document does not specify any requirements for a user engaged in
a CS to be able to dictate policy for what happens to a recording, or
for such information to be conveyed from an SRC to an SRS. It is
assumed that the SRS has access to policy applicable to its
environment and can ensure that recordings are stored and used in
accordance with that policy.
</t>
</section>
<section title="Security Considerations">
<t>Session recording has substantial security implications, for the
SIP UA's being recorded, the SRC, and the SRS.</t>
<t>For the SIP UA's involved in the Communication Session, the requirements
in this draft enable the UA to identify that a Communication Session is being
recorded and for the UA to request that a given Communication Session is not
subject to recording.</t>
<t>Since humans don't typically look at or know about protocol signaling
such as SIP, and indeed the SIP session might have originated through
a PSTN Gateway without any ability to pass on in-signaling
indications of recording, users can be notified of recording in the
media itself through voice announcements, a visual indicator on
the endpoint, or other means.</t>
<t>With regards to security implications of the protocol(s), clearly
there is a need for authentication, authorization, eavesdropping
protection, and non-repudiation for the solution. The SRC needs to
know the SRS it is communicating with is legitimate, and vice-versa,
even if they are in different domains. Both the signaling and media
for the Recording Session need the ability to be authenticated and protected from
eavesdropping and non-repudiation. Requirements are
detailed in the requirements section.</t>
<t>Communication Sessions and Recording Sessions can require different security levels
both for signaling and media, depending on deployment configurations. For some environments,
for example, SRS and SRC will be collocated in a secure network region and therefore the RS
will not require the same protection level as a CS that extends over a public network,
for example. For other environments, the SRS can be located in a public cloud, for example,
and the RS will require a higher protection level than the CS. For these reasons, there
is not a direct relationship between the security level of Communication Sessions and
the security level of Recording Sessions.</t>
</section>
<section title="IANA Considerations">
<t>This document has no IANA actions.</t>
</section>
<section title="Acknowledgements">
<t>Thanks to Dan Wing, Alan Johnson, Vijay Gurbani,
Cullen Jennings, Hadriel Kaplan, Henry Lum, Dave Smith, Martin Palmer, Alissa Cooper,
Deepanshu Gautam, Paul Kyzivat, Parthasarathi R, Ram Mohan R, and Charles Eckel
for their significant contributions and assistance with this document and Working Group,
and to all the members of the DISPATCH WG and SIPREC WG mailing lists for
providing valuable input to this work.</t>
</section>
</middle>
<back>
<references title='Normative References'>
&rfc2119;
&rfc3261;
&rfc4103;
&rfc4733;
</references>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-24 18:58:18 |