One document matched: draft-deng-rtcweb-svccontrol-00.txt
Network Working Group Lingli Deng
Internet Draft Jin Peng
Intended status: Informational China Mobile
Expires: January 2013 July 3, 2012
Sender Media Control based on Local Status Detection
draft-deng-rtcweb-svccontrol-00.txt
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on January 3, 2009.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document.
Abstract
Peng, et al. Expires January 3, 2013 [Page 1]
Internet-Draft svc-control July 2012
This document proposes to add sender media control based on local
status detection by the browser, to further reduce multiparty video
conferencing's media bandwidth consumption with SVC cdoec for RTCWEB
use-cases.
Table of Contents
1. Introduction ................................................. 2
2. Problem Statement ............................................ 3
2.1. Existing Proposals in [use-case-draft] .................. 3
2.1.1. Receiver-harvested SVC-stream (SVC scheme).......... 3
2.1.2. Receiver-selected multi-stream (Multi-stream scheme) 4
2.1.3. Receiver-transcoded HD stream (transcoding scheme).. 4
2.2. Analysis of existing proposals .......................... 4
3. Enhanced SVC scheme with sender control ...................... 5
3.1. Overview of the core idea................................ 5
3.2. A simple example......................................... 5
3.3. Extensions to the simple example ....................... 11
3.4. Applicable Scenarios ................................... 12
4. Derived Requirements ........................................ 12
5. Security Considerations ..................................... 13
6. IANA Considerations ......................................... 13
7. References .................................................. 13
7.1. Normative References ................................... 13
7.2. Informative References ................................. 13
1. Introduction
As an important use-case of RTCWEB, multiparty video conferencing
has been discussed in depth in [use-case-draft], which states the
requirement for the receiving party (either a participant in the p2p
use-case or the centralized mixing server use-case) tend to render
the remote video input based on the audio input status (i.e. whether
the user is the spokesman of the conference at this moment) of the
sending party.
Assume an organization needs to establish a multiparty video
conference through a video communication system and hopes to be
connected point to point directly among the peers involved in the
process of the video conference session. Each conference participant
can send audio and video media stream to the other conference
participants. When one session participant receives the video and
audio stream from the others, he wants the conference spokesman to
be presented in a high-definition large window in the middle while
the other conference participants (including himself as local
playback) to a group of small windows by the side. The spokesman
Peng, et al. Expires January 3, 2013 [Page 2]
Internet-Draft svc-control July 2012
changes frequently from one to another with conducting of the
conference, while the participants always want to present the video
stream of the current conference spokesman in a high-definition
large window in the local browser and would like to present the
video stream of ordinary participants in small windows.
Taking into account the peer-to-peer video conferencing traffic
overhead and terminal traffic restrictions, how can we minimize the
cost of the video conferencing traffic overhead without affecting
the effects of the participants?presentation in video conference?
To satisfy the requirement that for each participant one high
resolution video is displayed in a large window, while a number of
low resolution videos are displayed in smaller windows, [use-case-
draft] provides three solutions:
1. The sender sends the high resolution SVC stream and the
receiver/server selects part or full presentation.
2. The sender sends both high resolution and low resolution stream,
and the receiver selects one to present.
3. The sender sends a high resolution stream, the server/receiver
transcodes into low/high resolution streams as required.
This proposal proposes to add sender media control based on the
determination on whether the local participant is speaking. For the
sender, high resolution media stream is only sent by the current
spokesman, and the ordinary media stream is sent by the other
participants. For the receiver, only to receive and present the HD
media stream from the spokesman while the ordinary media stream from
the others in the conference. The consumption of both the sender and
receiver thus is further reduced in this way.
2. Problem Statement
2.1. Existing Proposals in [use-case-draft]
[use-case-draft] provides three solutions to the window resizing
requirement for multiparty conferencing scenarios. Despite of their
differences elaborated in the following, they share a common nature
that they rely on the receiver to do the trick, while the sender
blindly offers HD media all the time.
2.1.1. Receiver-harvested SVC-stream (SVC scheme)
During a video conference, the sender uses the SVC coding, sends SVC
HD media streams (base layer plus extended layer) and the receiver /
Peng, et al. Expires January 3, 2013 [Page 3]
Internet-Draft svc-control July 2012
server selects the presentation based on the requirements. For
example, present the media stream from the current conference
spokesman in a high resolution full window and show part media
stream (only base layer) from ordinary participants in small windows.
2.1.2. Receiver-selected multi-stream (Multi-stream scheme)
During a video conference, the sender sends both high resolution and
low resolution media stream, the receiver choose one to present in
the local browser according to requirements. For example, the
receiver presents the high resolution media stream from the current
conference spokesman and shows low resolution media stream (only
base layer) from ordinary participants.
2.1.3. Receiver-transcoded HD stream (transcoding scheme)
During a video conference, the sender sends high resolution media
stream, and the server /receiver transcodes as requirement. For
example, the receiver transcodes the received media stream from
ordinary participants into low media stream and then present the
stream in local browser.
2.2. Analysis of existing proposals
We can analyze the existing three schemes mentioned above from the
two dimensions of the flow transmission overhead and local
processing consume.
Firstly, from the perspective of high resolution traffic
transmission consume, multi-stream seems the last thing to do. In
terms of the SVC and transcoding schemes, whether the user is
speaking, his local browser will send the high resolution media
stream collected by local devices. If the number of participants in
the video conference based on the style of peer-to-peer interaction
is N, there would be 2*(N-1) high resolution media stream resource
consume needed in the transmission (including: the number of high-
definition video media stream sent by the local participants is N-1;
the number of high-definition video media stream received by the
remote peers is N-1, too). While in the multi-stream scheme,
transmission overhead is raised for additional low resolution media
streams.
Secondly, we analyze their computation overheads from the
perspectives of the sender and receiver, respectively.
For the sender, the transcoding scheme incurs the lowest cost to
encoding the outgoing media stream. On the contrary, the multi-
Peng, et al. Expires January 3, 2013 [Page 4]
Internet-Draft svc-control July 2012
stream scheme, which dictates that two versions of the local media
stream be encoded and sent separately, incurs the highest cost. The
encoding cost for a local sender to perform SVC scheme is between
the above two.
For the receiver, the transcoding scheme needs to do real-time
transcoding and therefore takes the highest computation overheads to
the receiver side (mixing server or a participating peer). In order
to switch from and to HD timely, a receiver in multi-stream scheme
needs to synchronize both high and low resolution media stream with
considerable consumption. While SVC scheme adjusts the processing
consume by overlaying / removing the extension layer sub-stream data
with a minimum extra cost.
3. Enhanced SVC scheme with sender control
3.1. Overview of the core idea
The core idea of this proposal is:
o Both ends use SVC, negotiating with the other side to establish
peer-to-peer media stream.
o The sender's browser detects local user call status by calling
some local devices, such as to detect the sustained voice input.
o The browser adjusts sending policy of different coding levels
according to the results of the detected local status:
o If the local user is the spokesman, send base layer and
extended layer media stream.
o If the local user is in a quiescent state, just send the base
layer media stream.
o The receiver's browser has the ability to call the local devices,
and presents the decoded input media stream according to SVC.
3.2. A simple example
In the multi-party video conferencing session (For example, in the
illustrative example, three users participate the video
conferencing), the workflow as shown in the figure goes as follows:
Peng, et al. Expires January 3, 2013 [Page 5]
Internet-Draft svc-control July 2012
A B C
| | |
| | |
A/B/C peer-to- |<=============>|<=============>|
peer connection |<=============================>|
| | |
| | |
Detect the +--| Detect the +--| Detect the +--|
local user | | local user | | local user | |
status +->| status +->| status +->|
| | |
+------------+ +------------+ +------------+
| Continuous | | Slient | | Slient |
|speech input| | status | | status |
+------------+ +------------+ +------------+
| | |
| | |
| Base layer + | |
|extended layer | |
|-------------->| |
| | |
| Base layer + extended layer |
|------------------------------>|
| | Base layer |
Peng, et al. Expires January 3, 2013 [Page 6]
Internet-Draft svc-control July 2012
| |-------------->|
| | |
| Base layer | |
|<--------------| |
| | |
| | |
| | Base layer |
| |<--------------|
| | |
| Base layer |
|<------------------------------|
| | |
| | |
Detect the +--| Detect the +--| Detect the +--|
local user | | local user | | local user | |
status +->| status +->| status +->|
| | |
+------------+ +------------+ +------------+
| Continuous | | Slient | | Slient |
|speech input| | status | | status |
+------------+ +------------+ +------------+
| | |
|------------+ |------------+ |------------+
| Present A | | Present A | | Present A |
Peng, et al. Expires January 3, 2013 [Page 7]
Internet-Draft svc-control July 2012
|(Large Size)| |(Large Size)| |(Large Size)|
|------------+ |------------+ |------------+
|------------+ |------------+ |------------+
| Present B | | Present B | | Present B |
|(Small Size)| |(Small Size)| |(Small Size)|
|------------+ |------------+ |------------+
|------------+ |------------+ |------------+
| Present C | | Present C | | Present C |
|(Small Size)| |(Small Size)| |(Small Size)|
|------------+ |------------+ |------------+
| | |
| | |
Detect the +--| Detect the +--| Detect the +--|
local user | | local user | | local user | |
status +->| status +->| status +->|
| | |
+------------+ +------------+ +------------+
| Slient | | Continuous | | Slient |
| status | |speech input| | status |
+------------+ +------------+ +------------+
| | |
| | |
| Base layer | |
|-------------->| |
Peng, et al. Expires January 3, 2013 [Page 8]
Internet-Draft svc-control July 2012
| | |
| Base layer |
|------------------------------>|
| | |
| | |
| | Base layer + |
| |extended layer |
| |-------------->|
| | |
| Base layer + | |
|extended layer | |
|<--------------| |
| | |
| | |
| | Base layer |
| |<--------------|
| | |
| Base layer |
|<------------------------------|
| | |
| | |
Detect the +--| Detect the +--| Detect the +--|
local user | | local user | | local user | |
status +->| status +->| status +->|
Peng, et al. Expires January 3, 2013 [Page 9]
Internet-Draft svc-control July 2012
| | |
+------------+ +------------+ +------------+
| Slient | | Continuous | | Slient |
| status | |speech input| | status |
+------------+ +------------+ +------------+
| | |
|------------+ |------------+ |------------+
| Present A | | Present A | | Present A |
|(Small Size)| |(Small Size)| |(Small Size)|
|------------+ |------------+ |------------+
|------------+ |------------+ |------------+
| Present B | | Present B | | Present B |
|(Large Size)| |(Large Size)| |(Large Size)|
|------------+ |------------+ |------------+
|------------+ |------------+ |------------+
| Present C | | Present C | | Present C |
|(Small Size)| |(Small Size)| |(Small Size)|
|------------+ |------------+ |------------+
| | |
| | |
Figure 1 A B C Tripartite session flowchart
First of all, the session participant's browser of A, establishes a
peer-to-peer media stream connection with the other participants?
browsers with SVC codec. Subsequently, the browsers utilize local
device's capability to monitor the local user audio input status.
Since A is the spokesman at beginning, A's browser detects a
Peng, et al. Expires January 3, 2013 [Page 10]
Internet-Draft svc-control July 2012
continuous audio input from the local user, A's browser sends both
the base and extended layers of SVC sub-streams to the others(B and
C). At the meantime, B and C, as the listeners to A, remain silent.
Therefore, B's local browser sends only the SVC base layer media
stream to the others (A and C) according to the result of silent
status detected by calling the local devices. Similarly C sends only
the base layer to A and B.
The spokesman changes from one to another as the conference
continues. For example, when A detects himself in a silent status
from the local devices, A switches to send only the base layer to
the others (B and C). Conversely, when B detects a continuous audio
input from the local user, B would change to send both the base and
extended layers to the others (A and C). While C detects no changes
locally, C would continue sending base layer to A and B.
3.3. Extensions to the simple example
The above description of a sender controlled SVC transmission modes
based on local audio input status detection, is a simple example of
a more generalized sender media control scheme, where the types of
status transition as control triggers, the mechanism to enforce
media adjustment afterwards, and whether to give JS API the ability
to influence the media behavior, may vary accordingly. In particular,
a few extensions would be in terms of :
1. Options of implementation on send-side state detection, include:
a.to make use of the DTX module at voice codec level, and
decide the strategy of video mode switching in accordance
with a given strategy (Note that the strategy of video mode
switching (whether need for local high resolution
presentation) can be different from the strategy of voice
codec (silence / voice signals),
b.to defer user presence status indirectly, by monitoring
whether the local audio input is muted from the web page, or
whether the conferencing page is currently active, etc.
2. Options of implementation on send-side media control, include:
a.to adjust the definition of the local camera, or
b.to adjust the bitrate of a given codec in use (the layer
composition in case of SVC codecs, for instance), or
Peng, et al. Expires January 3, 2013 [Page 11]
Internet-Draft svc-control July 2012
c.to trigger a re-negotiation for a new codec instead of the
one in use.
3. Options of implementation on clients include:
a.to enforce the transition detection and conduct media control
singly by the local browser, or
b.the browser offers related callback APIs for both defined
state transition events and media controlling to JS, who may
provide SP's entailed control behavior.
3.4. Applicable Scenarios
For simplicity, we used the fully distributed scenario as an example
to elaborate our proposal. It should be noticed that the proposed
scheme is also applicable and would bring benefit to a centralized
mixer-based conferencing setting.
In a fully distributed P2P conference, without any centralized
mixing server at the media plane, each participating node processes
the modulation of video stream locally. As mentioned above, the
application of our proposal would reduce the transmission overhead
of both send-side nodes and receive-side nodes, also reduces the
processing and modulation overhead of the receive-side nodes.
While in a mixing server-based conference, the dedicated mixer
server receives the participants?local captured video stream, and
renders a uniform rendered collective screen for display. The
application of our proposal will significantly reduce the
transmission overhead for uploading local video to the server, thus
saving its bandwidth as a centralized sink.
4. Derived Requirements
In our proposal, video conferencing client detects the local user
session state (speaking/silence), achieving the purpose of saving
media plane transmission overhead by adjusting the sending rate of
video media stream. In order to realize it in a RTCWEB setting,
additional requirements are derived as follows:
1. Function Requirement for Browser
Fxx: The browser SHOULD be able to detect the audio input status
(speaking/ silent) of the local user.
2. API Requirements for Browser
Peng, et al. Expires January 3, 2013 [Page 12]
Internet-Draft svc-control July 2012
Axx: It SHOULD be possible for the JS to be notified about the audio
input status (speaking/silent) of the local user, and to entail the
media control behavior in response.
5. Security Considerations
TBA
6. IANA Considerations
None.
7. References
7.1. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[2] Crocker, D. and Overell, P.(Editors), "Augmented BNF for
Syntax Specifications: ABNF", RFC 2234, Internet Mail
Consortium and Demon Internet Ltd., November 1997.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2234] Crocker, D. and Overell, P.(Editors), "Augmented BNF for
Syntax Specifications: ABNF", RFC 2234, Internet Mail
Consortium and Demon Internet Ltd., November 1997.
7.2. Informative References
[use-case-draft] Holmberg, C., Hakansson, S. and Eriksson, G., "Web
Real-Time Communication Use-cases and Requirements",
draft-ietf-rtcweb-use-cases-and-requirements-09
(work in progress), June 27, 2012
Peng, et al. Expires January 3, 2013 [Page 13]
Internet-Draft svc-control July 2012
Authors' Addresses
Lingli Deng
China Mobile
Email: denglingli@chinamobile.com
Jin Peng
China Mobile
Email: pengjin@chinamobile.com
Peng, et al. Expires January 3, 2013 [Page 14]
| PAFTECH AB 2003-2026 | 2026-04-24 01:14:05 |