One document matched: draft-greevenbosch-mmusic-sdp-parallax-00.xml
<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'[
<!ENTITY RFC2119 SYSTEM
"/IETF/xml2rfc-1.35/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC4234 SYSTEM
"/IETF/xml2rfc-1.35/bibxml/reference.RFC.4234.xml">
<!ENTITY RFC4396 SYSTEM
"/IETF/xml2rfc-1.35/bibxml/reference.RFC.4396.xml">
<!ENTITY RFC4566 SYSTEM
"/IETF/xml2rfc-1.35/bibxml/reference.RFC.4566.xml">
]>
<?rfc toc="yes"?>
<?rfc symrefs="yes" ?>
<rfc category="std" ipr="trust200902" docName="draft-greevenbosch-mmusic-sdp-parallax-00">
<front>
<title>SDP attribute to signal parallax</title>
<author initials="B." surname="Greevenbosch" fullname="Bert Greevenbosch">
<organization abbrev="Huawei Technologies">Huawei Technologies Co., Ltd.</organization>
<address>
<postal>
<street>Huawei Industrial Base</street>
<street>Bantian, Longgang District</street>
<city>Shenzhen</city>
<code>518129</code>
<country>P.R. China</country>
</postal>
<phone>+86-755-28978088</phone>
<email>bert.greevenbosch@huawei.com</email>
</address>
</author>
<author initials="Y." surname="Hui" fullname="Hui Yu">
<organization abbrev="Huawei Technologies">Huawei Technologies Co., Ltd.</organization>
<address>
<postal>
<street>Huawei Nanjing R&D Center</street>
<street>101 Software Avenue</street>
<street>Yuhuatai District</street>
<city>Nanjing</city>
<code>210012</code>
<country>P.R. China</country>
</postal>
<phone>+86-25-56620323</phone>
<email>huiyu@huawei.com</email>
</address>
</author>
<date month="April" year="2012"/>
<area>Real-time Applications and Infrastructure</area>
<workgroup>mmusic</workgroup>
<abstract>
<t>
This document introduces a "ParallaxInfo" attribute in SDP.
This attribute can be used in stereoscopic applications,
to signal the depth positioning of 2D media data within the 3D space.
</t>
</abstract>
<note title="Note">
<t>
Discussion and suggestions for improvement are requested,
and should be sent to mmusic@ietf.org.
</t>
</note>
</front>
<middle>
<section title="Introduction">
<t>
To see a 3D scene,
the human brain interprets two different views as perceived by the left and right eyes,
and fuses these views into a single 3D perception.
The depth of the object is perceived by interpreting the horizontal shift of that object between the views.
This shift is called "parallax".
</t>
<t>
In stereoscopic 3D multimedia applications,
there are various ways to transmit media streams in 3D.
One way is to transmit two different streams, one for the left eye and one for the right eye.
These streams are then projected to the relevant eyes using the appropriate technology.
</t>
<t>
When sending text streams in 3D,
the solution mentioned above would imply sending the same text stream twice.
Since the two streams would only differ in horizontal positioning,
this introduces a lot of unnecessary overhead.
</t>
<t>
This document specifies a "ParallaxInfo" attribute in SDP
<xref target="RFC4566"/>,
which is used to transfer the parallax information.
It eliminates the need to send two different streams separately,
as they can be calculated from a single stream and the "ParallaxInfo" attribute value.
</t>
<t>
The attribute transfers this information as two parameters:
one indicating which view (left/right/centre) is carried by the stream,
and another to signal the parallax between the objects.
</t>
<t>
The attribute is not restricted for signalling the parallax of text streams,
but it can also be used to place other 2D objects in the 3D space.
Examples include a channel logo,
an electronic programme guide and on-screen display of an audio volume indicator.
</t>
</section>
<section title="Requirements notation">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" in this document are to be interpreted as
described in <xref target="RFC2119"/>.
</t>
</section>
<section title="Definitions">
<t>
<list style="hanging">
<t hangText="L-view"><vspace/>
A visual entity that is to be projected to the left eye.
In the case of video,
the L-view is a video frame designated for the left eye.
In the case of text,
the L-view is the text positioned for viewing by the left eye.
</t>
<t hangText="L-stream"><vspace/>
A sequence of L-views,
which is streamed to the device.
</t>
<t hangText="R-view"><vspace/>
A visual entity that is to be projected to the right eye.
In the case of video,
the R-view is a video frame designated for the right eye.
In the case of text,
the R-view is the text positioned for viewing by the right eye.
</t>
<t hangText="R-stream"><vspace/>
A sequence of R-views,
which is streamed to the device.
</t>
<t hangText="C-view"><vspace/>
The centre view: a visual entity as seen from a viewpoint between the left and right eyes.
The C-view can be used to calculate the L- and R-views.
</t>
<t hangText="C-stream"><vspace/>
A sequence of C-views,
which is streamed to the device.
</t>
<t hangText="stereoscopic device"><vspace/>
A device that is able to produce and display different images for the left and right eyes,
such that the viewer can experience a 3D view.
</t>
<t hangText="2D device"><vspace/>
A device that is not able to produce and display different images for the left and right eyes.
</t>
<t hangText="2D media stream"><vspace/>
A sequence of two dimensional visual entities (such as text or 2D graphics),
which is streamed to the device.
</t>
</list>
</t>
</section>
<section title="The ParallaxInfo attribute">
<t>
The SDP attribute "ParallaxInfo" is used to transmit the depth positioning of 2D media data,
such as a text stream,
in the 3D space.
</t>
<t>
The attribute has the following syntax:
<figure title="">
<artwork>
a=ParallaxInfo:<transmitted position> <parallax>
</artwork>
</figure>
The <transmitted position> indicates whether the L-, C- or R-stream is transmitted,
whereas <parallax> indicates the parallax (i.e. shift) between corresponding L- and R-views in pixels,
normalised to a screen width of 11520 pixels.
To convert the value of <parallax> to match the display video screen width W,
it has to be divided by a factor F=11520/W.
</t>
<t>
The <transmitted position> can have the following values:
<list style="hanging">
<t hangText=""L"">
indicates that the transmitted stream is the L-stream.
A stereoscopic device MUST calculate the corresponding R-views by shifting the L-views <parallax>/F pixels towards the right.
</t>
<t hangText=""C"">
indicates that the transmitted stream is the C-stream.
A stereoscopic device MUST calculate the corresponding L-views by shifting the C-views <parallax>/(2*F) pixels towards the left,
and the R-views by shifting the C-views <parallax>/(2*F) pixels towards the right.
<parallax> SHOULD be even.
If it is odd, the divided value MUST be rounded off towards zero.
</t>
<t hangText=""R"">
indicates that the transmitted stream is the R-stream.
A stereoscopic device MUST calculate the corresponding L-views by shifting the R-views <parallax>/F pixels towards the left.
</t>
</list>
</t>
<t>
<parallax> MAY be negative.
In this case, the shift direction is reversed.
</t>
<t>
The "ParallaxInfo" attribute can be a session-level attribute or a media-level attribute.
As a session-level attribute,
it specifies the default parallax value which can be applied to all the 2D media streams in the session being described.
As a media-level attribute,
it specifies the parallax value which can be applied to the associated 2D media stream,
overriding any session-level parallax value specified.
</t>
<t>
The stereoscopic device MAY use the session-level attribute value for on-screen display,
for example an audio volume indicaton,
channel indication or electronic programme guide.
</t>
<t>
Notice that a 2D device that does not support the "ParallaxInfo" attribute will ignore it,
and therefore display the 2D media data on the position as transmitted.
</t>
</section>
<section title="Example">
<t>
The following is an example of an SDP description of a session with
an audio stream, a video stream and a 3GPP timed text stream (see <xref target="3gpp-tt"/>),
streamed using RTP as per <xref target="RFC4396"/>.
If the display resolution is 1280x720,
the parallax is scaled down with a factor F=11952/1280=9.
The transmitted text stream is the L-stream,
and with the example display resolution each R-view is 144/9=16 pixels on the left of the L-view.
This corresponds to the virtual positioning of the text in front of the screen.
</t>
<figure title="">
<artwork>
v=0
o=Alice 2890844526 2890842807 IN IP4 131.163.72.4
s=The technology of 3D-TV
c=IN IP4 131.164.74.2
t=0 0
a=ParallaxInfo:L -180
m=video 49170 RTP/AVP 99
a=rtpmap:99 H264/90000
m=video 52888 RTP/AVP 97
a=rtpmap:97 3gpp-tt/1000
a=ParallaxInfo:L -144
m=audio 52890 RTP/AVP 10
a=rtpmap:10 L16/16000/2
</artwork>
</figure>
<t>
Notice that the default value "-180" is overridden by the value "-144" for the text stream.
However the "-180" value is still signalled for on-screen display of e.g. volume control and other 2D graphics.
</t>
<t>
In case each R-view is 24 pixels (216 pixels in the reference resolution) on the right of the associated L-view,
i.e. the virtual positioning of the text is behind the screen,
then the three lines defined for 3gpp-tt can be replaced as follows:
<figure title="">
<artwork>
m=video 52888 RTP/AVP 97
a=rtpmap:97 3gpp-tt/1000
a=ParallaxInfo:L 216
</artwork>
</figure>
</t>
</section>
<section title="Remarks">
<t>
A positive parallax value indicates virtual positioning of the 2D media data behind the screen.
This is the case when the objects in the L-view are on the left of the same objects in the R-view.
Similarly,
a negative parallax value indicates that the objects in the R-view are on the left of the same objects in the L-view,
and corresponds to virtual positioning in front of the screen.
</t>
<t>
Since the "ParallaxInfo" attribute indicates a shift of the transmitted stream,
it might happen that the L- or R-view trespasses the bounderies of the display.
In this case, clipping is necessary, as illustrated by <xref target="3Dpic"/>.
</t>
<t>
Similarly,
there are areas which are covered by the L-view but not by the R-view and vice versa.
These areas need to be filled in a sensible way.
Since the "ParallaxInfo" attribute is designed for objects that overlay other video data
(e.g. subtitles),
it is trivial to fill in uncovered areas by using the underlying video data.
However, if there is no underlying video data,
other mechanisms to fill in the uncovered areas need to be defined.
Definition of these mechanisms are out of scope of this document.
<figure anchor="3Dpic">
<artwork>
+-----------------------+ +-----------------------+
|SCREEN . | |SCREEN . |
| . | | . |
| +-------------------+ | +-------------------+|
| |TEXT . | | |TEXT . ||
| +-------------------+ | +-------------------+|
| . | | . |
| . | | . |
+-----------------------+ +-----------------------+
|
| CLIP
V
+-----------------------+ +-----------------------+
|SCREEN . | |SCREEN . |
| . | | . |
| +------------------+ | +-------------------+|
| |TEXT . | | |TEXT . ||
| +------------------+ | +-------------------+|
| . | | . |
| . | | . |
+-----------------------+ +-----------------------+
</artwork>
</figure>
</t>
<t>
The normalisation of the parallax towards a reference screen width of 11520 pixels
was chosen to ensure correct display of the same stream on different video display resolutions.
This is especially useful when the underlying video is coded with a scalable video codec,
which does not have a fixed video resolution.
</t>
</section>
<section title="Security Considerations">
<t>
The authors foresee no security issues in addition to those already listed in <xref target="RFC4566"/>.
</t>
</section>
<section title="IANA Considerations">
<t>
Following the guidelines in <xref target="RFC4566"/>,
the SDP attribute has to be registered at IANA:
</t>
<t>
<list style="symbols">
<t>
Contact name/email: authors of this RFC
</t>
<t>
Attribute name: ParallaxInfo
</t>
<t>
Long-form attribute name: Parallax info for the depth positioning of 2D media data in the 3D space
</t>
<t>
Type of attribute: session level and media level
</t>
<t>
Subject to charset: no
</t>
</list>
</t>
<t>
This attribute is used to signal how 2D media data is to be displayed in the 3D space.
It indicates the shift of the respective left and right views.
</t>
<t>
The attribute has the following ABNF (see <xref target="RFC4234"/>) description:
<figure title="">
<artwork>
ParalaxInfo = "a=ParallaxInfo:" TransmittedPosition Parallax
TransmittedPosition = "C"/"L"/"R"
Parallax = num-val
</artwork>
</figure>
</t>
<t>
The attribute transfers this information as two parameters:
"TransmittedPosition" indicates which view of the 2D media data
(left "L", right "R" or centre "C")
is carried by the stream,
and "Parallax" signals the parallax (in pixels) of objects in the 2D media stream.
</t>
</section>
</middle>
<back>
<references title="Normative References">
&RFC2119;
&RFC4234;
&RFC4396;
&RFC4566;
<reference anchor="3gpp-tt">
<front>
<title>Transparent end-to-end packet switched streaming service (PSS); Protocols and codecs (Release 5)</title>
<author>
<organization>3GPP</organization>
</author>
<date month="December" year="2002"/>
</front>
<seriesInfo name="TS" value="26.234 v5.3.0"/>
</reference>
</references>
</back>
</rfc>| PAFTECH AB 2003-2026 | 2026-04-24 03:03:18 |