Network Working Group N. Zong
Internet-Draft Huawei Technologies
Intended status: Informational October 24, 2010
Expires: April 27, 2011
Survey and Gap Analysis for HTTP Streaming Standards and Implementations
draft-zong-httpstreaming-gap-analysis-01
Abstract
With the rapid growth of Internet usage and the increasing demand
for multimedia content on the web, media delivery over the Internet
has attracted substantial attention from the media industry. HTTP
streaming was designed to meet this demand and has come to play an
important role in recent years. Several leading Standards
Development Organizations (SDOs) have been producing technical
specifications that define streaming over HTTP. In addition,
several companies have developed proprietary HTTP-based media
delivery platforms to provide a high-quality, adaptive viewing
experience to customers. Following a brief survey of existing HTTP
streaming standards and implementations, this document summarizes
the related work, analyzes the potential challenges, especially from
the network point of view, and lists the gaps between existing work
and the possible scope of work on HTTP streaming in the IETF.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 27, 2011.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
Zong Expires April 27, 2011 [Page 1]
Internet-Draft Survey and Gap Analysis October 2010
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Table of Contents
1. Introduction
2. Terminology
3. HTTP Streaming Standards
  3.1. 3GPP
    3.1.1. Media Presentation Components
    3.1.2. Media Presentation Description
    3.1.3. Streaming Procedure
      3.1.3.1. Overview
      3.1.3.2. Segment list generation
      3.1.3.3. Seeking, trick mode and adaptation support
  3.2. OIPF
    3.2.1. MPD
    3.2.2. Segmentation
    3.2.3. Media formats for MPEG2-TS
    3.2.4. Use cases
      3.2.4.1. Live streaming
      3.2.4.2. Trick mode and seeking
  3.3. MPEG
    3.3.1. Objectives
    3.3.2. Requirements for proposal
4. HTTP Streaming Implementations
  4.1. Microsoft Smooth Streaming
    4.1.1. On-disk MP4 file format
    4.1.2. On-wire segments transmission
    4.1.3. Adaptive support
  4.2. Adobe
    4.2.1. Components
    4.2.2. Workflow
    4.2.3. Top features
  4.3. Apple
    4.3.1. Basic process
5. Gap Analysis
  5.1. Brief Summary of Existing Work
  5.2. Challenges
  5.3. Gap List and Potential Working Scope in IETF
6. IANA Considerations
7. Security Considerations
8. Acknowledgements
9. References
  9.1. Normative References
  9.2. Informative References
Author's Address
1. Introduction
Media streaming plays an increasingly important role in Internet
content delivery and is becoming indispensable in many applications
(e.g., distance learning, digital libraries, home shopping, and
video-on-demand). Several protocols are currently in common use for
delivering media content on the Internet, such as HTTP, RTSP/RTP,
RTMP, and MMS.
HTTP streaming is rapidly becoming one of the most commonly used
approaches for media content distribution on the Internet. It is a
mechanism in which the media data or file is divided into several
chunks (fragments) that are supplied to the user in order, typically
through port 80 or 8080. HTTP streaming covers various media formats
and codecs, including MP4, MPEG2-TS, and H.264/AAC, as well as
streaming services over HTTP such as Windows Media/Silverlight
Streaming, Flash Video, QuickTime Streaming Server, Real Media
Streaming, and others.
HTTP streaming offers two main advantages:
1) Media protocols often have difficulty traversing firewalls and
routers because they are commonly based on UDP sockets over unusual
port numbers. HTTP-based media delivery has no such problem because
firewalls and routers are generally configured to pass HTTP traffic
on port 80.
2) HTTP media delivery can use standard HTTP servers and standard
HTTP caches (or cheap servers in general) to deliver the content, so
it does not require special proxies or caches.
Additionally, most Content Delivery Networks (CDNs) use HTTP to
redirect requests, retrieve cached multimedia objects, and
communicate with policy servers.
Several leading Standards Development Organizations (SDOs) have been
producing technical specifications that define streaming over HTTP.
3GPP introduces adaptive HTTP streaming in Technical Specification
(TS) 26.234 [3GPP], which describes it in detail, including the Media
Presentation Description (MPD), the media segmentation format, and
HTTP server and client behavior, as an alternative to RTSP/RTP-based
media delivery. The Open IPTV Forum (OIPF) introduces HTTP adaptive
streaming in its technical specification [OIPF], which defines the
usage of, and extensions to, 3GPP HTTP streaming to enable HTTP-based
adaptive streaming for OIPF-compliant services and devices.
Recently, ISO/IEC JTC1/SC29/WG11 (MPEG) launched work on a new
standard on HTTP streaming. A number of documents
[MPEG-1][MPEG-2][MPEG-3][MPEG-4] have been produced to address the
background, objectives, use cases, and requirements for the
transport of MPEG media over HTTP.
Several companies have developed proprietary HTTP-based media
delivery platforms to provide a high-quality, adaptive viewing
experience to customers. Microsoft has implemented its Smooth
Streaming technology, a web-based, adaptive media content delivery
approach that uses standard HTTP [MS-IIS]. Instead of delivering
media as a full-file download, Smooth Streaming delivers the content
to the client as a series of small file chunks that can easily be
cached at edge servers, closer to the client. Adobe HTTP Dynamic
Streaming is a new Adobe-defined delivery method that enables
on-demand and live adaptive bitrate video streaming over regular HTTP
connections [Adobe]. It packages media files into fragments that
Flash Player clients can access instantly without downloading the
entire file. Apple HTTP Live Streaming [Apple] makes it possible to
send live or prerecorded audio and video to an iPhone or other
devices, such as desktop computers, using an ordinary Web server,
with support for adaptive bitrate.
Following a brief survey of the HTTP streaming standards and
implementations mentioned above, this document summarizes the related
work, analyzes the potential challenges, especially from the network
point of view, and lists the gaps between existing work and the
possible scope of work on HTTP streaming in the IETF.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119] and
indicate requirement levels for compliant implementations.
Live Streaming: Live events can be streamed over the Internet with
the help of broadcast software that encodes the live source (a
microphone, video camera, or other recording device) and delivers the
resulting stream to the server. The server then transfers the
stream, so that the user experiences the event as it happens.
On-Demand Streaming: To provide "anytime" access to media content,
the client is allowed to select content and play it back on demand.
Progressive Download: A mode that allows the client to play back the
media file while it is still downloading, after only a few seconds
of buffering, i.e., collecting the first part of the media file
before playback starts.
Adaptive Streaming: Adaptive streaming is a process that adjusts the
quality of a video delivered to a client based on the changing
network conditions to ensure the best possible viewer experience.
3. HTTP Streaming Standards
3.1. 3GPP
3GPP introduces adaptive HTTP streaming in Technical Specification
(TS) 26.234 [3GPP]. TS 26.234 specifies the protocols and codecs for
the Packet-Switched Streaming Service (PSS) within the 3GPP system.
Protocols for control signalling, capability exchange, media
transport, rate adaptation and protection are specified. Codecs for
speech, natural and synthetic audio, video, still images, bitmap
graphics, vector graphics, timed text and text are specified.
The delivery of media over HTTP provides an alternative delivery
mechanism to the RTSP/RTP based media delivery. It is assumed that
the HTTP-Streaming Client has access to a Media Presentation
Description (MPD). An MPD provides sufficient information for the
HTTP-Streaming Client to provide a streaming service to the user by
sequentially downloading media data from an HTTP server and rendering
the included media appropriately.
3.1.1. Media Presentation Components
A media presentation is a structured collection of data that is
accessible to the HTTP-Streaming Client and is described in an MPD.
The media presentation structure is shown in the following figure.
^ resolution / bit-rate / language / etc.
|
|   representation
| +------------------------------------------------+
| |  segment                        segment        |
| | +----------------------------+ +-----+         |
| | | +----------+  +----------+ | |     |         |
| | | |   meta   |  |  media   | | |     |         |
| | | |   data   |  |   data   | | |     | ... ... |
| | | +----------+  +----------+ | |     |         |
| | +----------------------------+ +-----+         |
| +------------------------------------------------+
|
| +------------------------------------------------+
| |                 representation                 |
| +------------------------------------------------+
|                      ... ...
| +------------------------------------------------+
| |                 representation                 |
| +------------------------------------------------+
|     period 1                 period 2       ...
+------------------------------------------------------------------->
                                                                 time
A media presentation consists of:
1) A sequence of Periods.
2) Each Period contains one or more Representations of the same
media content. Different Representations usually differ in
attributes such as media resolution, bit-rate, and language.
3) Each Representation consists of one or more Segments.
4) A Segment contains media data and/or the metadata needed to
decode and present the included media data, and is defined as a
unit that can be uniquely referenced by an http-URL element in the
MPD. The Initialisation Segment contains initialisation
information (no media data) for accessing the Representation. A
Media Segment contains media data that is described either within
the Media Segment itself or by the Initialisation Segment. Each
Segment has a start time relative to the start time of the
Representation (Period), so that the client can download a
specific Segment. Each Segment also provides random access
information, namely whether and how the media within the Segment
can be accessed randomly. A Segment is not required to start with
a random access point (RAP), although it is possible for all
Segments to start with a RAP.
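The hierarchy described above can be sketched as a small data model.
This is only an illustration under stated assumptions: the class and
field names below are hypothetical and are not taken from TS 26.234,
and times are assumed to be expressed in seconds.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    url: str                       # unique http-URL referenced from the MPD
    start_time: float = 0.0        # seconds, relative to the Period start
    starts_with_rap: bool = False  # a Segment need not start with a RAP


@dataclass
class Representation:
    bandwidth: int                           # bits per second
    language: Optional[str] = None
    init_segment: Optional[Segment] = None   # None: self-initialising media
    media_segments: List[Segment] = field(default_factory=list)


@dataclass
class Period:
    start: float                             # seconds from presentation start
    representations: List[Representation] = field(default_factory=list)


@dataclass
class MediaPresentation:
    periods: List[Period] = field(default_factory=list)

    def segment_count(self) -> int:
        """Total number of Media Segments over all Periods."""
        return sum(len(r.media_segments)
                   for p in self.periods
                   for r in p.representations)
```

A presentation with one Period, one Representation and two Media
Segments would then report a segment count of 2.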
3.1.2. Media Presentation Description
The logical structure of the media presentation is described by the
data structure (e.g., an XML schema) of the MPD file. That is, the
MPD contains the metadata required by the client to construct
appropriate URIs to access Segments and to provide the streaming
service to the user. Several important attributes and elements
contained in an MPD are listed below:
1) "type" attribute: type of the media presentation, i.e., VoD or
live.
2) "availabilityStartTime" attribute: start time of the media
presentation if "type"=live. If "type"=VoD, the media
presentation start time is 0.
3) "duration" attribute: duration (length) of the media
presentation. For a live presentation, the sum of "duration" and
"availabilityStartTime" specifies the end time of the media
presentation. If "duration" is not provided, the MPD does not
describe an entire media presentation and may be updated during
the live presentation.
4) "minimumUpdatePeriodMPD" attribute: minimum MPD update period.
5) "timeShiftBufferDepth" attribute: duration of the time-shifting
buffer maintained at the server for a live presentation. This
attribute is used in the case of trick mode.
6) "minBufferTime" attribute: minimum buffer time for the stream.
7) One or more "Period" elements: each describes a Period. A
"Period" element contains the following important attributes and
elements:
   7.1) "start" attribute: start time of this Period.
   7.2) One or more "Representation" elements: each describes a
   Representation with a different bit-rate, resolution, language,
   etc. A "Representation" element contains the following
   important attributes and elements:
      7.2.1) "bandwidth" attribute: maximum bit-rate of the
      Representation averaged over any interval of "minBufferTime"
      duration.
      7.2.2) "startWithRAP" attribute: when true, indicates that
      all Segments in the Representation start with a random
      access point (RAP).
7.2.3) "qualityRanking" attribute: quality ranking of the
representation.
7.2.4) "TrickMode" element: provides the information for
trick mode. In this element, "AlternatePlayoutRate"
attribute denotes the playout speed as a multiple of the
regular playout speed.
      7.2.5) "SegmentInfo" element: describes all Segments in a
      Representation. Each "SegmentInfo" element permits
      generating a list of Media Segment URLs (possibly with a
      byte range) and Media Segment start times relative to the
      start time of the Representation. A "SegmentInfo" element
      contains the following important attributes and elements:
         7.2.5.1) "duration" attribute: gives the constant
         approximate segment duration.
         7.2.5.2) at most one "InitialisationSegmentURL" element.
         If not present, then each Media Segment within this
         Representation shall be self-initialising.
         7.2.5.3) either a "URLtemplate" element that specifies a
         default segment URL template for all Segments, or one or
         more "Url" elements that provide a set of explicit
         URL(s) for Segments.
Note that a client derives the time of its next MPD update request
as the sum of the time of its last MPD update request and the value
of the "minimumUpdatePeriodMPD" attribute.
3.1.3. Streaming Procedure
3.1.3.1. Overview
Initially, the client parses the MPD and creates a segment list for
each Representation. The client then selects one Representation
based on the Representation attributes and other information, e.g.,
available bandwidth and client capabilities. The client acquires
the Initialisation Segment and the Media Segments of the selected
Representation by using the generated segment list. The client
continues consuming the media content by continuously requesting
Media Segments, taking MPD updates into account. The client may
change Representations based on updated MPD information and/or
updated information from its environment, e.g., changes in the
access bit-rate.
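Two of the client decisions in this overview can be sketched as
follows: choosing a Representation from the measured bandwidth, and
computing when the MPD should next be requested. This is a minimal
sketch rather than the selection algorithm of TS 26.234, which leaves
the choice to the client. The "rep_id" field and the "safety" margin
are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Representation:
    rep_id: str      # hypothetical identifier, for illustration only
    bandwidth: int   # the "bandwidth" attribute, in bits per second


def select_representation(reps: List[Representation],
                          available_bps: float,
                          safety: float = 0.8) -> Representation:
    """Pick the highest-bandwidth Representation that fits the budget.

    "safety" is an assumed headroom factor, not part of any standard.
    Falls back to the lowest-bandwidth Representation if none fits.
    """
    if not reps:
        raise ValueError("no representations")
    budget = available_bps * safety
    fitting = [r for r in reps if r.bandwidth <= budget]
    if fitting:
        return max(fitting, key=lambda r: r.bandwidth)
    return min(reps, key=lambda r: r.bandwidth)


def next_mpd_update_time(last_request_time: float,
                         minimum_update_period: float) -> float:
    """Last MPD request time plus "minimumUpdatePeriodMPD", in seconds."""
    return last_request_time + minimum_update_period
```

Calling select_representation() again whenever the bandwidth estimate
changes gives the representation switching described above.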
3.1.3.2. Segment list generation
A segment list contains: 1) the URL of the Initialisation Segment;
2) the URLs of the Media Segments; 3) the start times of the Media
Segments in the Period. There are two approaches to generating a
segment list. One is template-based generation, which uses the
"URLtemplate" element and the "duration" attribute in the
"SegmentInfo" element of the MPD. The other is playlist-based
generation, which uses the "Url" elements and the "duration"
attribute in the "SegmentInfo" element of the MPD.
3.1.3.3. Seeking, trick mode and adaptation support
Suppose that the client wants to seek to time "tp". The
corresponding segment can be located as:

   Target_segment_index =
      max { i | MediaSegment[i].StartTime <= tp - Period.start }

For accurate seeking to time "tp", the client needs to access a RAP.
The client may use the information in the 'sidx' box to locate the
RAP and the corresponding presentation time in the media
presentation. For fast start-up, the client may initially request
the 'sidx' box from the beginning of the Media Segment using byte
range requests.
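Template-based segment list generation and the seek lookup can be
sketched together as follows. The "$Index$" placeholder and the
example URL are hypothetical and are not claimed to match the
template syntax of TS 26.234. Start times are derived from the
constant approximate segment duration, so they are approximate too.

```python
from bisect import bisect_right
from typing import List, Tuple


def template_segment_list(url_template: str, duration: float,
                          count: int) -> List[Tuple[str, float]]:
    """Return (url, start_time) pairs, start times relative to the Period.

    "$Index$" is a hypothetical placeholder used only in this sketch.
    """
    return [(url_template.replace("$Index$", str(i + 1)), i * duration)
            for i in range(count)]


def target_segment_index(start_times: List[float], tp: float,
                         period_start: float) -> int:
    """max { i | start_times[i] <= tp - period_start }.

    start_times must be sorted in ascending order.
    """
    offset = tp - period_start
    i = bisect_right(start_times, offset) - 1
    if i < 0:
        raise ValueError("seek time precedes the first segment")
    return i


segments = template_segment_list("http://example.com/seg-$Index$.3gp", 10.0, 4)
starts = [start for _, start in segments]
index = target_segment_index(starts, tp=125.0, period_start=100.0)
```

The binary search is a design choice for long lists; a linear scan
would give the same index.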
Trick mode can be implemented by using the "AlternatePlayoutRate"
attribute of the "TrickMode" element in the MPD.
Switching to a new Representation is equivalent to seeking to the new
Representation. The client should seek to a RAP in the new
Representation at a desired presentation time "tp" later than the
current presentation time.
3.2. OIPF
The Open IPTV Forum (OIPF) introduces HTTP adaptive streaming in its
technical specification [OIPF]. This specification defines the usage
of and, where necessary, extensions to the technologies defined in
3GPP TS 26.234 to enable HTTP-based adaptive streaming for Release 2
OIPF-compliant services and devices. Most details of HTTP adaptive
streaming in this specification are based on 3GPP TS 26.234. The
extensions and designs specific to OIPF are introduced in this
section.
3.2.1. MPD
A Representation may be made up of multiple components, for example
audio and video. A partial Representation may contain only some of
these components, and a terminal may need to download (and play)
multiple partial Representations to build up a complete
Representation with the appropriate components, according to the
preferences and wishes of the user. Accordingly, in the MPD, the
"Representation" element may consist of one or more Components, which
may be downloaded and provided to the terminal in addition to content
being downloaded from other "Representation" elements. In this case
the "Representation" element in the MPD SHALL contain one or more
"Component" elements.
The "Representation" element in the MPD may carry a "group"
attribute. The value of the "group" attribute SHALL be the same for
Representations that contain at least one Component in common. Two
Representations with completely different Components (e.g., audio in
two different languages) SHALL have different values for the "group"
attribute.
To provide nPVR functionality, when the Segments of the live Content
are stored on the nPVR server, the URLs indicating the Segments on
the nPVR server SHOULD be provided to the OIPF to enable it to access
these Segments by the MPD update mechanism defined in 3GPP TS 26.234.
3.2.2. Segmentation
Each Segment SHALL start with a random access point (RAP). Moreover,
to enable seamless switching:
1) Different Component Streams of the same Component SHALL be
encoded in the same media format but MAY be different in the
profile of that format. (e.g., if a Representation contains a
Component Stream of a certain video Component that is encoded
using H.264/AVC using the HD profile, then all other
Representations that have a Component Stream of that Component
must use H.264/AVC but may use different configurations within the
HD profile.)
2) Segments of Representations with the same value for the "group"
attribute SHALL be time aligned.
3.2.3. Media formats for MPEG2-TS
Component Streams of the same Component (e.g. "video angle 1 in H.264
at 720x576" and "video angle 1 in H.264 at 320x288") SHALL be carried
in transport stream packets that have the same PID. When the
Segments of a Representation contain MPEG-2 TS packets, the value of
the "id" attribute in each Component element, if present, SHALL be
the PID of the Transport Stream packets which carry the Component.
For all Representations, the PAT and PMT are either contained in the
initialisation Segments or in the media Segments. The
Representations with zero "group" attribute will have the same PAT/
PMT as Representations with non-zero "group" attribute.
A media Segment SHALL contain the concatenation of one or several
contiguous (and complete) PES packets, which are split and
encapsulated into TS packets. When packetizing video elementary
streams, at most one frame SHALL be included in each PES packet. The
PES packet in which a frame starts SHALL always contain PTS/DTS
fields in the PES header.
3.2.4. Use cases
3.2.4.1. Live streaming
If the "timeShiftBufferDepth" attribute is present in the MPD, it may
be used by the terminal to know at any moment which Segments are
effectively available for downloading with the current MPD. If this
timeshift information is not present in the MPD, the terminal may
assume that all Segments described in the MPD which are already in
the past are available for downloading. Periods may be used in the
live streaming scenario to appropriately describe successive live
events with different encoding or adaptive streaming properties.
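The availability rule above can be sketched as a simple predicate.
The interpretation is an assumption: a Segment is taken to be
available once it has been completely produced, and, when
"timeShiftBufferDepth" is present, only while its start time still
lies inside the time-shift window. All times are assumed to be
seconds on a common clock.

```python
from typing import Optional


def segment_available(seg_start: float, seg_duration: float, now: float,
                      time_shift_buffer_depth: Optional[float] = None) -> bool:
    """Return True if the Segment can be downloaded at time "now"."""
    seg_end = seg_start + seg_duration
    if seg_end > now:
        return False   # not yet completely produced
    if time_shift_buffer_depth is None:
        return True    # no timeshift information: every past Segment counts
    # Assumed rule: the Segment must still start inside the buffer window.
    return seg_start >= now - time_shift_buffer_depth
```

With a 30-second buffer at time 100, a Segment covering 70 to 80 is
still available while one covering 0 to 10 is not.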
3.2.4.2. Trick mode and seeking
Basic implementation of trick modes is based on the processing of
Segments by the terminal software: downloaded Segments may be
provided to the decoder at a speed lower or higher than normal. The
playback of Segments in fast forward and fast rewind has an immediate
effect on the bitrate, because the Segments also need to be
downloaded at a faster rate than normal. Dedicated streams may be
used to implement efficient trick modes: it is recommended to produce
the streams with a lower frame rate, longer Segments or a lower
resolution to ensure that the bitrate is kept at a reasonable level
even when the Segment is downloaded at a faster rate. The dedicated
stream is described as Representation with a "TrickMode" element in
the MPD. It is also recommended that if there are dedicated fast
forward Representations, the normal Representations do not contain
the "TrickMode" element in the MPD.
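The effect of playback speed on the required download rate can be
illustrated with simple arithmetic. The sketch assumes that every
Segment of the stream is downloaded in full (no frames are skipped);
the bit rates used are hypothetical examples.

```python
def required_download_rate(encoded_bps: float, playout_rate: float) -> float:
    """Bits per second needed to sustain playback at the given speed.

    playout_rate is a multiple of normal speed; its sign (forward or
    rewind) does not change the amount of data needed per second.
    """
    return encoded_bps * abs(playout_rate)


# A hypothetical 2 Mbit/s normal stream played at 4x fast forward.
normal_at_4x = required_download_rate(2_000_000, 4)

# A hypothetical dedicated 500 kbit/s trick-mode stream at the same speed.
dedicated_at_4x = required_download_rate(500_000, 4)
```

Under these assumptions the dedicated stream needs a quarter of the
download rate of the normal stream at the same playout speed, which
is the motivation for producing dedicated trick-mode Representations.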
To locate a random access point in a media Segment, the client
should download the Segment and search its RAPs one by one until the
required RAP is found.
3.3. MPEG
Recently, ISO/IEC JTC1/SC29/WG11 (MPEG) launched work on a new
standard on HTTP streaming. A series of documents
[MPEG-1][MPEG-2][MPEG-3][MPEG-4] have been produced to address the
background, objectives, use cases, and requirements for the transport
of MPEG media over HTTP, as well as a call for proposals on this
topic.
3.3.1. Objectives
The main objectives of this new standard are:
1) Efficient delivery of MPEG media over HTTP in an adaptive,
progressive, download/streaming fashion.
2) Support of live streaming of multimedia content.
3) Efficient and easy use of existing content distribution
infrastructure components such as CDNs, proxies, caches, NATs and
firewalls.
4) Support of integrated services with multiple components.
5) Support for signaling, delivery, utilization of multiple
content protection and rights management schemes, and support for
efficient content forwarding and relay.
3.3.2. Requirements for proposal
MPEG sets out a list of requirements for HTTP streaming proposals.
Only those related to media delivery are listed below.
1) This standard shall support streaming of content and content
components over HTTP 1.1.
2) The media files prepared for this standard should be
deliverable using progressive download with minimal changes.
3) This standard shall support streaming of live content of
possibly indefinite length, including PVR functionalities such as
pause and time-shifted play.
4) The standard shall support random access (seeking).
5) The standard shall support trick modes at least to the extent
that the underlying formats support them in local playback.
6) The standard shall not require any extension to HTTP 1.1. It
shall support the efficient use of HTTP optimized infrastructures
such as Content Delivery Networks (CDNs), caches and proxies.
7) The standard shall allow segmentation of the content. The
standard shall not require fixed size or fixed duration segments
during delivery of content.
8) The standard should introduce minimal transport overhead and
should incur minimal presentation startup delay.
9) The standard shall support description of media components for
delivery and presentation.
10) The standard shall support interactive selection of media
components for delivery and presentation, for example view
selection in multi-view content.
11) This standard shall support prioritization of content and
content components.
12) This standard shall support signaling the relationship among
content components.
13) The standard should support network transition during delivery
of the content.
14) The standard shall enable adaptation of content along axes
such as bitrate, temporal resolution, spatial resolution, quality/
fidelity or view perspective.
15) The standard shall support initial selection, and dynamic
adaptation of the content without presentation interruption during
delivery.
4. HTTP Streaming Implementations
4.1. Microsoft Smooth Streaming
Smooth Streaming is Microsoft's implementation of adaptive streaming
technology, a web-based media content delivery approach that uses
standard HTTP [MS-IIS]. Instead of delivering media as a full-file
download or as a progressive download, the content is delivered to
the client as a series of small file chunks that can be easily and
cheaply cached at edge servers, closer to the client. Smooth
Streaming defines each chunk/GOP as an MPEG-4 Movie Fragment and
stores it within a contiguous MP4 file for easy random access. One
MP4 file is expected for each bit rate. Because the media is only
"virtually" split into fragment files, the server must translate
sequential URL requests into exact byte range offsets within the MP4
file. The server extracts the fragment box and sends it over the
wire to the client as a standalone file.
4.1.1. On-disk MP4 file format
+-------------------------------------------------------------------+
| +----+ +---------------------+ +--------------+ +------+ +------+ |
| | | | Movie Metadata(moov)| |Movie Fragment| |Media | |Movie | |
| |file| |+-----++-----++-----+| | (moof) | |Data | |Frag | |
| |type| ||Movie||Track||Movie|| |+----+ +-----+| |(mdat)| |Random| |
| | | ||hdr || ||Ext. || ||Frag| |Track|| | | |Access| |
| | | || || || || ||hdr | |Frag || | | |(mfra)| |
| | | || || || || || | | || | | | | |
| | | |+-----++-----++-----+| |+----+ +-----+| | | | | |
| +----+ +---------------------+ +--------------+ +------+ +------+ |
+-------------------------------------------------------------------+
In a nutshell, the MP4 file starts with file-level metadata ('moov')
that generically describes the file, but the bulk of the payload is
actually contained in the fragment boxes that also carry more
accurate fragment-level metadata ('moof') and media data ('mdat').
Closing the file is an 'mfra' index box that allows easy and accurate
seeking within the file.
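The box layout above can be explored with a short sketch that lists
the top-level boxes of an MP4 byte string. It relies only on the
generic ISO base media file format box header (a 32-bit big-endian
size followed by a four-character type); the helper make_box() and
the sample data are hypothetical and exist only to exercise the
parser.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("ascii", "replace"), offset, size
        offset += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a minimal box with a 32-bit size (test helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


sample = (make_box(b"ftyp", b"isml") + make_box(b"moov") +
          make_box(b"moof") + make_box(b"mdat", b"\x00" * 4) +
          make_box(b"mfra"))
box_types = [box_type for box_type, _, _ in iter_boxes(sample)]
```

Here box_types follows the on-disk order shown in the figure, with
one 'moof'/'mdat' pair standing in for the repeated fragments.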
In Smooth Streaming, MP4 files are of two kinds: *.ismv files,
which contain video and audio, and *.isma files, which contain audio
only. Besides media files, there are manifest files. The server
manifest file (*.ism) describes the relationships between the media
tracks, bit rates, and files on disk. The client manifest file
(*.ismc) describes the streams available to the client: the codecs
used, bit rates encoded, video resolutions, markers, captions, etc.
4.1.2. On-wire segments transmission
Initially, the client requests the *.ismc client manifest from the
server. The client then requests fragments in the form of a URL,
e.g.,
http://video.foo.com/NBA.ism/QualityLevels(400000)/Fragments(video=610275114).
The server then looks up the quality level (bit rate) in the
corresponding *.ism server manifest and maps it to a physical *.ismv
or *.isma file on disk. The server reads the appropriate MP4 file
and, based on its 'tfra' index box, determines which fragment box
('moof' + 'mdat') corresponds to the requested start time offset.
The server extracts the fragment box and sends it over the wire to
the client as a standalone file. The fragment can then be cached
automatically further down the network, potentially saving the
origin server from sending the same fragment again to another client
that requests the same URL.
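The first step of the server-side lookup, extracting the manifest
name, quality level, and start time offset from a fragment URL of the
form shown above, can be sketched as follows. The regular expression
is inferred from that single example URL and is an assumption, not
the grammar defined by Microsoft; the type and function names are
hypothetical.

```python
import re
from typing import NamedTuple


class FragmentRequest(NamedTuple):
    manifest: str    # server manifest name, e.g. "NBA.ism"
    bitrate: int     # value inside QualityLevels(...)
    track: str       # track name, e.g. "video"
    start_time: int  # requested start time offset


# Pattern inferred from the example URL only (assumption).
_FRAGMENT_RE = re.compile(
    r"/(?P<manifest>[^/]+\.ism)"
    r"/QualityLevels\((?P<bitrate>\d+)\)"
    r"/Fragments\((?P<track>\w+)=(?P<start>\d+)\)$")


def parse_fragment_url(url: str) -> FragmentRequest:
    """Split a fragment request URL into its components."""
    m = _FRAGMENT_RE.search(url)
    if m is None:
        raise ValueError("not a fragment request: %r" % url)
    return FragmentRequest(m.group("manifest"), int(m.group("bitrate")),
                           m.group("track"), int(m.group("start")))


request = parse_fragment_url(
    "http://video.foo.com/NBA.ism/QualityLevels(400000)/"
    "Fragments(video=610275114)")
```

The parsed bit rate would then select the MP4 file, and the start
time offset would be looked up in that file's index.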
4.1.3. Adaptive support
Smooth Streaming provides multiple encoded bit rates of the same
media source and thus allows the client to switch seamlessly between
bit rates. As the client plays chunks, network conditions may change
or media processing may be impacted by other applications. The
client can request that the next chunk come from a stream encoded at
a different bit rate to accommodate the changing conditions. This
lets the client play media with less stuttering, buffering and
freezing, and so obtain the best quality that current conditions
allow.
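The switching decision is left to the client, and no particular
heuristic is prescribed here. A minimal sketch, assuming the client
keeps a recent throughput estimate and applies a safety margin (both
are assumptions, as are the function and parameter names), could look
like this:

```python
def choose_bitrate(available_bps, measured_bps, safety=0.8):
    """Pick the highest encoded bit rate that fits the throughput.

    available_bps: bit rates listed in the client manifest.
    measured_bps:  the client's current throughput estimate (assumed).
    safety:        fraction of the estimate the client is willing to use.
    """
    budget = measured_bps * safety
    fitting = [b for b in sorted(available_bps) if b <= budget]
    # Fall back to the lowest bit rate when nothing fits the budget.
    return fitting[-1] if fitting else min(available_bps)


levels = [400000, 800000, 1500000, 3000000]
good_network = choose_bitrate(levels, 2000000)
poor_network = choose_bitrate(levels, 300000)
print(good_network, poor_network)
```

Real clients typically also consider buffer occupancy and decoder
load; those inputs are omitted to keep the sketch short.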
4.2. Adobe
Adobe HTTP Dynamic Streaming is a new Adobe-defined delivery method
for enabling on-demand and live adaptive bitrate video streaming over
regular HTTP connections [Adobe]. HTTP Dynamic Streaming packages
media files into fragments that Flash Player clients can access
instantly without downloading the entire file. Adobe HTTP Dynamic
Streaming contains several components that work together to package
media and stream it over HTTP to Flash Player.
4.2.1. Components
The File Packagers comprise a Live Packager and a VoD Packager. The
VoD Packager translates on-demand media files into fragments and
writes the fragments to F4F files. The Live Packager translates live
streams ingested over the Real Time Messaging Protocol (RTMP) into
F4F files in real time.
The HTTP Origin Module is an Apache HTTP Server module that serves
the F4F files created by the File Packagers.
The F4F file format describes how to divide media content into
segments and fragments. Each fragment has its own bootstrap
information that provides cache management and fast seeking. The F4M
Manifest file format contains information about a package of files
that the HTTP Origin Module can serve. Manifest information includes
codecs, resolutions, and the availability of files encoded at
multiple bit rates.
4.2.2. Workflow
The HTTP Dynamic Streaming workflow includes content preparation,
which writes media fragments into files, distribution of the files
over HTTP, media consumption and protection, etc.
+--------+ +-------+ +-------+ +------+
| | | | | | | |
Live | |F4F/F4M | | | | | |
streaming|File |Files | HTTP |HTTP | HTTP |HTTP |Client|
-------->|Packager|------->| Origin|Delivery| Cache/|Delivery|Appl. |
| | | Module|------->| CDN |------->| |
VoD | | | | | | | |
content | | | | | | | |
-------->| | | | | | | |
+--------+ +-------+ +-------+ +------+
4.2.3. Top features
HTTP Dynamic Streaming supports features such as adaptive bitrate and
DVR functionality.
1) Adaptive bitrate. To stream multi-bitrate content, the server
encodes a piece of media at multiple bitrates, creating multiple
files. The media files share a manifest file that lists
information about each media file. With this information, the
client measures its bandwidth, computing resources, etc., and
requests content fragments encoded at the most appropriate bitrate
for the best viewing experience.
2) DVR functionality. Live streams can be made interactive by
enabling DVR functionality, which allows viewers to pause, rewind,
and skip forward to real time.
3) Support for standard HTTP caching systems. Existing standard
server hardware and caching infrastructure can be reused to
maximize capacity and reach.
4.3. Apple
Apple HTTP Live Streaming [Apple] allows live or prerecorded audio
and video to be sent to iPhone or other devices, such as desktop
computers, using an ordinary Web server. Playback requires iPhone OS
3.0 or later on iPhone or iPod touch; QuickTime X or later is
required on the desktop.
4.3.1. Basic process
+-------+ +---------+ +------+ +------+
| | | | | | | |
Live | |MPEG2 | |Index/ | | | |
streaming|Media |TS |Stream |.ts Files|HTTP |HTTP |Client|
-------->|Encoder|----->|Segmenter|-------->|Server|Delivery|Appl. |
| | | | | |------->| |
VoD | | | | | | | |
content | | | | | | | |
-------->| | | | | | | |
+-------+ +---------+ +------+ +------+
The media encoder takes audio-video input and turns it into an MPEG-2
Transport Stream. Currently, the supported formats are MPEG-2
Transport Streams (with H.264 video and AAC audio) for audio-video,
and MPEG elementary streams for audio only.
The stream segmenter reads the Transport Stream from the local
network and divides it into a series of small media files (.ts files)
of equal duration. It also creates an index file containing a
playlist of the media files, as well as metadata. The index file is
in .M3U8 format. In the case of a live stream, the index file is
updated each time the segmenter completes a new media file. The
index is used to track the availability and location of the media
files. Both the .ts and .M3U8 files are placed on an HTTP server.
An HTTP server or a web caching system delivers the media files and
index files to the client over HTTP.
A client begins by fetching the index file, based on a URL
identifying the stream. The index file in turn specifies the
location of the available media files, decryption keys, and any
alternate streams available. For the selected stream, the client
downloads each available media file in sequence. Each file contains
a consecutive segment of the stream. Once it has a sufficient amount
of data downloaded, the client begins presenting the reassembled
stream to the user.
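The client's first step, turning an index file into an ordered list
of media file URLs, can be sketched as follows. The sample playlist
and the example.com URL are invented for illustration. The sketch
assumes a simple media playlist using the #EXTM3U, #EXTINF and
#EXT-X-ENDLIST tags of the .M3U8 format, and it ignores alternate
streams and decryption keys.

```python
from urllib.parse import urljoin

# Hypothetical index file for a short prerecorded stream.
SAMPLE_INDEX = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10,
segment0.ts
#EXTINF:10,
segment1.ts
#EXT-X-ENDLIST
"""


def media_uris(index_text, index_url):
    """Return absolute URLs of the media files, in playback order."""
    uris = []
    for line in index_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; others are URIs.
        if line and not line.startswith("#"):
            uris.append(urljoin(index_url, line))
    return uris


def is_complete(index_text):
    """True if the index is marked as final (no more files to come)."""
    return "#EXT-X-ENDLIST" in index_text


uris = media_uris(SAMPLE_INDEX, "http://example.com/stream/index.m3u8")
print(uris, is_complete(SAMPLE_INDEX))
```

For a live stream the end marker is absent, so a client following
this sketch would fetch the index file again later to discover newly
completed media files.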
In addition, HTTP Live Streaming technology supports adaptive bitrate
and automatically switches to the optimal bitrate based on the
network conditions for a smooth quality playback experience.
5. Gap Analysis
5.1. Brief Summary of Existing Work
3GPP, OIPF, MS Smooth Streaming, Adobe HTTP Dynamic Streaming and
Apple HTTP Live Streaming all follow a similar design, namely:
1) The streaming server uses a stream encoder/segmenter to write the
media content into a series of small files, and produces a manifest
file to describe these media files. The table below summarizes the
media and manifest files defined by existing work on HTTP streaming,
regardless of codec and media container type.
                   | Media File       | Manifest File   |
=========================================================
3GPP/OIPF          | .3GP file        | .3GP file       |
---------------------------------------------------------
MS Smooth HTTP     | .ismv/.isma file | .ism/.ismc file |
---------------------------------------------------------
Adobe Dynamic HTTP | .F4F file        | .F4M file       |
---------------------------------------------------------
Apple Live HTTP    | .ts file         | .M3U8 file      |
2) The HTTP client first obtains the manifest file, then constructs a
series of URIs pointing to the media files. Based on the client's
conditions (e.g., network, device type), or when the user operates
a trick mode, the client chooses which media file to request and
issues an HTTP request with the corresponding URI.
3) Upon receiving the HTTP request, the HTTP server sends the media
file corresponding to the URI in the request to the client.
The above design leaves the network transport out of scope: the
media (both live streaming and VoD content) is packaged into files
and transmitted as the payload of standard HTTP. From the network
transport point of view, there is no difference between the
transmission of such media data and that of an ordinary text file.
All the main features of media streaming, such as media
meta-information, PVR function, seeking, trick mode and adaptation
between different viewing qualities, are implemented (or can be
implemented) by the server and client using a flexible MPD or
manifest file. In other words, all the intelligence in current HTTP
streaming designs resides in the server and client software rather
than in the network transport.
5.2. Challenges
However, streaming long-duration, high-quality media over the
best-effort Internet while satisfying real-time streaming
requirements faces several challenges when the network provides no
specific support for HTTP streaming.
The first challenge is that current HTTP streaming is based on a
pull model, in which the HTTP client relies on an updated manifest
file from the server and pulls chunks one after another by issuing a
sequence of HTTP requests to the HTTP server. In the case of live
streaming, the server needs to update the manifest file each time a
new chunk of live media becomes available. Hence, a potential
problem is that additional round trips between the client and the
server are needed to update the manifest file before the client can
request each new chunk, which could put the real-time behavior of
live streaming at risk. An HTTP server push model, on the other
hand, enables the server to push each chunk to the client as soon as
it is available on the server, without the round trips between the
client and the server for manifest file updates. In this sense, a
push model could be more efficient and a better candidate for
time-sensitive scenarios.
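The round-trip argument above can be made concrete with a
deliberately simplified model. It assumes that a pulling client
needs one round trip to refresh the manifest and one more to request
the chunk, that a pushed chunk only incurs the one-way delay
(approximated as half a round trip), and that transfer time,
connection setup and polling intervals are ignored. The function
names are hypothetical.

```python
def pull_delay_ms(rtt_ms, manifest_refreshes=1):
    """Delay before a new chunk starts arriving in the pull model.

    Each manifest refresh costs one round trip, and the chunk request
    itself costs one more.
    """
    return (manifest_refreshes + 1) * rtt_ms


def push_delay_ms(rtt_ms):
    """Delay before a new chunk starts arriving in the push model.

    The server sends the chunk as soon as it exists, so only the
    one-way delay applies (approximated here as half the RTT).
    """
    return rtt_ms / 2


rtt = 100  # assumed round-trip time in milliseconds
print(pull_delay_ms(rtt), push_delay_ms(rtt))
```

Under these assumptions the pull model adds a fixed per-chunk cost
that grows with the round-trip time, which is the overhead a push
model is meant to avoid.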
The second challenge is the lack of QoE improvement and monitoring
mechanisms in current HTTP streaming systems. Compared to a
dedicated IPTV system, HTTP streaming over the best-effort Internet
may suffer more from changing network conditions. For example, when
a user switches live channels, the current group of pictures (GoP)
and the initialization information for decoders (a.k.a. Reference
Information (RI)) of the media content need to be acquired by the
client as soon as possible to start playback. Unfortunately, there
is so far no mechanism to prioritize the transmission of these
important HTTP packets, which may introduce a long delay before
playback starts. Additionally, some session-level QoE metrics, such
as startup delay, are important to an HTTP streaming system for
monitoring or diagnostic purposes. Unfortunately, there is no such
quality monitoring mechanism (e.g., something like RTCP reports) in
current HTTP streaming systems. To provide a high-quality service
to the user, monitoring and analyzing the system's overall
performance is extremely important, since a performance monitoring
capability can help diagnose potential network impairments.
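To illustrate what session-level QoE metrics might look like on the
client side, the sketch below records the startup delay and playback
stalls for one session. The class, its method names and the report
fields are hypothetical; no existing HTTP streaming system is claimed
to define them, and how such a report would be fed back to the server
is left open.

```python
class SessionQoE:
    """Collects simple QoE metrics for one streaming session.

    All times are in milliseconds on the client's own clock.
    """

    def __init__(self, request_time):
        self.request_time = request_time  # when the user asked to play
        self.first_frame_time = None      # when playback actually began
        self.stalls = []                  # (start, end) of each stall

    def playback_started(self, now):
        # Only the first start counts towards the startup delay.
        if self.first_frame_time is None:
            self.first_frame_time = now

    def stalled(self, start, end):
        self.stalls.append((start, end))

    def report(self):
        startup = None
        if self.first_frame_time is not None:
            startup = self.first_frame_time - self.request_time
        return {
            "startup_delay_ms": startup,
            "stall_count": len(self.stalls),
            "stall_time_ms": sum(end - start for start, end in self.stalls),
        }


qoe = SessionQoE(request_time=0)
qoe.playback_started(1800)
qoe.stalled(20000, 21500)
print(qoe.report())
```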
Given these challenges, the typical user experience in existing HTTP
streaming schemes can be limited by delayed startup, poor quality,
buffering delays, etc. In particular, in the case of "Multi-Screen"
applications, the service provider intends to provide a common user
experience when the user enjoys the media content across PCs, TVs,
and smartphones. Therefore, HTTP streaming over the Internet without
some optimization of the network transport for QoE improvement may
make it difficult for the service provider to comply with the service
level agreements (SLAs) between the service provider and its users.
5.3. Gap List and Potential Working Scope in IETF
The following table lists the gaps in existing work on HTTP
streaming, including 3GPP, MS, Adobe, and Apple.
                                    | Satisfied by       |
Characteristic                      | existing work      |
==========================================================
Adaptive bit rate                   | Yes                |
----------------------------------------------------------
Playback control                    | Yes                |
----------------------------------------------------------
Use existing cache, CDN             | Yes                |
----------------------------------------------------------
Client pull model                   | Yes                |
----------------------------------------------------------
Server push model                   | No                 |
----------------------------------------------------------
Reliable transmission in network    | Yes                |
----------------------------------------------------------
Real-time support in network        | No                 |
----------------------------------------------------------
QoE improvement (e.g. startup)      | No                 |
----------------------------------------------------------
QoE monitoring                      | No                 |
----------------------------------------------------------
Multicast support for scalability   | No                 |
As the leading SDO working to make the Internet work better, the IETF
is a suitable place to address the above-mentioned gaps by studying
and enhancing the network to meet the real-time requirements of HTTP
streaming systems. A potential working scope can be:
1) investigate the use of a server push model in HTTP streaming to
find a better model for more time-sensitive applications, such as
live streaming;
2) study QoE monitoring and feedback mechanisms (e.g., similar to
RTCP reports) for HTTP streaming systems, including the monitoring
architecture, feedback message coding, QoE metrics for HTTP
streaming, etc.;
3) define mechanisms for QoE improvement in HTTP streaming, such as
reducing the startup delay in playback when the user switches live
channels or starts VoD;
4) further improve real-time streaming performance from the aspect
of network transport functions.
Please refer to [HTTPStreamingPS] for more details on the problem
statement and scope of work.
6. IANA Considerations
This document presently raises no IANA considerations.
7. Security Considerations
This document presently raises no security considerations.
8. Acknowledgements
The author would like to thank the many people who gave valuable
comments on this draft.
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
9.2. Informative References
[3GPP] 3GPP, "Transparent end-to-end Packet-switched Streaming
Service (PSS) - Protocols and codecs (Release 9)",
March 2010.
[OIPF] OIPF, "HTTP Adaptive Streaming (Release 2)",
September 2010.
[MPEG-1] ISO/IEC JTC1/SC29/WG11, "HTTP Streaming of MPEG Media
Context and Objectives (N11337)", April 2010.
[MPEG-2] ISO/IEC JTC1/SC29/WG11, "Call for Proposals on HTTP
Streaming of MPEG Media (N11338)", April 2010.
[MPEG-3] ISO/IEC JTC1/SC29/WG11, "Use Cases for HTTP Streaming of
MPEG Media (N11339)", April 2010.
[MPEG-4] ISO/IEC JTC1/SC29/WG11, "Requirements on HTTP Streaming of
MPEG Media (N11340)", April 2010.
[MS-IIS] Microsoft Corporation, "IIS Smooth Streaming Technical
Overview", March 2009.
[Adobe] Adobe, "Using ADOBE HTTP DYNAMIC STREAMING", 2010.
[Apple] Apple, "HTTP Live Streaming Overview", November 2009.
[HTTPStreamingPS]
Wu, Q., "Problem Statement for HTTP Streaming",
draft-wu-http-streaming-optimization-ps-02.txt (work in
progress), September 2010.
Author's Address
Ning Zong
Huawei Technologies
Phone: +86 25 56624760
Email: zongning@huawei.com