<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSPY v5 rel. 3 U (http://www.xmlspy.com)
     by Daniel M Kohn (private) -->

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
    <!ENTITY rfc2119 PUBLIC '' 
      'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
]>

<rfc category="info" ipr="full3978" docName="draft-hunt-avt-monarch-00.txt">

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>

    <front>
        <title abbrev='RTP Monitoring Architectures'>Monitoring Architectures for RTP</title>
        <author initials='G.' surname='Hunt' fullname='Geoff Hunt'>
            <organization abbrev='BT'>BT</organization>
		<address>
	    <postal>
        	<street>Orion 1 PP9</street>
        	<street>Adastral Park</street>
        	<street>Martlesham Heath</street>
        	<city>Ipswich</city> <region>Suffolk</region>
        	<code>IP5 3RE</code>
        	<country>United Kingdom</country>
    	    </postal>
	    <phone>+44 1473 608325</phone>
	    <email>geoff.hunt@bt.com</email>
	    </address>
        </author>
        <author initials='P.J.' surname='Arden' fullname='Philip Arden'>
            <organization abbrev='BT'>BT</organization>
		<address>
	    <postal>
        	<street>Orion 3/7 PP4</street>
        	<street>Adastral Park</street>
        	<street>Martlesham Heath</street>
        	<city>Ipswich</city> <region>Suffolk</region>
        	<code>IP5 3RE</code>
        	<country>United Kingdom</country>
    	    </postal>
	    <phone>+44 1473 644192</phone>
	    <email>philip.arden@bt.com</email>
	    </address>
        </author>
        <date month='July' year='2008' />
       <area>Real-time Applications and Infrastructure Area</area>
        <workgroup>Audio/Video Transport Working Group</workgroup>
        <keyword>RFC</keyword>
        <keyword>Request for Comments</keyword>
        <keyword>I-D</keyword>
        <keyword>Internet-Draft</keyword>
        <keyword>Real Time Control Protocol</keyword>
        <abstract>
        <t>
This memo is intended to stimulate discussion on a hierarchical monitoring 
architecture for RTP, including a scheme for the definition of lower-layer 
metrics which are usable by a range of applications. Systematic investigation 
of a monitoring architecture for RTP/RTCP was requested at the IETF71 
(Philadelphia) AVT session.
</t><t>
This first version of the draft is restricted to transport metrics and to a subset of audio 
application metrics, but it is envisaged that future work should extend this to other applications, 
principally video.
        </t>
        </abstract>
    </front>

    <middle>

<section title='Requirements notation'>
<t>
This memo is informative and as such contains no normative requirements.
</t>
</section>

<section anchor='intro' title="Introduction">
<t>
The development of multiple metrics for transport and application quality 
monitoring has been identified as a potential problem for RTP/RTCP interoperability. 
The AVT group has requested work on an architectural framework for monitoring which 
recognises that different applications layered on RTP may have some monitoring 
requirements in common, which should be satisfied by a common design. When this work 
was initiated, the objective was to design a framework and a small number of re-usable 
metrics at each appropriate layer to reduce implementation costs and to maximise 
inter-operability. Since then, work-in-progress on <xref target='GUIDELINES'/> has stated 
that RTCP should 
be used primarily to provide information to peer RTP systems, whilst information used 
for network management should be carried by out-of-band protocols. By implication, 
AVT should not work on metrics or their transport in RTCP unless they are motivated 
by RTP-system-to-RTP-system requirements. However, metrics supporting network and service management 
are still required for RTP 
and the applications transported over it, to support many significant real-world deployments.
</t><t>
Service providers may wish to answer some or all of the following:
</t>
<t>
<list style='symbols'>
<t>
is a user experiencing a problem?
</t><t>
what is the nature of the problem?
</t><t>
how severe is the problem?
</t><t>
what is the location of the problem?
</t>
</list>
</t>
<t>
Metrics of transport performance and application performance, considered either on an isolated per-session basis or as a 
collection of metrics for multiple sessions using a common network component, can answer or contribute to answers
to some 
or all of these questions.
</t><t>
One example which might lead to a shared metric arises from a shared requirement for monitoring of packet transport, which might be 
useful for every media type (audio, video, text, messaging) carried over RTP. 
</t><t>
Another example is the set of applications all of which transmit audio, including streaming 
audio speech, streaming music, two-party conversational speech, and audio conferencing. 
This set of applications might be able to share a suitably defined set of audio metrics, 
e.g. for parameters such as noise floor, mean level, or amplitude clipping. The subset of 
interactive speech applications may be able to use common additional metrics related to 
interactivity (e.g. media delay and echo) which are not applicable to all audio applications.
Some or all of these audio metrics may be applicable to the audio channel(s) of a video 
application, such as IP TV or conversational video.
</t><t>
[Editor's note: need to add a video-based view and examples]
</t><t>
Metrics of RTP transport performance usually relate to single packet network segments, whilst metrics of application performance 
are more likely to represent the end-to-end connection which may include transmission over non-packet networks and/or over multiple 
packet networks. Access to, and integration of, multiple sets of packet transport metrics relevant to a single connection typically 
present difficulties in current networks.
</t><t>
Metrics are typically measured in an RTP system but may be required at another RTP system or at a non-RTP system. Hence transport
of metrics is often required.
Metrics might be transported alongside RTP media using the extensibility mechanism 
defined in <xref target='RFC3611'/> but this is not an input requirement. Other methods may be used if 
RTCP XR blocks are not suitable or another method offers significant technical advantages. Following the
work-in-progress in <xref target='GUIDELINES'/> which restricts the usage of RTCP, the method for transporting 
metrics need not be RTCP and should be chosen 
independently of the metrics themselves. If the transport is not by RTCP, it is likely that multiple transport mechanisms should be 
permitted, and probably should not be restricted by AVT.
</t><t>
The IETF and other SDOs have already defined metrics for transport. There is a wide choice of 
potentially useful metrics. Some metrics may embed arbitrary design choices, or be 
application-specific. It is a goal of this work to find generic and re-usable metrics. 
This may result in a preference for some of the existing metrics over others, or in the 
definition of alternative metrics meeting the architectural goals of this work.
</t><t>
For metrics at layers higher than transport, metrics are developed by a variety of external 
SDOs, e.g. by ITU-T for voice telephony applications.
</t><t>
The development of application metrics is an active field. Any framework should be extensible 
to accommodate useful innovations when there is a consensus for their adoption.
</t><t>
It is obviously desirable to achieve some consensus (the more, the better) on a set of useful 
metrics (the fewer, the better) which may be widely implemented, widely inter-operated, and 
widely understood. Large data sets of raw measurements must be condensed into a smaller set 
of metrics or statistics before any agent (human or machine) can make decisions based on them. 
It has been suggested that AVT might remain "metric-neutral" by storing and transporting raw 
measurement data, rather than the condensed metrics (see Option 1 below). Even if data volumes are 
sufficiently small to make this feasible, some layer must perform the condensation and hence 
commit to specific metrics.
</t>
<t>
A four-step process is suggested. The AVT community may wish to contribute to some of these steps. 
</t>
<t>
<list style='numbers'>
<t>
Choose a set of metrics which is useful for each application. 
</t><t>
Classify each member of the sets of metrics according to the architectural layer which it monitors, creating 
sets of per-application, per-layer metrics. 
</t><t>
Define a set of required metrics at each layer as the union of the application-specific sets in each layer.
This should include the selection of only one from any group of metrics with overlapping or nearly-overlapping capabilities, 
leading to agreed sets of per-layer metrics. All of these metrics should be available within the architecture, but each 
application may select a subset which meets its needs. Most RTP end systems and RTP mixers implement only a subset of possible 
RTP applications, and clearly these devices need not implement any metric which is relevant only to applications which they 
do not support.
</t><t>
Choose one or more transport protocols for those cases where metrics are measured at one location 
but must be available at another, e.g. to cause a reaction in an RTP system's peer, or for network or service 
management purposes. 
</t>
</list>
</t><t>
The fourth step seems at first sight to be of secondary importance ("We've chosen our metrics, now 
all we have to do is to transport them") but the choice of transport protocols may be tightly constrained, 
for example because the measuring point has limited performance and/or limited access bandwidth and/or is 
in a different trust domain.
</t>
<t>
<xref target='transmetric'/> describes some options for metrics of transport performance. This includes an initial quantitative investigation of 
the feasibility of becoming "metric-neutral" by sending raw measurement data rather than condensed metrics.
</t>
<t>
<xref target='applayer'/> starts the process of describing requirements for application-layer monitoring and the metrics frameworks available 
to meet them. In this first version of the draft, the description is limited to interactive speech and takes most of its material 
from the work of ITU-T.
</t>
<t>
<xref target='transprot'/> discusses the choice of transport protocols, including discussion of the merits of RTCP which 
remains a candidate protocol.
</t>
</section>

<section anchor='transmetric' title='Transport layer metrics'>
<t>
The objective is to provide a set of metrics which characterise the three transport impairments of packet 
loss, packet delay, and packet delay variation. These metrics should be usable by any application which uses RTP transport.
</t>
<section title='Option 1 - Monitoring every packet'>
<t>
Most transport metrics, almost by definition, condense a large amount of information about packet arrivals into 
a small number of statistics. Usually, the aim of the statistics is to present key features of any transport 
impairments in ways which are readily understood by the operators of the network, with the minimum of distracting 
additional information. Unfortunately there are multiple ways to condense data about packet arrivals, and the 
"key features" (those impairments which result in degraded application performance) are likely to be 
application-dependent. Given this, it is not surprising that there are no known provably optimal metrics for the 
three transport impairments. There are instead multiple heuristic metrics.
</t><t>
The aim of "monitoring every packet" is to ensure that the information reported is not dependent on 
the application. In this scheme, RTP systems will report arrival data for each individual RTP packet. 
RTP (or other) systems receiving this "raw" data may use it to calculate any preferred heuristic metrics, but such calculations 
and the reporting of the results (e.g. to a session control layer or a management layer) are outside the scope of 
RTP and RTCP.
</t><t>
Run-length encoding (RLE) is a well-known technique for compressing per-packet information about packet loss. 
The efficiency of RLE compression falls as the packet loss fraction increases, so the volume of 
metrics data becomes unpredictable.
</t><t>
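A minimal sketch of this effect is given below. It uses a generic run-length encoder over a per-packet received/lost pattern, not the chunk format of the RTCP XR Loss RLE report block; the function names and the loss patterns are illustrative assumptions.
<figure><artwork><![CDATA[
```python
from itertools import groupby


def rle(received):
    """Run-length encode a sequence of booleans (True = packet received)."""
    return [(flag, len(list(run))) for flag, run in groupby(received)]


def loss_pattern(n, lost):
    """Build a received/lost pattern of n packets from the lost indices."""
    lost = set(lost)
    return [i not in lost for i in range(n)]


# 250 packets: one 5 s reporting cycle at 20 ms packetisation.
no_loss = loss_pattern(250, [])
isolated = loss_pattern(250, range(10, 250, 25))  # 10 isolated losses (4%)

runs_no_loss = rle(no_loss)
runs_isolated = rle(isolated)
print(len(runs_no_loss), len(runs_isolated))
```
]]></artwork></figure>
The loss-free cycle compresses to a single run, whereas ten isolated losses in 250 packets already need 21 runs, so the report size depends on the loss pattern rather than on the number of packets alone.
</t><t>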
If packet round-trip delay is measured using the technique described in <xref target='RFC3550'/> section 6.4.1 and <xref target='RFC3550'/> 
Figure 2, the rate of measurement is low (at most one measurement per RTCP measurement cycle) and the volume 
of data involved in reporting the result is insignificant.
</t><t>
There are no obvious techniques for substantial compression of data related to the arrival times of individual 
packets, but such data is needed to compute packet delay variation. Hence it appears that an item of data must 
be sent per packet, if packet delay variation is to be calculated from "raw" data. 
</t><t>
The following calculation estimates the volume of data needed to send per-packet data, assuming a simple 
logarithmic scheme to code the delay variation.
</t><t>
Consider the raw delay variation metric D(1,j) using the notation of <xref target='RFC3550'/> section 6.4.1. If delay variation, 
relative to that of the first packet of the connection, is measured in RTP timestamp units, delay could be coded 
on a compressed "logarithmic" scale similar to G.711 A-law, which can code with a resolution of 1 unit on the 
uncompressed chord, and resolutions 2, 4, 8, 16, 32, 64 on each successive more compressed chord to give a 
range of +/- 2048. This would correspond to +/- 2048/8000s ~ +/- 250ms for 8kHz sampled speech (enough to cover 
jitter), whilst using 1 byte per packet. Modifications would be needed for other sampling rates. It might be 
necessary to standardise a timing unit resolution independent of the sampling clock. Specific reserved values 
could be used to indicate that an expected packet did not arrive.
</t><t>
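The coding described above can be sketched as follows. This is one possible layout under stated assumptions, not a proposed format: a sign bit, a 3-bit chord and a 4-bit step, with two resolution-1 chords followed by chords of resolution 2, 4, 8, 16, 32 and 64, saturation at the edge of the range, and the otherwise redundant "minus zero" code reserved to mark a lost packet. The names d_1j, encode, decode and LOST are hypothetical. D(1,j) follows the RFC 3550 definition D(i,j) = (Rj - Ri) - (Sj - Si).
<figure><artwork><![CDATA[
```python
LOST = 0x80  # "minus zero" is redundant, so it is reserved here (assumption)


def d_1j(arrivals, timestamps):
    """D(1,j) for every packet j, all values in RTP timestamp units."""
    r1, s1 = arrivals[0], timestamps[0]
    return [(r - r1) - (s - s1) for r, s in zip(arrivals, timestamps)]


def encode(d):
    """Code a delay variation d into one byte: sign, chord, step."""
    sign = 0x80 if d < 0 else 0x00
    mag = min(abs(d), 2047)             # saturate at the edge of the range
    if mag < 32:
        chord, step = divmod(mag, 16)   # chords 0 and 1: resolution 1
    else:
        chord = mag.bit_length() - 4    # 32..63 -> 2, ..., 1024..2047 -> 7
        step = (mag >> (chord - 1)) & 0x0F
    # d < 0 implies mag >= 1, so this never produces the LOST code.
    return sign | (chord << 4) | step


def decode(code):
    """Return the reconstructed delay variation, or None for a lost packet."""
    if code == LOST:
        return None
    chord, step = (code >> 4) & 0x07, code & 0x0F
    if chord < 2:
        mag = chord * 16 + step
    else:
        mag = (16 + step) << (chord - 1)  # lower edge of the coded interval
    return -mag if code & 0x80 else mag


timestamps = [0, 160, 320, 480]       # 20 ms packets at 8 kHz
arrivals = [1000, 1165, 1318, 1480]   # receiver clock, same units
codes = bytes(encode(d) for d in d_1j(arrivals, timestamps))
print(codes.hex(), [decode(c) for c in codes])
```
]]></artwork></figure>
With this layout, values of magnitude below 32 are reproduced exactly, and the reconstruction error on the most compressed chord is at most 63 units.
</t><t>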
To estimate data volume, consider a low-bandwidth codec like G.729 with 20ms packetisation. Over a 5s RTCP cycle 
there will be 250 media packets and 102 bytes/packet (20ms G.729 in RTP/UDP/IP/Ethernet including preamble and 
Inter-Frame Gap) for a total media layer-2 bandwidth of 25500 bytes/5s (about 40kbit/s). 1 byte per received 
packet is 250 bytes "raw data" and an overhead of 82 bytes (RTP/UDP/IP/Ethernet, same basis) - say 350 bytes 
total including some identification (SSRCs etc). This is a fraction 350/25500~1.4% which is within RTP guidelines 
for RTCP bandwidth. The corresponding calculation for G.711 with 10ms packetisation is 81000 bytes/5s media and 
a 600-byte "raw transport report" or 0.75%.
</t><t>
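The arithmetic above can be reproduced with the short sketch below. The 82-byte overhead and the 5 s cycle are taken from the text; the payload sizes (20 bytes for 20 ms of G.729, 80 bytes for 10 ms of G.711) follow from the codec bit rates; the function name cycle_volumes is hypothetical.
<figure><artwork><![CDATA[
```python
# Per-packet overhead of RTP/UDP/IP/Ethernet, including preamble and
# inter-frame gap: the 82-byte figure used in the text.
OVERHEAD_BYTES = 82


def cycle_volumes(payload_bytes, packet_ms, cycle_s=5):
    """Return (packets, media_bytes, raw_report_bytes) for one RTCP cycle."""
    packets = cycle_s * 1000 // packet_ms
    media_bytes = packets * (payload_bytes + OVERHEAD_BYTES)
    # One byte of coded delay variation per packet, sent in a single packet.
    raw_report_bytes = packets + OVERHEAD_BYTES
    return packets, media_bytes, raw_report_bytes


g729 = cycle_volumes(payload_bytes=20, packet_ms=20)
g711 = cycle_volumes(payload_bytes=80, packet_ms=10)

# The text rounds the report sizes up to 350 and 600 bytes to allow for
# identification such as SSRCs.
g729_fraction = 350 / g729[1]
g711_fraction = 600 / g711[1]
g729_media_kbit_s = g729[1] * 8 / 5 / 1000

print(g729, f"{g729_fraction:.1%}", f"{g729_media_kbit_s:.1f} kbit/s")
print(g711, f"{g711_fraction:.2%}")
```
]]></artwork></figure>
Before the allowance for identification, the raw reports are 332 and 582 bytes respectively; the rounded figures give fractions of about 1.4% and 0.74% of the media bandwidth.
</t><t>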
However, the use of D(i,j) <xref target='RFC3550'/> for estimation of packet delay variation relies on a fixed relationship in 
the source RTP system between the RTP timestamp and the transmission time of the packet onto the wire. This fixed 
relationship is not guaranteed even for audio coding and is almost certainly significantly wrong for many video 
formats, where the RTP timestamp indicates the sampling instant of a frame which may be encoded into multiple 
packets sent at significantly different times throughout a frame interval. It could be argued that the current 
RTP framework provides no means for reliable estimation of packet delay variation in general, despite the 
usefulness of the D(i,j) metric for simple audio streams. This could lead to a conclusion that an RTP-based 
measure of packet delay variation is not re-usable across RTP applications other than simple VoIP codecs.
</t><t>
Logically, digital signal processors (DSPs) would be used to calculate metrics, including the per-packet data described above.  Current advice is that an additional 600 bytes per channel is needed to store measurement results before periodic transmission, so this option increases the memory requirements on infrastructure devices.  Memory in currently deployed infrastructure gateways is sized for optimum performance, cost and power, so adding this measurement function would reduce channel density, which ultimately impacts cost and power.  Including additional memory in future designs has the same cost and power impacts.
</t><t>
The principle that RTP systems should send per-packet reception report data, and correspondingly 
that the RTP (or other) system receiving this report data should calculate the metrics of its choice from this data, results 
in a requirement for computation both at the RTP system which sends the per-packet report and at the RTP (or other) system 
which receives the report. If DSPs are used to perform this computation in the system which receives the report, 
there is a further demand on the memory of the DSP devices involved. If general-purpose computing devices are used, 
then the 
cost of these devices may be significant. For example, for a 16000 channel trunk media gateway implementing the 
scheme above and using 10ms packetisation, the gateway must code or decode a total of 3200000 bytes of data per second 
(16000 channels x 100 packets per second x 1 byte per packet, in each direction).
</t><t>
Note that this general method of supplying raw data from the RTP system is the only one which gives the system which receives the data the 
flexibility to calculate any chosen transport metric for upward reporting. All other methods below either omit or 
condense data, such that the RTP (or other) system receiving the report is informed only about certain aspects of the transport 
performance which was measured at the remote RTP system. However, the method does not report how the impairment to 
outgoing transport affected the far-end application. For example, it provides no information about 
far-end jitter buffer events or late packets deemed lost by the application. This is considered further in 
<xref target='termmetrics'/> below.
</t>
</section>
<section title='Option 2 - Real-time histogram methods'>
<t>
There are several potentially useful metrics which rely on the accumulation of a histogram in real time, so that a 
packet arrival results in a counter being incremented rather than in the creation of a new data item. These metrics 
may be gathered with a low and predictable storage requirement. Each counter corresponds to a single class interval 
or "bin" of the histogram. Examples of metrics which may be accumulated in this way include the observed distribution 
of packet delay variation, and the number of packets lost per unit time interval.
</t><t>
Different networks may have very different expected and achieved levels of performance, but it may be useful to fix 
the number of class intervals in the reported histogram to give a predictable volume of data. This can be achieved by 
starting with small class intervals ("bin widths") and automatically increasing the width (e.g. by factors of two) if 
outliers are seen beyond the current upper limit of the histogram. Data already accumulated may be assigned unambiguously 
to the new set of bins, given some simple conditions on the relationship between the old and new origins and bin widths.
</t><t>
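A minimal sketch of such a self-widening histogram follows. It assumes non-negative samples (for example the magnitude of delay variation), an origin fixed at zero, an even number of bins and widening by factors of two; under these conditions each new bin is exactly the union of two old bins. The class name AdaptiveHistogram is hypothetical.
<figure><artwork><![CDATA[
```python
class AdaptiveHistogram:
    """Fixed number of bins; bin width doubles when a sample is out of range."""

    def __init__(self, bins=8, width=1):
        if bins < 2 or bins % 2:
            raise ValueError("an even number of bins keeps merging exact")
        self.counts = [0] * bins
        self.width = width

    def add(self, value):
        """Count one non-negative sample, widening the bins first if needed."""
        if value < 0:
            raise ValueError("this sketch handles non-negative samples only")
        while value >= self.width * len(self.counts):
            self._double()
        self.counts[int(value // self.width)] += 1

    def _double(self):
        # With the origin fixed at zero, each new bin is the union of two
        # old bins, so existing counts are reassigned unambiguously.
        n = len(self.counts)
        merged = [self.counts[i] + self.counts[i + 1] for i in range(0, n, 2)]
        self.counts = merged + [0] * (n - len(merged))
        self.width *= 2


h = AdaptiveHistogram(bins=4, width=1)
for sample in [0, 1, 1, 3]:
    h.add(sample)
before = list(h.counts)
h.add(5)                      # outlier beyond the upper limit of 4
print(before, h.counts, h.width)
```
]]></artwork></figure>
The storage and the reported data volume stay fixed at the chosen number of counters, whatever range of values is eventually observed.
</t><t>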
A significant disadvantage of the histogram method is the loss of any information about time-domain correlations between 
the samples which build the histogram. For example, a histogram of packet delay variation provides no indication of whether 
successive samples of packet delay variation were uncorrelated, or alternatively that the packet delay variation showed a 
highly-correlated low-frequency wander.
</t>
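<t>
The following sketch illustrates the point with two invented sample sequences. Both contain the same values and therefore produce identical histograms, but one is a slow wander and the other alternates rapidly; the lag-1 autocorrelation, which is not recoverable from the histogram, distinguishes them. The sequences and the function name are illustrative assumptions.
<figure><artwork><![CDATA[
```python
from collections import Counter


def lag1_autocorrelation(xs):
    """Sample autocorrelation of a sequence at a lag of one sample."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


# Delay variation samples in arbitrary units (invented for illustration).
wander = [0, 0, 1, 1, 2, 2, 3, 3, 3, 3, 2, 2, 1, 1, 0, 0]
alternating = [0, 3, 1, 2] * 4

same_histogram = Counter(wander) == Counter(alternating)
r_wander = lag1_autocorrelation(wander)
r_alternating = lag1_autocorrelation(alternating)
print(same_histogram, r_wander, r_alternating)
```
]]></artwork></figure>
</t>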
</section>
<section title='Option 3 - Monitoring by exception'>
<t>
An entity which both monitors the packet stream, and has sufficient knowledge of the application to know when transport 
impairments may have degraded the application's performance, may choose to send exception reports containing details of 
the transport impairments to a receiving system.  The crossing of a transport impairment threshold, or some application-layer event, 
would trigger such reports. 
RTP end systems and mixers are likely to contain application implementations which may, in principle, identify this type of exception.
</t><t>
It is likely that RTP translators will not contain suitable implementations which could identify such exceptions.
</t><t>
On-path devices such as routers and switches are not likely to be aware of RTP at all. Even if they are aware of RTP, they 
are unlikely to be aware of the RTP-level performance required by specific applications, and hence they are unlikely to be 
able to identify the level of impairment at which exceptional transport conditions may start to affect application performance.
</t><t>
This type of monitoring typically requires the storage of recent data in a FIFO (e.g. a circular buffer) so that data relevant 
to the period just before and just after the exception may be reported. It is not usually helpful to report transport data only 
from the period following an exception event detected by an application. This imposes some storage requirement (though less 
than needed for Option 1). It also implies the existence of additional cross-layer primitives or APIs to trigger the transport 
layer to generate and send its exception report. Such a capability might be considered architecturally undesirable, in that 
it complicates one or more interfaces above the RTP layer.
</t>
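<t>
A minimal sketch of the buffering described above is given below. It keeps the most recent per-packet records in a circular buffer and, when triggered, emits a report covering the records just before and just after the exception. The class and method names are hypothetical, and the on_exception call stands in for the cross-layer trigger whose existence is assumed.
<figure><artwork><![CDATA[
```python
from collections import deque


class ExceptionReporter:
    """Keep recent per-packet records; report those around an exception."""

    def __init__(self, before=5, after=3):
        if before < 1 or after < 1:
            raise ValueError("before and after must both be at least 1")
        self.history = deque(maxlen=before)  # circular buffer of recent data
        self.after = after
        self.pending = None                  # report being completed, if any
        self.remaining = 0
        self.reports = []

    def on_packet(self, record):
        """Store one per-packet record and complete any pending report."""
        if self.pending is not None:
            self.pending.append(record)
            self.remaining -= 1
            if self.remaining == 0:
                self.reports.append(self.pending)
                self.pending = None
        self.history.append(record)

    def on_exception(self):
        """Trigger from the application layer (hypothetical primitive)."""
        if self.pending is None:
            self.pending = list(self.history)
            self.remaining = self.after


reporter = ExceptionReporter(before=3, after=2)
for seq in range(10):
    reporter.on_packet(seq)
    if seq == 5:
        reporter.on_exception()
print(reporter.reports)
```
]]></artwork></figure>
The storage needed is bounded by the sizes of the two windows, independent of the length of the session.
</t>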
</section>
<section title='Option 4 - Application-specific monitoring'>
<t>
This is a business-as-usual option which suggests that the current approach should not be changed, based on the idea that 
previous application-specific approaches such as that of <xref target='RFC3611'/> were valid. If a large category of RTP applications (such as 
VoIP) has a requirement for a unique set of transport metrics, arising from its different requirements of the transport, then 
it seems reasonable for each application category to define its preferred set of metrics to describe transport impairments. We 
expect that there will be few such categories, probably fewer than 10. 
</t><t>
It may be easier to achieve interworking for a well-defined set of application-specific metrics than it would be in the case that applications 
select a profile from a palette of many independent re-usable metrics.
</t>
</section>
</section>
<section anchor='termmetrics' title='RTP terminal metrics'>
<t>
By "RTP terminal metrics" we mean metrics relating to the way a terminal deals with transport impairments affecting the incident 
RTP stream. These may include de-jitter buffering, packet loss concealment, and the use of redundant streams (if any) for 
correction of error or loss.
</t><t>
An example of such a metric is a count of packets arriving too late to be played out at current de-jitter buffer settings.
</t>
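<t>
As a sketch of how such a count might be obtained, the code below assumes a fixed (non-adaptive) de-jitter buffer whose playout schedule is anchored on the arrival of the first packet; real terminals may schedule playout differently. The function name and the sample values are illustrative assumptions, and all times are in RTP timestamp units.
<figure><artwork><![CDATA[
```python
def count_late_packets(packets, buffer_delay):
    """Count packets that arrive after their scheduled playout time.

    packets: sequence of (rtp_timestamp, arrival_time) pairs.
    buffer_delay: fixed de-jitter buffer delay, in the same units.
    """
    packets = list(packets)
    if not packets:
        return 0
    first_ts, first_arrival = packets[0]
    late = 0
    for ts, arrival in packets:
        # Playout follows the media timeline, offset by the buffer delay.
        playout = first_arrival + buffer_delay + (ts - first_ts)
        if arrival > playout:
            late += 1
    return late


# 20 ms packets at 8 kHz; the third packet is badly delayed.
received = [(0, 1000), (160, 1170), (320, 1700), (480, 1490)]
late_deep_buffer = count_late_packets(received, buffer_delay=320)
late_no_buffer = count_late_packets(received, buffer_delay=0)
print(late_deep_buffer, late_no_buffer)
```
]]></artwork></figure>
</t>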
</section>
<section anchor='applayer' title='Application layer metrics'>
<section title='Requirements for speech quality monitoring metrics'>
<t>
RTP transport can be used for different application types such as IP (including public internet) and non-IP. It can also apply to different user 
group sizes running over networks ranging in size from a small closed user group through an enterprise system to national and international 
networks.  Engineering judgment is required to choose the most suitable set of speech quality monitoring metrics for the type of application and 
the size of the network the application is running on.  Some metrics are more suitable for monitoring service level agreements (SLAs), others may 
be required for regular routine monitoring, and still others may be required for fault diagnosis.  The resolution of the metrics may also be 
different for different types of monitoring.  These considerations make it difficult to propose a "one size fits all" set of metrics.  However 
some general points can be made and it is also useful to propose a minimum set of metrics.
</t><t>
Mean Opinion Score (MOS) speech quality metrics such as MOS-LQO for listening quality and MOS-CQO for conversation quality (see later section for 
further discussion of MOS metrics) are useful for measuring end-to-end speech quality.  However they typically require significant time and 
processing power to produce a result and some MOS-LQO test methods require test calls that consume bandwidth.  This rules out MOS metrics for 
frequent large-scale monitoring.  Also, methods for measuring conversational MOS are not yet mature enough for VoIP monitoring applications, even 
though many vendors are using an E-model <xref target='G.107'/> approach in the absence of anything else.  This leaves only MOS-LQO as an overall composite speech 
quality metric, and, being a listening-only metric, it does not take account of interactive effects such as fixed delay and echo.  Nevertheless, 
MOS-LQO is often used for SLAs and usually provides a better estimate of what a user actually experiences than a single network or terminal metric or a 
group of such metrics.  A poor MOS score by itself, however, gives little indication of the cause of a problem, and further metrics are required 
for diagnostic purposes.
</t><t>
A proposed minimum set of metrics with suggested resolutions is as follows:
</t>
    <texttable anchor='table_example'>
        <ttcol align='left'>Metric</ttcol>
        <ttcol align='left'>Resolution</ttcol>
        <ttcol align='left'>Range</ttcol>
        <c>MOS-LQO</c>
        <c>0.1 MOS</c>
        <c>1 to 5</c>
        <c>Received speech level</c>
        <c>0.1 dB</c>
        <c>-60 to +10</c>
        <c>Received noise level</c>
        <c>0.5 dB</c>
        <c>-130 to +10</c>
        <c>Echo return loss</c>
        <c>0.1 dB</c>
        <c>6 to 40</c>
        <c>Round trip delay</c>
        <c>1 ms</c>
        <c>1 ms to 65 s</c>
        <c>Packet delay variation or jitter</c>
        <c>1 ms</c>
        <c>1 ms to 65 s</c>
        <c>Packet loss</c>
        <c>1 packet</c>
        <c>0 to 2^24</c>
    </texttable>
<t>
[Editor's note: More detail required here in a future draft to add information about meaningful measurement 
durations and whether measurements should include mean and peak values etc.  Also require some discussion around "second level" metrics such as 
jitter buffer parameters for diagnosis of more complicated problems.]
</t><t>
Note that some voiceband data applications running over the same transport network as voice applications may require much lower values of packet loss 
and packet delay variation than would be required for voice applications alone.
</t><t>
A reporting system for these metrics should be capable of accommodating intermediate network and terminal parameters as well as end-to-end 
quality metrics for both monitoring and diagnostic purposes. 
</t><t>
This minimum set of metrics should allow a wide range of problems to be diagnosed particularly if metrics are available at intermediate points 
in the network as well as at the endpoints.  Echo return loss and delay can be used to establish whether echo is a problem (which would not 
affect the MOS-LQO score as this is a listening only measurement).  Poor MOS-LQO scores could be caused by several factors, but individual 
measures of packet loss, jitter and noise levels could be used to establish the presence or absence of these degradations.  Finally, the level 
of received speech gives an indication of whether the operating point is correct and whether possible distortion or poor signal-to-noise are 
causing problems.
</t><t>
The codec type will often be known, and this can be very useful for diagnostic purposes if, for example, typical MOS scores and 
susceptibility to packet loss are known for that codec.  Knowledge of network topology is also very useful and can, for example, indicate possible 
bandwidth bottlenecks.
</t>
</section>
<section title='The audio hierarchy'>
<t>
The audio hierarchy can be broadly split into listening (one-way) and conversation (two-way, or multi-way conferencing) applications.  
These categories can be further split as shown in <xref target='hierarchy'/>.  In addition, ITU-T has defined a number of bandwidth categories: narrowband 
(300 to 3400 Hz), wideband (50 to 7000 Hz), super wideband (50 to 14000 Hz) and full band (20 to 20000 Hz).
</t>
<figure anchor='hierarchy' title='The audio hierarchy'> 
<artwork>
                     Audio
                       |
               ----------------------
        Listening             Conversation
              |                      |
    -------------              -----------
   |             |            |           |
Streaming  Non-streaming   Two-way  Conferencing
                                          |
                                   -------------
                                  |             |
                            Non-spatial       Spatial
</artwork>
</figure>

<t>
The following sections concentrate on one-way (listening only) and two-way (conversational) telephony applications, for which several 
composite speech quality metrics exist in ITU-T Recommendations.  Similar considerations could apply to other applications such as 
conferencing and this should be addressed in further drafts.  Suitable metrics for spatial conferencing are more difficult to derive at 
this stage since the technology is still relatively new.
</t>
</section>

<section title='Individual network transport and terminal parameters affecting speech quality'>
<t>
Parameters affecting both listening and conversation quality include:
</t>
<t>
<list style='symbols'>
<t>
Listening level
</t><t>
Noise (both electrical circuit noise and environmental noise)
</t><t>
Distortion (including amplitude clipping and codec distortion)
</t><t>
Syllable clipping
</t><t>
Comfort noise and voice activity detection
</t><t>
Packet delay variation and jitter buffer operation
</t><t>
Packet loss
</t>
</list>
</t>
<t>
Listening levels that are either too quiet or too loud can be unpleasant and make communication difficult.
</t><t>
High noise levels can make listening difficult and in a conversation, high background noise levels may cause a speaker to raise their 
voice level so that they can hear themselves above the noise.
</t><t>
Certain types of signal distortion such as amplitude clipping can be very unpleasant.
</t><t>
Syllable clipping occurs when the speech at the start or end of a syllable is missing and can cause words to be misunderstood.
</t><t>
Voice activity detection is used to sense periods of voice inactivity and then transmit them as silence periods to reduce bandwidth.  
Artificial noise (comfort noise) is then injected on the receiving side of a connection to mask the silence caused by the voice activity detection.  Without the comfort noise injection the listener might think that the connection had died.  However, the contrast between comfort noise and transmitted background noise may be unpleasant for the listener if the comfort noise has not been well matched to the background noise.
</t><t>
Packet delay variation caused by the underlying transport has to be "smoothed out" by using a jitter buffer to temporarily store received 
speech and then play it back at a uniform rate.  Jitter buffers that are too short or have been incorrectly implemented may cause packet loss, 
or "stuttering" of speech, and jitter buffer delays that are too long unduly add to the overall delay of a connection.  For speech or music 
applications (not data) adaptive jitter buffers that reduce delay as much as possible whilst minimising the risk of packet loss are preferable.  
However buffer length adaptations must be carefully managed to ensure they are inaudible.  This is usually achieved by ensuring that such adaptations 
occur during silence intervals.
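<figure>
<preamble>
The trade-off between buffer delay and late-packet loss can be illustrated with a minimal sketch of a fixed-delay jitter buffer.  This is an illustration under stated assumptions, not a recommended design: the function name, the 20 ms frame interval, the rule of anchoring the playout clock on the first packet to arrive, and the arrival times are all hypothetical.
</preamble>
<artwork><![CDATA[
```python
def playout(packets, buffer_delay_ms, frame_ms=20):
    """Simulate a fixed-delay jitter buffer (illustrative only).

    packets: list of (sequence_number, arrival_time_ms) tuples, where
    packet n is assumed to have been sent at n * frame_ms.
    Returns (played, late): sequence numbers played on time and
    sequence numbers discarded because they arrived too late.
    """
    if not packets:
        return [], []
    # Assumption: the playout clock is anchored on the first arrival.
    first_seq, first_arrival = min(packets, key=lambda p: p[1])
    base = first_arrival + buffer_delay_ms - first_seq * frame_ms
    played, late = [], []
    for seq, arrival in sorted(packets):
        deadline = base + seq * frame_ms  # scheduled playout time
        if arrival <= deadline:
            played.append(seq)
        else:
            late.append(seq)
    return played, late


# Hypothetical arrivals: packet 2 is delayed far more than the others.
arrivals = [(0, 50), (1, 75), (2, 135), (3, 112)]
short_buffer = playout(arrivals, buffer_delay_ms=20)
long_buffer = playout(arrivals, buffer_delay_ms=60)
```
]]></artwork>
<postamble>
With these hypothetical arrivals, the 20 ms buffer discards packet 2 as late, whereas the 60 ms buffer plays every packet at the cost of 40 ms of additional delay.
</postamble>
</figure>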
</t><t>
Finally, packet loss causes temporary loss of the signal, which may become unintelligible as a result.  
</t><t>
In addition, a good conversational experience requires interactivity between parties which in turn requires low delay, low echo applications.  
So some additional parameters affecting conversation quality can be listed as follows:
</t>
<t>
<list style='symbols'>
<t>
Delay
</t><t>
Talker echo
</t><t>
Listener echo
</t><t>
Double-talk performance
</t><t>
Sidetone
</t>
</list>
</t>
<t>
Long delays affect interactivity and can cause one party to think that the other party is being "very slow" in answering.  In extreme cases, 
very long delays can be very confusing and can cause one party to talk over the other party.  The only way round this problem is for the 
conversation to become half duplex, where the parties take turns to speak and each makes it clear when they have finished speaking.
</t><t>
Echo can be caused either by electrical reflections at a 2-wire to 4-wire converter or by acoustic or mechanical transmission paths between 
microphone and earphone. The latter is characterised by the terminal coupling loss.  Talker echo causes the speaker to hear an echo of their own voice 
and can be very confusing.  Listener echo is generally less common and occurs when the listener hears an echo of the speaker's voice.  When the echo 
delay is short, the signal sounds hollow or slightly reverberant, whilst longer delays produce a distinct echo or echoes.
</t><t>
Echo cancellers are used to minimise echo, but can cause other problems if not carefully designed.  For example, periods of double-talk where 
both parties are speaking at the same time may cause the canceller to diverge and produce echo.
</t><t>
Sidetone is local feedback from the speaker's microphone to their earpiece, which lets them know that the connection is still "live".  Without 
this feedback, the connection would sound "dead", which would be confusing.  The level, frequency response and distortion of the sidetone can 
all affect the user's experience.
</t>
</section>
  
<section title='Composite objective speech quality metrics'>
<t>
In addition to the individual "network" or "terminal" metrics described in the previous section, there are several composite speech quality metrics for objectively measuring end-to-end overall speech quality, based on a 5-point scale defined as follows:
</t>
<t>					
<list style='symbols'>
<t>
5 = Excellent
</t><t>
4 = Good
</t><t>
3 = Fair
</t><t>
2 = Poor
</t><t>
1 = Bad
</t>
</list>
</t>
<t>
A measurement using the scale just described results in a Mean Opinion Score (MOS), which represents the mean of several opinions obtained from a 
subjective test.  Mean opinion score terminology is defined in <xref target='P.800.1'/>.
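<figure>
<preamble>
As a simple illustration of how a MOS is formed from individual opinions on the 5-point scale, consider the following sketch.  The function name and the listener ratings are hypothetical and are not taken from any Recommendation.
</preamble>
<artwork><![CDATA[
```python
from statistics import mean

# The 5-point opinion scale: rating -> category label.
SCALE = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Return the arithmetic mean of individual 1..5 opinion ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in SCALE:
            raise ValueError(f"rating {r!r} is not on the 5-point scale")
    return mean(ratings)


# Hypothetical ratings from eight listeners for one test condition.
ratings = [4, 4, 3, 5, 4, 3, 4, 4]
mos = mean_opinion_score(ratings)  # 31 / 8 = 3.875
```
]]></artwork>
</figure>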
</t><t>
The composite speech quality metrics are useful for commissioning and Service Level Agreements (SLAs), but (as previously discussed) additional diagnostic 
information is required when these metrics fall below threshold values. 
</t><t>
Composite objective speech quality metrics can be divided into listening quality (MOS-LQO) and conversational quality (MOS-CQO). The ITU-T has 
produced several recommendations for measuring these composite speech quality metrics <xref target='P.561'/>, <xref target='P.562'/>, 
<xref target='P.563'/>, <xref target='P.564'/>, <xref target='P.862'/>, <xref target='P.862.1'/>, and <xref target='P.862.2'/>.  A hierarchy of the various 
ITU speech quality test methods is shown in Figure 2.
</t>
<figure anchor='testmethods' title='Hierarchy of ITU Speech quality test methods'> 
<artwork>

             Objective speech quality test methods
                              |
                   -----------------------
                  |                       |
              Listening             Conversation
                  |                       |
         -----------------                |
        |                 |               |
   Intrusive        Non-intrusive        INMD
  Double-ended      Single-ended     P.561,P.562
        |                 |               |
        |            -----------          |
      PESQ         P.563      P.564     P.CQO
P.862, P.862.1   Estimate    Estimate   under
    P.862.2        based     based on   development
  WB extension   on speech   IP n/work
        |         payload    parameters
        |
     P.OLQA
     Under
  Development

</artwork>
</figure>
<t>
Double-ended test methods (P.862/P.862.1/P.862.2) rely on a reference signal that is injected at one end of the network and then captured at the 
other end of the network.  The reference and degraded signals are compared using an auditory transform that models the human hearing system 
to produce the final MOS value.  In contrast, single-ended systems do not require a reference signal and rely solely on the speech 
payload (e.g. P.563) or on IP network parameters (e.g. P.564).  P.563 measures several individual characteristics of the received speech signal 
and then combines the results to form a MOS-LQO, which has been verified against subjectively scored degraded speech files.  P.564 uses several 
IP network parameters and permitted RTCP-XR data, again to produce a MOS-LQO.  In general, double-ended methods are more accurate because they have 
a reference signal against which to compare the degraded signal.
</t><t>
P.561 describes an In-service Non-intrusive Measurement Device (INMD) for making in-service measurements of several voice and network parameters, 
which can then be used to produce a conversational mean opinion score as described in P.562.  However, the algorithm in P.562 was originally 
intended for TDM rather than IP applications and therefore can only be applied to situations where the impact of IP impairments is negligible. 
The term "In-service" means that the measurements are made during real customer calls.    
</t><t>
In addition to the recommendations already mentioned, there is also a planning tool called the E-Model described in another ITU-T recommendation 
<xref target='G.107'/>.  This was not designed for monitoring applications, but has unfortunately been misused for this purpose by several vendors.
</t><t>
Another objective measurement tool is described in an ITU-R Recommendation <xref target='BS.1387'/>.  Perceptual Evaluation of Audio Quality (PEAQ) has generally 
been optimised for the assessment of music signals rather than speech and is applicable to high-quality coded audio systems as used by 
broadcasters for example.
</t><t>
The listening quality methods already mentioned (P.862/P.862.1/P.862.2, P.563 and P.564) all produce MOS-LQO values as their primary outputs 
and require either speech as an input or, in the case of P.564, individual network parameters.  Each can be used at intermediate points or end-points 
of the network provided that appropriate interfaces are available.  Except in the case of P.564, these methods require either computational 
power at the measurement point, or the speech file has to be captured and sent to a server for processing. In the latter case, the 
speech file is too large for transport by RTCP.  By contrast, a P.564 MOS-LQO calculation relies only on packet header information and permitted 
information from RTCP-XR, i.e. relatively lightweight data.
</t><t>
P.561/P.562 is the only ITU conversational monitoring method (although P.CQO is under development) and it requires the following parameters to 
be measured:
</t>
<t>
<list style='symbols'>
<t>
Active speech level
</t><t>
Noise level (psophometrically weighted)
</t><t>
Speech activity factor
</t><t>
Speech echo path delay
</t>
</list>
</t>
<t>
And at least one of
</t>
<t>
<list style='symbols'>
<t>
Echo loss
</t><t>
Echo path loss
</t><t>
Speech echo path loss
</t>
</list>
</t>
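<figure>
<preamble>
The input requirement just listed (four mandatory parameters plus at least one of the three echo loss measures) can be expressed as a simple completeness check.  The sketch below is illustrative only: the function name and the parameter identifiers are hypothetical labels for the parameters listed above and are not defined by P.561 or P.562.
</preamble>
<artwork><![CDATA[
```python
# Hypothetical identifiers for the parameters listed above.
REQUIRED = {
    "active_speech_level",
    "noise_level",
    "speech_activity_factor",
    "speech_echo_path_delay",
}
ONE_OF = {"echo_loss", "echo_path_loss", "speech_echo_path_loss"}


def missing_inputs(measured):
    """Return the names still needed; an empty set means complete."""
    present = set(measured)
    missing = REQUIRED - present
    if not (ONE_OF & present):
        missing.add("one of: " + ", ".join(sorted(ONE_OF)))
    return missing


# Hypothetical measurement record lacking any echo loss measure.
partial = {
    "active_speech_level": -20.0,
    "noise_level": -60.0,
    "speech_activity_factor": 0.4,
    "speech_echo_path_delay": 35.0,
}
gaps = missing_inputs(partial)
```
]]></artwork>
</figure>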
<t>
Class D INMDs <xref target='P.561'/> for IP applications are required to implement the following functions:
</t>
<t>
<list style='symbols'>
<t>
De-jitter buffer
</t><t>
Voice decoder
</t><t>
Comfort noise generator
</t><t>
Error concealment process
</t>
</list>
</t>
<t>
and are required to measure packet delay variation and IP packet loss ratio.
</t><t>
P.562 uses these input parameters to calculate a MOS-CQO score.  However, as already mentioned, the algorithm is at present suitable only 
for situations where the impact of IP impairments is negligible.
</t>
</section>

</section>

<section anchor='transprot' title='Choosing transport protocols for metrics'>
<t>
Metrics related to RTP sessions are measured by RTP 
systems, but may be conveyed by any convenient transport mechanism, either "horizontally" to other RTP systems or "northbound" to session control or 
management systems.  Candidate mechanisms include RTCP XR <xref target='RFC3611'/>, SNMP <xref target='RFC3410'/>, SIP <xref target='RFC3261'/> headers or attachments, and TR-069 mechanisms 
<xref target='DSLF-TR-069'/>. 
</t>
<section anchor='RTCPplusminus' title='RTCP as a transport for metrics - advantages and disadvantages'>
<t>
RTCP XR remains at least a candidate transport protocol for metrics, though note that <xref target='GUIDELINES'/> states explicitly that 
"The amount of information going into RTCP reports should primarily target the peer (and thus include information that can be 
meaningfully reacted upon).  Gathering and reporting statistics beyond this is not an RTCP task and should be addressed by 
out-of-band protocols". 
</t><t>
If RTCP is used, AVT need define only a generic means to transport arbitrary payloads. Such a means is already available in the form of RTCP XR 
block types <xref target='RFC3611'/>. If the data is self-describing, e.g. based on ASN.1 <xref target='X.680'/> or XML <xref target='XML'/>, 
or if usage is standardised in profiles, it would be possible to transmit many 
different collections of data whilst using only a small number of codepoints from the limited namespace of XR report block types. 
As a minimum, only one XR block type codepoint need be allocated per SDO, with delegation to the SDO to manage a namespace defined by a type 
field in the 
payload. The measurements of round-trip delay and packet loss could still use the established mechanisms from RFC 3550.
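<figure>
<preamble>
One way to picture this delegation is a report block whose payload begins with a type field managed by the SDO.  The sketch below is illustrative only.  It assumes the report block header of RFC 3611 (block type, type-specific octet, and block length in 32-bit words minus one); the 16-bit SDO type field, the function name and the block type value 200 are hypothetical and are not assigned or specified anywhere.
</preamble>
<artwork><![CDATA[
```python
import struct


def pack_xr_block(block_type, sdo_type, payload):
    """Build one XR report block: RFC 3611 header + hypothetical body.

    The body layout (16-bit SDO-managed type, then opaque payload) is
    an assumption made for illustration.  Because padding hides the
    true payload length, a real format would need the payload to be
    self-delimiting or to carry its own length.
    """
    body = struct.pack("!H", sdo_type) + payload
    body += b"\x00" * (-len(body) % 4)   # pad to a 32-bit boundary
    # Block length: whole block, header included, in words minus one.
    words = (4 + len(body)) // 4 - 1
    header = struct.pack("!BBH", block_type, 0, words)
    return header + body


# 200 is an arbitrary placeholder, not an assigned XR block type.
block = pack_xr_block(block_type=200, sdo_type=1, payload=b"abc")
```
]]></artwork>
<postamble>
In this example the block occupies 12 octets: a 4-octet header followed by the 2-octet type field, 3 octets of payload and 3 octets of padding, giving a block length field of 2.
</postamble>
</figure>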
</t><t>
This approach is analogous to the definition of codec payload formats for RTP. A specification could define how metrics payloads are carried in 
RTCP, and how SDP (including offer/answer) is used to request an RTP system to send a metrics payload. The approach decouples the RTCP base 
protocol (transport format, routing, and transmission rate rules, and RTCP's base metrics) from less generic use cases.
</t><section anchor='RTCPAdvantages' title='Advantages of RTCP'>
<t>
RTCP uses the same transport as the RTP media path, and hence if media can be transmitted, it is likely 
that RTCP can also be transmitted.  However, for connections not using <xref target='RTPRTCPMUX'/>, 
this is subject 
to possible difficulties with NAT and firewall devices, which may sometimes not open a port for RTCP.
</t><t>
RTCP uses the same transport as the RTP media path so will normally experience the same transport 
performance as that experienced by the RTP media packets. Firstly this allows an RTCP-based mechanism 
to make a representative measurement of round-trip delay. Secondly, if QoS mechanisms such as expedited 
forwarding (EF) have been implemented in support of the RTP media traffic, the transport is likely to be 
low-delay and possibly also low-loss, compared with a best-efforts class.
</t><t>
Existing transport devices (for example, SBCs, BGWs, NATs) have often been implemented to allow RTCP to transit transparently 
on the next higher UDP port. The devices are unlikely to pass another protocol for the transport of metrics without 
modification. This would make it harder to introduce any non-RTCP protocol for transport of metrics.
</t>
</section>
<section anchor='RTCPDisadvantages' title='Disadvantages of RTCP'>
<t>
RTCP is usually carried over an unreliable UDP/IP transport. Any monitoring scheme using RTCP as its 
transport must be designed to tolerate message loss and duplication.
</t><t>
Bandwidth for the transport of RTCP may be limited. <xref target='RFC3550'/> recommends that the bandwidth consumed 
by RTCP traffic be limited to 5% of the session bandwidth. Even without this limitation, the volume of 
traffic which is allowed access to EF queues may be policed, such that a large volume of RTCP traffic 
might result in high loss for both the RTCP traffic and the RTP media.
</t>
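<figure>
<preamble>
To give a feel for how tight this budget is, the following sketch computes the RTCP allowance for a session.  It is illustrative only: the function name is hypothetical, and the 64 kbit/s session bandwidth and the 5 second interval are example values chosen for the calculation.  The resulting allowance is shared among all participants in the session.
</preamble>
<artwork><![CDATA[
```python
RTCP_FRACTION = 0.05  # 5% of the session bandwidth, as discussed above


def rtcp_budget(session_bw_bps, interval_s=5.0):
    """Return (rtcp_bps, octets_per_interval) for a session bandwidth.

    interval_s is an example averaging interval, not a protocol rule
    applied by this sketch.
    """
    rtcp_bps = session_bw_bps * RTCP_FRACTION
    octets_per_interval = rtcp_bps * interval_s / 8
    return rtcp_bps, octets_per_interval


# Example: a single 64 kbit/s voice stream.
rtcp_bps, octets = rtcp_budget(64000)
```
]]></artwork>
<postamble>
For the 64 kbit/s example this gives about 3.2 kbit/s for RTCP, or roughly 2000 octets in every 5 seconds for all reports from all participants, which leaves little room for bulky metrics payloads.
</postamble>
</figure>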
</section>
</section>
</section>
<section title="IANA Considerations">
<t>None.</t>
	</section>

        <section title="Security Considerations">
<t>
This document itself contains no normative text and hence should not give rise
to any new security considerations (to be confirmed).
</t><t>
[Editor's note - should this section consider security merits/demerits of proposals for alternative protocols to RTCP?]
</t>
	</section>

        <section title="Acknowledgments">
<t>
This document was originally motivated by ideas from Colin Perkins.  The authors would like to thank Graeme Gibbs at BT, and Debbie Greenstreet and her TI colleagues for their review comments.
</t>
        </section>
    </middle>

    <back>
        <references title='Informative References'>
		<reference anchor='RFC3261'>
			<front>
                                <title>SIP: Session Initiation Protocol</title> 
                                <author initials='J.' surname='Rosenberg'> 
                                        <organization>dynamicsoft</organization> 
				</author> 
                                <date month='June' year='2002'/> 
			</front> 
			<seriesInfo name='RFC' value='3261' /> 	
			<format type='TXT' /> 
            </reference>
		<reference anchor='RFC3410'>
			<front>
                                <title>Introduction and Applicability Statements for Internet Standard Management Framework</title> 
                                <author initials='J.' surname='Case'> 
                                        <organization>SNMP Research, Inc.</organization> 
				</author> 
                                <date month='December' year='2002'/> 
			</front> 
			<seriesInfo name='RFC' value='3410' /> 	
			<format type='TXT' /> 
            </reference>
                <reference anchor='RFC3550'>
			<front> 
				<title>RTP: A Transport Protocol for Real-Time Applications</title> 
				<author initials='H.' surname='Schulzrinne' fullname='Henning Schulzrinne'> 
					<organization>Columbia University</organization> 
				</author> 
				<date month='July' year='2003' /> 
			</front> 
			<seriesInfo name='RFC' value='3550' /> 	
			<format type='TXT' /> 
		</reference>
                <reference anchor='RFC3611'>
			<front> 
				<title>RTP Control Protocol Extended Reports (RTCP XR)</title> 
				<author initials='T. (Ed)' surname='Friedman' fullname='Timur Friedman'> 
					<organization> Paris 6 </organization> 
				</author> 
				<date month='November' year='2003' /> 
			</front> 
			<seriesInfo name='RFC' value='3611' /> 	
			<format type='TXT' /> 
		</reference>
		<reference anchor='RTPRTCPMUX'>
			<front>
                                <title>Multiplexing RTP Data and Control Packets on a Single Port</title> 
                                <author initials='C.' surname='Perkins' fullname='Colin Perkins'> 
                                        <organization>University of Glasgow</organization> 
				</author> 
                                <date month='August' year='2007' /> 
			</front> 
                        <seriesInfo name='ID' value='draft-ietf-avt-rtp-and-rtcp-mux-07' />  
			<format type='TXT' /> 
            </reference>
		<reference anchor='GUIDELINES'>
			<front>
                                <title>Guidelines for Extending the RTP Control Protocol (RTCP)</title> 
                                <author initials='J.' surname='Ott' fullname='Joerg Ott'> 
                                        <organization>Helsinki University of Technology</organization> 
				</author> 
                                <date month='June' year='2008' /> 
			</front> 
                        <seriesInfo name='ID' value='draft-ott-avt-rtcp-guidelines-01' />  
			<format type='TXT' /> 
            </reference>
		<reference anchor='P.800.1'>
			<front>
                                <title>Recommendation P.800.1, Mean Opinion Score (MOS) terminology</title> 
                                <author initials='' surname=''> 
                                        <organization>ITU-T</organization> 
				</author> 
                                <date month='July' year='2006' /> 
			</front> 
            </reference>
		<reference anchor='P.561'>
			<front>
                                <title>Recommendation P.561, In-service non-intrusive measurement device - Voice service measurements</title> 
                                <author initials='' surname=''> 
                                        <organization>ITU-T</organization> 
				</author> 
                                <date month='July' year='2002' /> 
			</front> 
            </reference>
		<reference anchor='P.562'>
			<front>
                                <title>Recommendation P.562.  Analysis and interpretation of INMD voice-service measurements</title> 
                                <author initials='' surname=''> 
                                        <organization>ITU-T</organization> 
				</author> 
                                <date month='May' year='2004' /> 
			</front> 
            </reference>
		<reference anchor='P.563'>
			<front>
                                <title>Recommendation P.563.  Single-ended method for objective speech quality assessment in narrow-band telephony applications</title> 
                                <author initials='' surname=''> 
                                        <organization>ITU-T</organization> 
				</author> 
                                <date month='May' year='2004'/> 
			</front> 
            </reference>
		<reference anchor='P.564'>
			<front>
                                <title>Recommendation P.564.  Conformance testing for narrowband voice over IP transmission quality assessment models</title> 
                                <author initials='' surname=''> 
                                        <organization>ITU-T</organization> 
				</author> 
                                <date month='November' year='2007'/> 
			</front> 
            </reference>
		<reference anchor='P.862'>
			<front>
                                <title>Recommendation P.862.  Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs</title> 
                                <author initials='' surname=''> 
                                        <organization>ITU-T</organization> 
				</author> 
                                <date month='February' year='2001'/> 
			</front> 
            </reference>
		<reference anchor='P.862.1'>
			<front>
                                <title>Recommendation P.862.1.  Mapping function for transforming P.862 raw result scores to MOS-LQO</title> 
                                <author initials='' surname=''> 
                                        <organization>ITU-T</organization> 
				</author> 
                                <date month='November' year='2003'/> 
			</front> 
            </reference>
		<reference anchor='P.862.2'>
			<front>
                                <title>Recommendation P.862.2.  Wideband extension to Recommendation P.862 for the assessment of wideband telephone networks and 
speech codecs </title> 
                                <author initials='' surname=''> 
                                        <organization>ITU-T</organization> 
				</author> 
                                <date month='November' year='2007'/> 
			</front> 
            </reference>
		<reference anchor='G.107'>
			<front>
                                <title>Recommendation G.107.  The E-model, a computational model for use in transmission planning.</title> 
                                <author initials='' surname=''> 
                                        <organization>ITU-T</organization> 
				</author> 
                                <date month='March' year='2005'/> 
			</front> 
            </reference>
		<reference anchor='X.680'>
			<front>
                                <title>Recommendation X.680, Abstract Syntax Notation One (ASN.1): Specification of basic notation</title> 
                                <author initials='' surname=''> 
                                        <organization>ITU-T</organization> 
				</author> 
                                <date month='July' year='2002'/> 
			</front> 
            </reference>
		<reference anchor='BS.1387'>
			<front>
                                <title>Recommendation BS.1387.  Method for objective measurements of perceived audio quality</title> 
                                <author initials='' surname=''> 
                                        <organization>ITU-R</organization> 
				</author> 
                                <date month='November' year='2001'/> 
			</front> 
            </reference>
		<reference anchor='DSLF-TR-069'>
			<front>
                                <title>TR-069 CPE WAN Management Protocol v1.1</title> 
                                <author initials='' surname=''> 
                                        <organization>DSL Forum</organization> 
				</author> 
                                <date month='December' year='2007'/> 
			</front> 
            </reference>
		<reference anchor='XML'>
			<front>
                                <title>Extensible Markup Language (XML) 1.0 (Fourth Edition)</title> 
                                <author initials='' surname=''> 
                                        <organization>W3C</organization> 
				</author> 
                                <date month='September' year='2006'/> 
			</front> 
            </reference>
	</references>
    </back>

</rfc>
