One document matched: draft-ietf-pwe3-congcons-00.xml
<?xml version="1.0"?>
<?xml-stylesheet type='text/xsl' href='http://xml.resource.org/authoring/rfc2629.xslt' ?>
<?rfc compact="yes"?>
<?rfc subcompact="yes"?>
<?rfc toc="yes"?>
<rfc category='info' ipr='trust200902' docName='draft-ietf-pwe3-congcons-00'>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC2914 SYSTEM 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2914.xml'>
<!ENTITY RFC3985 SYSTEM 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3985.xml'>
<!ENTITY RFC4023 SYSTEM 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4023.xml'>
<!ENTITY RFC4553 SYSTEM 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4553.xml'>
<!ENTITY RFC5086 SYSTEM 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5086.xml'>
<!ENTITY RFC5087 SYSTEM 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5087.xml'>
<!ENTITY RFC5348 SYSTEM 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5348.xml'>
]>
<front>
<title abbrev="PW-CONGESTION">PW Congestion Considerations</title>
<author initials="YJ" surname="Stein" fullname="Yaakov (Jonathan) Stein">
<organization>RAD Data Communications</organization>
<address>
<postal>
<street>24 Raoul Wallenberg St., Bldg C</street>
<city>Tel Aviv</city>
<code>69719</code>
<country>ISRAEL</country>
</postal>
<phone>+972 (0)3 645-5389</phone>
<email>yaakov_s@rad.com</email>
</address>
</author>
<author initials="D." surname="Black" fullname="David L. Black">
<organization>EMC Corporation</organization>
<address>
<postal>
<street>176 South St.</street>
<city>Hopkinton</city>
<code>69719</code>
<region>MA</region>
<country>USA</country>
</postal>
<phone>+1 (508) 293-7953</phone>
<email>david.black@emc.com</email>
</address>
</author>
<author fullname="Bob Briscoe" initials="B." surname="Briscoe">
<organization>BT</organization>
<address>
<postal>
<street>B54/77, Adastral Park</street>
<street>Martlesham Heath</street>
<city>Ipswich</city>
<code>IP5 3RE</code>
<country>UK</country>
</postal>
<phone>+44 1473 645196</phone>
<email>bob.briscoe@bt.com</email>
<uri>http://bobbriscoe.net/</uri>
</address>
</author>
<date day="14" month="October" year="2012" />
<area>Internet</area>
<workgroup>PWE3</workgroup>
<keyword>pseudowire</keyword>
<keyword>congestion</keyword>
<keyword>TCP friendliness</keyword>
<abstract>
<t>
Pseudowires (PWs) have become a common mechanism for tunneling traffic,
and may be found competing for network resources both with other PWs
and with non-PW traffic, such as TCP/IP flows.
It is thus worthwhile specifying under what conditions such competition is safe,
i.e., the PW traffic does not significantly harm other traffic
or contribute more than it should to congestion.
We conclude that PWs transporting responsive traffic behave as desired
without the need for additional mechanisms.
For inelastic PWs (such as TDM PWs) we derive a bound under which
such PWs consume no more network capacity than a TCP flow.
</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>
A pseudowire (PW) is a construct for tunneling a native service over a Packet Switched Network (PSN)(see <xref target="RFC3985"/>),
such as IPv4, IPv6, or MPLS.
The PW packet encapsulates a unit of native service information by prepending
the headers required for transport in the particular PSN
(which must include a demultiplexer field to distinguish the different PWs)
and preferably the 4 byte PWE3 control word.
PWs have no bandwidth reservation mechanism, meaning that when multiple PWs are transported in parallel
there is no defined means for guaranteeing network resources for any particular PW.
This competition for resources may translate to a particular PW not being able to deliver the QoS required
to emulate the native service.
For example, MPLS-TE enables achieving a particular desired allocation of resources between multiple LSPs;
however, when multiple Ethernet PWs are placed in a single MPLS tunnel,
there is no way to similarly divide resources amongst them
(although DiffServ QoS prioritization may be available for PWs).
The use of PWs in service provider MPLS networks is well understood
and will not be discussed further here.
</t>
<t>
While PWs are most often placed in MPLS tunnels, there are several mechanisms that enable transporting PWs over an IP infrastructure.
These include:
<list>
<t> TDM PWs (<xref target="RFC4553"/><xref target="RFC5086"/><xref target="RFC5087"/>) that define UDP/IP encapsulations, </t>
<t> L2TPv3 PWs, </t>
<t> MPLS PWs directly over IP according to RFC 4023 <xref target="RFC4023"/>, </t>
<t> MPLS PWs over GRE over IP according to RFC 4023 <xref target="RFC4023"/>. </t>
</list>
Whenever PWs are transported over IP,
they may compete with congestion-responsive flows (e.g., TCP flows).
Hence in order to prevent congestion collapse the PWs MUST behave in a fashion
that does not cause undue damage to the throughput of such congestion-responsive flows <xref target="RFC2914"/>.
</t>
<t>
At first glance one may think that this would require a PW transported over IP to be considered as a single flow,
on a par with a single TCP flow.
Were we to accept this tenet,
we would require a PW to back off under congestion to consume no more bandwidth than a single TCP flow under such conditions
(see <xref target="RFC5348"/>).
However, since PWs may carry traffic from many users,
it makes more sense to consider each PW to be equivalent to multiple TCP flows.
We will discuss whether PWs consisting of elastic flows need a back-off strategy in <xref target="elastic"/>.
</t>
<t>
TDM PWs (<xref target="RFC4553"/><xref target="RFC5086"/><xref target="RFC5087"/>)
represent inelastic constant bit-rate (CBR) flows
that may require lower or higher throughput than
that consumed by an otherwise-unconstrained TCP flow would under the same network conditions.
In any case a TDM PW is not able to respond to congestion in a TCP-like manner;
on the other hand, the total bandwidth they consume remains constant
and does not increase to consume additional bandwidth as TCP rates back off.
If the bandwidth consumed by a TDM PW is considered detrimental,
the only available remedy is to completely shut down the PW.
Such a shutdown would impact multiple users,
and the service restoration time would in general be lengthy.
We will discuss when the shutdown of inelastic PWs can be avoided in <xref target="inelastic"/>.
</t>
</section>
<section anchor="elastic" title="PWs Comprising Elastic Flows">
<t>
In this section we consider Ethernet PWs that primarily carry congestion-responsive traffic.
We will show that we automatically obtain the desired congestion avoidance behavior,
and that additional mechanisms are not needed.
</t>
<t>
Let us assume that an Ethernet PW aggregating several TCP flows
is flowing alongside several TCP/IP flows.
Each Ethernet PW packet carries a single Ethernet frame that carries a single IP packet
that carries a single TCP segment.
Thus, if congestion is signaled by an intermediate router dropping a packet,
a single end-user TCP/IP packet is dropped, whether or not that packet is encapsulated in the PW.
</t>
<t>
The result is that the individual TCP flows inside the PW
experience the same drop probability as the non-PW TCP flows.
Thus the behavior of a TCP sender (retransmitting the packet and appropriately reducing its sending rate)
is the same for flows directly over IP and for flows inside the PW.
In other words, individual TCP flows are neither rewarded nor penalized
for being carried over the PW.
On the other hand, the PW does not behave as a single TCP flow;
it will consume the aggregated bandwidth of its component flows,
and backs off much less sharply than a single flow would.
</t>
<t>
We claim that this is precisely the desired behavior.
Any fairness considerations should be applied to the individual TCP flows,
and not to the aggregate.
Were individual TCP flows rewarded for being carried over a PW,
this would create an incentive to create PWs for no operational reason.
Were individual flows penalized, there would be a deterrence that could impede pseudowire deployment.
</t>
<t>
There have been proposals to add additional TCP-friendly mechanisms to PWs,
for example by carrying PWs over DCCP.
In light of the above arguments, it is clear that this would force the PW to behave as a single flow,
rather than N flows, and penalize the constituent TCP flows.
In addition, the individual TCP flows would still back off due to their end points
being oblivious to the fact that they are carried over a PW.
This will further degrade the flow's throughput as compared to a non-PW-encapsulated flow.
Thus, such additional mechanisms contradict the behavior previously described as desirable.
</t>
</section>
<section anchor="inelastic" title="PWs Comprising Inelastic Flows">
<t>
TDM PWs (<xref target="RFC4553"/><xref target="RFC5086"/><xref target="RFC5087"/>)
are more problematic than the elastic PWs of the previous section.
Being constant bit-rate (CBR), they can not be made responsive to congestion.
On the other hand, being CBR, they also do not attempt to capture additional bandwidth
when TCP flows back off.
</t>
<t>
Since a TDM PW continuously consumes a constant amount of bandwidth,
if the bandwidth occupied by a TDM PW endangers the network as a whole,
the only recourse is to shut it down, denying service to all customers of the TDM native service.
We should mention in passing that under certain conditions
it may be possible to reduce the bandwidth consumption of a TDM PW.
A prevalent case is that of a TDM native service that carries voice channels that may not all be active.
Using the AAL2 mode of <xref target="RFC5087"/> (perhaps along with connection admission control)
can enable bandwidth adaptation, at the expense of more sophisticated native service processing (NSP).
</t>
<t>
In the following we will show that for many cases of interest a TDM PW,
treated as a single flow, will behave in a reasonable manner without any additional mechanisms.
We will focus on structure-agnostic TDM PWs <xref target="RFC4553"/>
although our analysis can be readily applied to structure-aware PWs
(see Appendix A).
</t>
<t>
There are two network parameters relevant to our discussion,
namely the one-way delay D and the loss probability p.
The one-way delay of a native TDM service consists of the physical time-of-flight
plus 125 microseconds for each TDM switch traversed.
This is very small as compared to PSN network-crossing latencies.
Many protocols and applications running over TDM circuits thus require low delay,
and we need thus only consider delays of up to about 32 milliseconds.
</t>
<t>
The TDM PW RFCs specify the egress behavior upon experiencing packet loss.
Structure-agnostic transport has no alternative to outputting an "all-ones" AIS pattern towards the TDM circuit,
which if long enough in duration is recognized by the receiving TDM device as a fault indication (see Appendix A).
International standards place stringent limits on the number of such faults tolerated.
Calculations presented in the appendix show that only loss probabilities in the realm of fractions of a percent
are relevant for structure-agnostic transport (see Appendix A).
</t>
<t>
Structure-aware transport regenerates frame alignment signals
thus hiding AIS indications resulting from infrequent packet loss.
Furthermore, for TDM circuits carrying voice channels the use of packet loss concealment
algorithms is possible (such algorithms have been previously described for TDM PWs).
However, even structure-aware transport ceases to provide a useful service
at about 2 percent loss probability.
</t>
<t>
RFC 5348 on TCP Friendly Rate Control (TFRC) <xref target="RFC5348"/>
provides the following simplified formula for throughput that is used as the
basis for TFRC's sending rate control.
<artwork><![CDATA[
S
X_Bps = ------------------------------------------------
R ( sqrt(2p/3) + 12 sqrt(3p/8) p (1+32p^2) )
]]></artwork>
where
<list>
<t> X_Bps is average sending rate in Bytes per second, </t>
<t> S is the segment (packet payload) size in Bytes, </t>
<t> R is the round-trip time in seconds, </t>
<t> p is the loss probability. </t>
</list>
</t>
<t>
We can use this formula to determine when a TDM PW consumes no
more bandwidth than a TCP flow between the same endpoints would
consume under the same conditions. Replacing the round-trip delay with
twice the one-way delay D, setting the bandwidth to that of the
TDM service BW, and the segment size to be the TDM fragment TDM plus
4 Bytes to account for the PWE3 control word, we obtain the following
condition for a TDM PW.
<artwork><![CDATA[
(TDM + 4)
D < ---------------
BW f(p) / 4
]]></artwork>
where
<list>
<t> D is the one-way delay, </t>
<t> TDM is the TDM segment size in Bytes, </t>
<t> BW is TDM service bandwidth in bits per second, </t>
<t> f(p) = sqrt(2p/3) + 12 sqrt(3p/8) p (1+32p^2). </t>
</list>
</t>
<t>
One may view this condition as defining a safe operating envelope for a
TDM PW, as a TDM PW that consumes no more bandwidth than a TCP flow
would not affect congestion more than were it to be TCP traffic.
Under this condition it should hence be safe to mix the TDM PW
with congestion-responsive traffic such as TCP,
without causing significant additional congestion problems.
Were the TDM PW to consume significantly more bandwidth a TCP flow,
it could contribute disproportionately to congestion,
and its mixture with congestion-responsive traffic may be inappropriate.
</t>
<t>
We derived the condition assuming steady-state conditions,
and thus two caveats are in order.
First, the condition does not specify how to treat a TDM PW that initially
satisfies the condition, but is then faced with a deteriorating network environment.
In such cases one additionally needs to analyze the reaction times
of the responsive flows to congestion events.
Second, the derivation assumed that the TDM PW was competing with long-lived TDM flows,
because under this assumption it was straightforward to obtain a quantitative comparison
with something widely considered to offer a safe response to congestion.
Short-lived TCP flows may find themselves disadvantaged as compared
to a long-lived TDM PW satisfying the condition.
These dynamic cases will be considered in future versions of this draft.
</t>
<t>
The results are displayed in the accompanying figures
(available only in the PDF version of this document).
TCP compatible behavior is obtained for the area under curves
appropriate for each TDM fragment size.
</t>
<figure><artwork src="E1friendly.png">
<![CDATA[
--------------------------------------------------------------------
I I
I I
I I
I I
I E1 compatibility regions I
I I
I I
I I
I I
I (only in PDF version) I
I I
I I
I I
I I
I I
--------------------------------------------------------------------
]]>
</artwork>
<postamble> Figure 1 TCP Compatibility areas for E1 SAToP </postamble>
</figure>
<figure><artwork src="E3friendly.png">
<![CDATA[
--------------------------------------------------------------------
I I
I I
I I
I I
I E3 compatibility regions I
I I
I I
I I
I I
I (only in PDF version) I
I I
I I
I I
I I
I I
--------------------------------------------------------------------
]]>
</artwork>
<postamble> Figure 2 TCP Compatibility areas for E3 SAToP </postamble>
</figure>
<vspace blankLines='99' />
<t>
We see in Figure 1 that a TDM PW carrying an E1 native service (2.048 Mbps)
satisfies the condition for all parameters of interest
if each packet carries at least S=512 Bytes of TDM data.
For the SAToP default of 256 Bytes,
as long as the one-way delay is less than 10 milliseconds,
the loss probability can exceed 0.3 percent.
For packets containing 128 or 64 Bytes
the constraints are more troublesome, but there are still parameter ranges
where the TDM PW consumes less than a TCP flow under similar conditions.
Similarly, Figure 2 demonstrates that an E3 native service (34.368 Mbps) with the SAToP default of
1024 Bytes of TDM per packet satisfies the condition for delays up to about 5 milliseconds.
</t>
<t>
Note that violating the condition for a short amount of time
is not sufficient justification for shutting down the TDM PW.
While TCP flows react within a round trip time,
PW commissioning and decommissioning are time consuming processes
that should only be undertaken when it becomes clear that the
congestion is not transient.
Future versions of this draft will provide guidance as to when a TDM PW
should be terminated.
</t>
</section>
<section title="Security Considerations" >
<t>
This document does not introduce any new congestion-specific mechanisms
and thus does not introduce any new security considerations
above those present for PWs in general.
</t>
</section>
<section title="IANA Considerations" >
<t>
This document requires no IANA actions.
</t>
</section>
</middle>
<back>
<vspace blankLines='99' />
<references title="Informative References">
&RFC2914;
&RFC3985;
&RFC4023;
&RFC4553;
&RFC5086;
&RFC5087;
&RFC5348;
<reference anchor="G775">
<front>
<title>Loss of Signal (LOS), Alarm Indication Signal (AIS) and Remote Defect Indication (RDI) defect
detection and clearance criteria for PDH signals</title>
<author>
<organization>International Telecommunications Union</organization>
</author>
<date month="October" year="1998" />
</front>
<seriesInfo name="ITU" value="Recommendation G.775" />
</reference>
<reference anchor="G826">
<front>
<title>Error Performance Parameters and Objectives for International Constant Bit Rate Digital Paths at or above Primary Rate</title>
<author>
<organization>International Telecommunications Union</organization>
</author>
<date month="December" year="2002" />
</front>
<seriesInfo name="ITU" value="Recommendation G.826" />
</reference>
</references>
<vspace blankLines='99' />
<section title="Loss Probabilities for TDM PWs">
<t>
ITU-T Recommendation G.826 <xref target="G826"/> specifies limits
on the Errored Second Ratio (ESR) and the Severely Errored Second Ratio (SESR).
For our purposes, we will simplify the definitions and understand
an Errored Second (ES) to be a second of time during which a TDM bit error occurred
or a defect indication was detected.
A Severely Errored Second (SES) is an ES second during which the Bit Error Rate (BER) exceeded
one in one thousand (10^-3).
Note that if the error condition AIS was detected according to the criteria of ITU-T Recommendation G.775 <xref target="G826"/>
a SES was considered to have occurred.
The respective ratios are the fraction of ES or SES to the total number of seconds in the measurement interval.
</t>
<t>
For both E1 and T1 TDM circuits, G.826 allows ESR of 4% (0.04), and SESR of 1/5% (0.002).
For E3 and T3 the ESR must be no more than 7.5% (0.075), while the SESR is unchanged.
</t>
<t>
Focusing on E1 circuits, the ESR of 4% translates,
assuming the worst case of isolated exactly periodic packet loss,
to a packet loss event no more than every 25 seconds.
However, once a packet is lost, another packet lost in the same second doesn't change the ESR,
although it may contribute to the ES becoming a SES.
Assuming an integer number of TDM frames per PW packet, the number of packets per second is given by
packets per second = 8000 / (frames per packet),
where prevalent cases are 1, 2, 4 and 8 frames per packet.
Since for these cases there will be 8000, 4000, 2000, and 1000 packets per second,
respectively, the maximum allowed packet loss probability is
0.0005%, 0.001%, 0.002%, and 0.004% respectively.
</t>
<t>
These extremely low allowed packet loss probabilities are only for the worst case scenario.
In reality, when packet loss is above 0.001%, it is likely that loss bursts will occur.
If the lost packets are sufficiently close together (we ignore the precise details here)
then the permitted packet loss rate increases by the appropriate factor, without G.826 being cognizant of any change.
Hence the worst-case analysis is expected to be extremely pessimistic for real networks.
Next we will go to the opposite extreme and assume that all packet loss events are in periodic loss bursts.
In order to minimize the ESR we will assume that the burst lasts no more than one second,
and so we can afford to lose no more than packet per second packets in each burst.
As long as such one-second bursts do not exceed four percent of the time, we still maintain the allowable ESR.
Hence the maximum permissible packet loss rate is 4%.
Of course, this estimate is extremely optimistic,
and furthermore does not take into consideration the SESR criteria.
</t>
<t>
As previously explained, a SES is declared whenever AIS is detected.
There is a major difference between structure-aware and structure-agnostic transport in this regards.
When a packet is lost SAToP outputs an "all-ones" pattern to the TDM circuit,
which is interpreted as AIS according to G.775 <xref target="G775"/>.
For E1 circuits, G.775 specifies for AIS to be detected when four consecutive TDM frames have no more than 2 alternations.
This means that if a PW packet or consecutive packets containing at least four frames are lost,
and four or more frames of "all-ones" output to the TDM circuit, a SES will be declared.
Thus burst packet loss, or packets containing a large number of TDM frames,
lead SAToP to cause high SESR, which is 20 times more restricted than ESR.
On the other hand, since structure-aware transport regenerates the correct frame alignment pattern,
even when the corresponding packet has been lost, packet loss will not cause declaration of SES.
This is the main reason that SAToP is much more vulnerable to packet loss than the structure-aware methods.
</t>
<t>
For realistic networks, the maximum allowed packet loss for SAToP will be intermediate between
the extremely pessimistic estimates and the extremely optimistic ones.
In order to numerically gauge the situation, we have modeled the network as a four-state Markov model,
(corresponding to a successfully received packet, a packet received within a loss burst,
a packet lost within a burst, and a packet lost when not within a burst).
This model is an extension of the widely used Gilbert model.
We set the transition probabilities in order to roughly correspond to anecdotal evidence,
namely low background isolated packet loss, and infrequent bursts wherein most packets are lost.
Such simulation shows that up to 0.5% average packet loss may occur
and the recovered TDM still conform to the G.826 ESR and SESR criteria.
</t>
</section>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-24 01:10:00 |