<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc compact="yes" ?>
<rfc category="exp" ipr="full3978">
  <front>
    <title>Bounding Longer Routes to Remove TE</title>

    <author fullname="Russ White" initials="R." surname="White">
      <organization>Cisco Systems</organization>

      <address>
        <email>riw@cisco.com</email>
      </address>
    </author>

    <author fullname="Susan Hares" initials="S." surname="Hares">
      <organization>NextHop Technologies</organization>

      <address>
        <postal>
          <street>825 Victors Way</street>

          <city>Ann Arbor</city>

          <region>MI</region>

          <code>48108 </code>

          <country></country>
        </postal>

        <phone>734-222-1610</phone>

        <facsimile></facsimile>

        <email>skh@nexthop.com</email>

        <uri></uri>
      </address>
    </author>

    <author fullname="Ted Hardie" initials="T." surname="Hardie">
      <organization />

      <address>
        <email>Ted.Hardie@nominum.com</email>
      </address>
    </author>

    <date month="July" year="2008" />

    <area>Routing</area>

    <workgroup>Network Working Group</workgroup>

    <abstract>
      <t>Some ASes currently use length-based filters to manage the size of
      the routing table they use and propagate. This draft explores an
      alternative to length-based filters which allows for more automatic
      configuration and which provides for better redundancy.</t>

      <t>Rather than use a filter, this draft proposes a method of modifying
      the BGP [RFC1771] longest match algorithm by setting a bound on the
      prefix lengths eligible for preference. A bound would operate on long
      prefixes when covering route announcements are available; in certain
      circumstances it would cause a router to prefer an aggregate over a more
      specific route announcement.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>Many routes injected into the global default free zone of the
      Internet today are injected to steer traffic (or provide traffic
      engineering), rather than to provide reachability information directly.
      In several recent discussions, it has been asserted that this table
      growth due to routes injected to provide traffic engineering is causing
      many problems within the default free zone, including more table
      instability, as these routes appear to change state more often than
      shorter prefix aggregate routes.</t>

      <t>While filtering all routes at some predetermined length is an
      attractive option, it can be difficult to maintain and manage large
      filter sets built around a constantly changing database. It appears a
      more fruitful approach would be to detect routes injected for traffic
      engineering purposes, and remove them from the routing system
      automatically once they are beyond the point in the network where they
      are useful. This draft proposes a mechanism to perform just this task.
      When two routes with overlapping prefixes are detected, they are marked,
      and removed from the routing system, at a point where they are no longer
      needed. This mechanism is not disrupted by route withdrawals or
      failures, since normal routing will naturally take care of any
      connectivity changes. Various estimates have stated that removing the
      longer prefix routes within the routing table could reduce the table
      size by 25%.</t>

      <t>No changes to the operation of the BGP protocol at the packet or
      peering levels are required to implement this draft. Three new
      communities are proposed: BOUNDED, NO_INSTALL, and NO_BOUNDING.</t>
    </section>

    <!-- end of introduction -->

    <section title="Proposed Enhancements">
      <t>Two enhancements are proposed by this draft: three new communities,
      and a new way of handling overlapping prefixes received from an external
      peer.</t>

      <t>As each prefix is received by a BGP speaker from an external peer, it
      would be evaluated in the light of other prefixes already received. If
      two prefixes overlap in space (such as 192.168.0.0/16 and
      192.168.1.0/24), the longer prefix would be marked with a new BOUNDED
      community, and the local preference set to a very high number so that it
      would always win in any best path computations within the autonomous
      system. The longer prefix may also be marked with a new community,
      NO_INSTALL.</t>
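
      <t>The overlap check described above can be sketched as follows. This is
      a minimal illustration only, not part of the proposal: it assumes all
      prefixes were learned from a single external peer, and the function name
      mark_bounded and the string label "BOUNDED" are hypothetical stand-ins
      for whatever an implementation and the IANA assignment would use.</t>

      <figure>
        <artwork><![CDATA[
```python
import ipaddress

BOUNDED = "BOUNDED"  # hypothetical label; the real value would come from IANA


def mark_bounded(prefixes):
    """Return {prefix: set_of_communities} for prefixes from one eBGP peer.

    A prefix is marked BOUNDED when a shorter prefix that covers it is
    also present in the same set.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    marks = {str(n): set() for n in nets}
    for longer in nets:
        for shorter in nets:
            if (
                shorter.version == longer.version
                and shorter.prefixlen < longer.prefixlen
                and longer.subnet_of(shorter)
            ):
                marks[str(longer)].add(BOUNDED)
                break
    return marks


marks = mark_bounded(["192.168.0.0/16", "192.168.1.0/24", "10.0.0.0/8"])
print(marks)
```
]]></artwork>
      </figure>

      <t>In this sketch only 192.168.1.0/24 is marked, because it is the only
      prefix covered by a shorter one. A real implementation would more likely
      use a prefix trie than the quadratic comparison shown here.</t>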

      <t>Routes marked with the new BOUNDED community MAY be filtered at the
      autonomous system edge to reduce the number of routes advertised by an
      AS.</t>
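
      <t>Filtering at the autonomous system edge could look like the sketch
      below. The function name advertise_at_edge, the tuple layout, and the
      filter_bounded switch are assumptions made for illustration; the draft
      itself only says that such routes MAY be filtered.</t>

      <figure>
        <artwork><![CDATA[
```python
def advertise_at_edge(routes, filter_bounded=True):
    """routes: list of (prefix, communities) tuples.

    Returns the prefixes that would be advertised to an external peer.
    Filtering is optional (MAY), hence the filter_bounded switch.
    """
    return [
        prefix
        for prefix, communities in routes
        if not (filter_bounded and "BOUNDED" in communities)
    ]


routes = [("192.168.0.0/16", set()), ("192.168.1.0/24", {"BOUNDED"})]
print(advertise_at_edge(routes))
```
]]></artwork>
      </figure>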

      <section title="Example of Bounding the Longer Prefix">
        <t>Assume the following configuration of autonomous systems:</t>

        <figure>
          <artwork><![CDATA[                 (   )
        /-------( AS2 )--------\
 (   ) /         (   )          \ (   )       (   )
( AS1 )                          ( AS4 )-----( AS5 )
 (   ) \         (   )          / (   )       (   )
        \-------( AS3 )--------/
                 (   )]]></artwork>
        </figure>

        <t><list style="symbols">
            <t>AS1 is advertising 192.168.1.0/24 to both AS2 and AS3.</t>

            <t>AS2 is advertising both 192.168.1.0/24 and 192.168.0.0/16 into
            AS4.</t>

            <t>AS3 is advertising 192.168.1.0/24 into AS4.</t>

            <t>Each connection (session) is handled by a separate router
            within each AS (for instance, AS4 peers with AS2 and AS3 on
            separate routers).</t>
          </list></t>

        <t>When the peering router in AS4 between AS4 and AS2 receives both
        the 192.168.1.0/24 and the 192.168.0.0/16 prefixes, it will mark the
        192.168.1.0/24 as BOUNDED, and set the local preference high, based on
        its router ID, as described in the section Setting the Local
        Preference, below, and will then propagate this through AS4.</t>

        <t>The border router between AS4 and AS3 will receive the longer
        prefix from AS3, and the iBGP prefix with the high local preference
        with BOUNDED set. Given it does not see the overlapping prefix, it
        will compare the default (lower) local preference of the externally
        learned route with the higher local preference set by the AS2/AS4
        border router, and will not advertise the 192.168.1.0/24 prefix into
        AS4 at all.</t>

        <t>The AS2/AS4 border router may also, on detecting the overlap, mark
        the longer prefix with a new community, NO_INSTALL, which is
        non-transitive and optional. Routers which understand this community
        may choose not to install this prefix into the local RIB, in order to
        reduce memory consumption.</t>

        <t>If the link between AS1 and AS2 fails, the longer length prefix
        will be withdrawn from AS2, and thus the peering point between AS2 and
        AS4 will no longer have an overlapping set of prefixes. Within AS4,
        the border router which peers with AS2 will cease advertising the
        192.168.1.0/24 prefix, which allows the AS3/AS4 border router to begin
        advertising it into AS4, and through AS4 into AS5, restoring
        connectivity to AS1.</t>
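
        <t>The decision made by the AS3/AS4 border router in this example can
        be sketched as below. The sketch is deliberately simplified and rests
        on assumptions not stated in the draft: only Local Preference is
        compared (later BGP tie-breakers are omitted), the default Local
        Preference is taken to be 100, and the router ID of the AS2/AS4 border
        router is taken to be 10.0.0.1. The Path type and function names are
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    prefix: str
    learned_via: str  # "ebgp" or "ibgp"
    neighbor_as: int
    local_pref: int
    bounded: bool = False


def best_path(paths):
    """Highest Local Preference wins; later tie-breakers are omitted."""
    return max(paths, key=lambda p: p.local_pref)


def advertises_into_ibgp(paths):
    """A border router sends a path to iBGP peers only if its best path
    is one it learned over eBGP itself."""
    return best_path(paths).learned_via == "ebgp"


# Assumed values: default Local Preference 100, router ID 10.0.0.1.
HIGH = 0x80000000 | 0x0A000001
from_as3 = Path("192.168.1.0/24", "ebgp", 3, 100)
from_as2_border = Path("192.168.1.0/24", "ibgp", 2, HIGH, bounded=True)

# Normal operation: the BOUNDED iBGP path wins, so nothing is advertised.
print(advertises_into_ibgp([from_as3, from_as2_border]))

# After the AS1/AS2 link fails, the iBGP path is withdrawn.
print(advertises_into_ibgp([from_as3]))
```
]]></artwork>
        </figure>

        <t>The first call models normal operation, where the path learned from
        AS3 loses to the BOUNDED path and is not advertised into AS4. The
        second models the failure case, where the path from AS3 becomes best
        and is advertised.</t>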
      </section>

      <!-- end of example of bounding longest match -->

      <section title="Setting the Local Preference">
        <t>Since there could be multiple points at which an autonomous system
        may receive the same pair of overlapping prefixes, there must be some
        way to ensure that one of the longer prefixes wins in the [BGP]
        decision algorithm consistently. In practice, this means that each BGP
        speaker which receives an overlapping set of routes should set the
        local preference on the set of longer prefixes so there won't be two
        longer prefixes with matching local preferences.</t>

        <t>The easiest way to ensure this within an autonomous system is to
        set the local preference for longer prefixes based on some unique
        number assigned to each BGP speaker. Given the router ID and the local
        preference are both 32 bit numbers, an ideal solution appears to be to
        simply set the local preference to the router ID of the BGP speaker.
        The primary problem with this is that in some cases, the router ID of
        the device may be lower than some standard Local Preference, perhaps
        even lower than a standard Local Preference used by default throughout
        a network.</t>

        <t>To alleviate this problem, the local preference of longer prefixes
        which overlap with shorter prefixes should be set to the router ID of
        the BGP speaker, and then the high order bit of the Local Preference
        should be set, so the resulting value is guaranteed to be at least
        2^31 (2,147,483,648), well above 64,000.</t>
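
        <t>As a worked illustration of this rule, the following sketch derives
        the Local Preference from a router ID written as a dotted quad. The
        function name bounded_local_pref and the example router IDs are
        assumptions chosen for the example.</t>

        <figure>
          <artwork><![CDATA[
```python
import ipaddress

HIGH_ORDER_BIT = 1 << 31  # top bit of a 32-bit Local Preference


def bounded_local_pref(router_id):
    """Map a dotted-quad router ID to a Local Preference value with the
    high-order bit set."""
    return int(ipaddress.IPv4Address(router_id)) | HIGH_ORDER_BIT


print(bounded_local_pref("10.0.0.1"))  # 2315255809
print(bounded_local_pref("0.0.0.1"))   # 2147483649
```
]]></artwork>
        </figure>

        <t>Under this reading of the rule, two router IDs that differ only in
        their high-order bit (for instance 10.0.0.1 and 138.0.0.1) yield the
        same value, so an operator relying on uniqueness would need to avoid
        such pairs.</t>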
      </section>

      <!-- end of setting local preference -->

      <section title="The NO_INSTALL Community">
        <t>An optional optimization to bounding longer prefixes by marking
        them with a high Local Preference and the BOUNDED community is to also
        mark them with a new, non-transitive, optional community, NO_INSTALL.
        The effect of this community would be for any BGP speaker receiving a
        prefix with this community set to treat the prefix normally in the BGP
        bestpath computation, and to forward bestpaths marked as NO_INSTALL to
        iBGP peers, but to simply fail to install such prefixes in the local
        routing table.</t>

        <t>This would result in a smaller amount of information being stored
        and maintained in the local routing table, and in the local forwarding
        tables built from the local routing table. If there are enough
        prefixes thus marked, the memory and computation savings could be
        significant. BGP speakers which receive a prefix marked with
        NO_INSTALL, and which do not understand this community, simply ignore
        the community.</t>
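
        <t>The handling described in this section might be sketched as
        follows. The function process_best_paths and its
        understands_no_install flag are hypothetical; the sketch assumes the
        bestpath computation has already run and only shows what happens to
        its results.</t>

        <figure>
          <artwork><![CDATA[
```python
def process_best_paths(best_paths, understands_no_install=True):
    """best_paths: {prefix: set_of_communities} after bestpath selection.

    Returns (rib, ibgp_updates). Every bestpath is still forwarded to
    iBGP peers; only installation in the local routing table is skipped.
    """
    rib = []
    ibgp_updates = []
    for prefix, communities in best_paths.items():
        ibgp_updates.append(prefix)
        skip = understands_no_install and "NO_INSTALL" in communities
        if not skip:
            rib.append(prefix)
    return rib, ibgp_updates


best = {
    "192.168.0.0/16": set(),
    "192.168.1.0/24": {"BOUNDED", "NO_INSTALL"},
}
rib, ibgp_updates = process_best_paths(best)
print(rib, ibgp_updates)
```
]]></artwork>
        </figure>

        <t>A speaker that does not understand the community corresponds to
        calling the function with understands_no_install set to False, in
        which case both prefixes are installed.</t>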
      </section>

      <!-- end of the no install community -->
    </section>

    <!-- end of proposed enhancements -->

    <section title="The NO_BOUNDING Community">
      <t>In some situations, the originator of a longer length prefix might
      determine their routing will not work properly if their prefix is
      bounded at a point where it overlaps with a shorter prefix aggregate. To
      resolve this case, we propose a new transitive optional extended
      community, NO_BOUNDING.</t>

      <t>The NO_BOUNDING extended community consists of a type, to be
      determined through the IANA process, and a value containing the minimum
      AS Path length before which the route should not be bounded. If a BGP
      speaker determines a route could be bounded, but the route is marked
      with NO_BOUNDING, and the AS Path length is shorter than the minimum AS
      Path length noted in the NO_BOUNDING extended community, the speaker
      SHOULD NOT mark the route for bounding.</t>

      <t>This allows the originator of a prefix to control the bounding
      properties of the prefix.</t>
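
      <t>The NO_BOUNDING check can be illustrated with the short sketch below.
      It assumes the AS Path is available as a plain sequence of AS numbers
      (AS_SET handling is ignored) and that the absence of the community is
      represented by None; the function name may_bound is hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Optional, Sequence


def may_bound(as_path: Sequence[int],
              no_bounding_min_len: Optional[int]) -> bool:
    """Return True if an overlapping longer prefix may be marked BOUNDED.

    no_bounding_min_len is the value carried in NO_BOUNDING, or None
    when the route does not carry the community.
    """
    if no_bounding_min_len is None:
        return True
    return len(as_path) >= no_bounding_min_len


print(may_bound([2, 1], 3))     # path shorter than the minimum
print(may_bound([4, 2, 1], 3))  # path has reached the minimum
```
]]></artwork>
      </figure>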
    </section>

    <section title="Benefits and Risks">
      <t>The benefits and risks associated with this proposal are discussed in
      the sections below.</t>

      <section title="Advantages to the Service Provider">
        <t>AS4, in each of the situations, reduces the number of prefixes
        carried through the autonomous system by the number of longer prefixes
        that overlap with aggregates of those prefixes. While one copy of the
        prefix continues to be carried through the autonomous system, this
        entry can be marked with the optional NO_INSTALL community, so it is
        not placed in the forwarding table, nor is it propagated outside the
        autonomous system.</t>

        <t>AS5 receives one prefix instead of two (or possibly more).</t>
      </section>

      <!-- end of advantages to the service provider -->

      <section title="Advantages to the Customer">
        <t>In this case, the customer is represented as AS1. The customer
        will continue to receive some amount of traffic over both peering
        sessions, and dual homing through two Service Providers is still
        effective. If the customer's primary link fails, the alternate link
        through AS3 will take over receiving all inbound traffic
        automatically. With most other schemes presented to this point, the
        customer loses all benefit of dual-homing into the Internet, unless
        both connections are through one Service Provider.</t>
      </section>

      <!-- end of advantages to the customer -->

      <section title="Advantages to the Internet">
        <t>Beyond the second AS hop, aggregation is preserved in all cases.
        While this would not reduce the backbone routing table by the dramatic
        amounts that other methods might, the advantages to the community are
        great, and at greatly reduced risk to customers.</t>
      </section>

      <!-- end of advantages to the internet -->

      <section title="Implications for Router processing">
        <t>This proposal clearly adds to the work which needs to be done
        during overall [BGP] processing. Because a check needs to be done for
        both covered and covering routes, some part of this work is required
        for routes of lengths on either side of the bound. Should this become
        common, however, the rate of growth in the number of routes should be
        smaller and a balance should be struck between the extra processing
        per route and the number of routes.</t>
      </section>

      <!-- end of implications for router processing -->

      <section title="Implications for Traffic engineering">
        <t>The implementation of a bound risks magnifying or removing the
        effect of certain widely deployed traffic engineering methods. If, for
        example, an AS chose to prepend its own AS number to the AS Path of an
        announcement in order to alter the preference for that route, a BGP
        neighbor using a
        bounded longest match might now see that route as eligible for discard
        in favor of an aggregate. While it is fairly easy to code around that
        particular problem, to avoid this class of problems it might be
        preferable to allow this to apply to specific AS Sets as well as to
        all BGP neighbors.</t>
      </section>

      <!-- end of implications for traffic engineering -->

      <section title="Implications for Convergence Time">
        <t>If the route to the AS providing the route to the aggregate should
        be lost, the more-specific must propagate into the ASes which had
        formerly heard only the aggregate. This increases convergence time and
        may create situations in which reachability is temporarily
        compromised. Unlike the filter case, however, normal BGP behavior
        should restore reachability without changes to the router
        configuration. There is also a risk that during a pathological event
        the increased processing required by this change will degrade
        propagation times during those events. This depends on both the speed
        of specific implementations and the character of the topology.</t>
      </section>

      <!-- end of implications for convergence time -->
    </section>

    <!-- end of benefits and risks -->

    <section title="Acknowledgements">
      <t>Cengiz Alaettinoglu, Alvaro Retana, Daniel Walton, David Ball, and
      Barry Greene gave valuable comments on this draft. Jeff Haas suggested
      the NO_BOUNDING community, along with the AS Path length limit described
      in the NO_BOUNDING section. A number of colleagues also gave the author
      valuable comments on the white board markings that gave rise to this
      paper; among them are Lane Patterson, Ian Cooper, Gerd Besch, Bill
      Norton, Diarmuid Flynn, and Sean Donelan.</t>
    </section>

    <!-- end of acknowledgements -->

    <section title="Security Considerations">
      <t>This document presumes that the implementation of bounded longest
      match is a knob inside a router config. Since the use of the knob
      affects route announcements not originating within the router's AS or
      its direct neighbors, the new behavior may result in surprises to the
      announcing AS. It is possible that this behavior might be considered a
      denial of service or mistaken for a denial of service by systems
      designed to detect black-holing on behalf of the origin AS.</t>
    </section>

    <!-- end of security considerations -->

    <section title="IANA Considerations">
      <t>This draft proposes three new communities, BOUNDED, NO_BOUNDING, and
      NO_INSTALL, for which new community values would need to be assigned.
      These should be assigned as described in [EXT-COMM].</t>
    </section>

    <!-- end of iana considerations -->
  </middle>

  <back>
    <references title="Informative References">
      <?rfc include='reference.RFC.1771'?>

      <reference anchor="EXT-COMM">
        <front>
          <title>BGP Extended Communities Attribute</title>

          <author fullname="Srihari Sangli" initials="S" surname="Sangli">
            <organization></organization>
          </author>

          <author fullname="Dan Tappan" initials="D" surname="Tappan">
            <organization></organization>
          </author>

          <author fullname="Yakov  Rekhter" initials="Y" surname="Rekhter">
            <organization></organization>
          </author>

          <date day="31" month="January" year="2006" />

          <abstract>
            <t>This document describes an extension to BGP [BGP-4] which may
            be used to provide flexible control over the distribution of
            routing information.</t>
          </abstract>
        </front>

        <seriesInfo name="Internet-Draft"
                    value="draft-ietf-idr-bgp-ext-communities-09" />

        <format target="http://www.ietf.org/internet-drafts/draft-ietf-idr-bgp-ext-communities-09.txt"
                type="TXT" />
      </reference>

      <reference anchor="BGP-TABLE" title="BGP-TABLE">
        <front>
          <title abbrev="BGP-TABLE">Plenary, IETF 51.
          http://www.ietf.org/proceedings/01aug/</title>

          <author fullname="Randy Bush" initials="R." surname="Bush">
            <organization />
          </author>

          <date month="" year="" />

          <abstract />
        </front>
      </reference>
    </references>
  </back>
</rfc>