

Internet Draft                                            February 2006 
 
 
   Network Working Group                                   Manav Bhatia 
   Internet Draft                             Riverstone Networks, Inc. 
                                                        Joel M. Halpern 
                                                             Paul Jakma 
                                                       Sun Microsystems 
   Expires: August 2006                               February 10, 2006 
    
                Advertising Multiple Nexthop Routes in BGP 
                                      
                draft-bhatia-bgp-multiple-next-hops-00.txt 
    
Status of this Memo 
    
   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that      
   other groups may also distribute working documents as Internet-
   Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time.  It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt 
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html. 
    
   This Internet-Draft will expire in August 2006.
    
Copyright Notice 
 
   Copyright (C) The Internet Society (2006). 
    
Abstract 
    
   This document describes an extensible mechanism that allows a BGP   
   speaker to advertise multiple BGP paths for a destination to its   
   peers, by describing a new BGP capability, termed "Multiple-Hop   
   Capability".   
        


 
 
Bhatia, Halpern and Jakma                                      [Page 1] 
 
 
   The mechanisms described in this document are applicable to all 
   routers, both those with the ability to inject multiple routing 
   entries in their forwarding table and those without. 
    
Conventions used in this document 
    
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [KEYWORDS].
    
Table of Contents 
    
   1. Introduction
   2. BGP Multiple Next Hop Scenarios
      2.1 Suboptimal Routing in Route Reflector clients
      Avoiding Persistent Route Oscillations
      2.2 eBGP mesh scaling at IXes via Route Servers
      2.3 Advertising a subset of routes in BGP
      2.4 Equal Cost Multiple Path BGP
   3. Message Formats
      3.1 Multiple-Hop Capability
      3.2 Multiple-Hop attribute - MULTIPLE_HOP
   4. Operation when both peers are Multiple-Hop capable
      4.1 Advertisement of Multiple-Hop BGP routes
      4.2 Procedures for the Receiving Speaker
      4.3 Working with Multiple-Hop capable IBGP peers
   5. Multiprotocol Extensions to BGP
   6. Security Considerations
   7. Acknowledgements
   8. IANA Considerations
   9. References
   10. Authors Address
    
1. Introduction 
    
   Currently, BGP [BGP4] speakers cannot announce multiple paths for a
   destination, even where doing so would be desirable.  This is because
   the BGP specification allows only one "best" route to be inserted
   into the Loc-RIB and announced to other BGP speakers.  If a new route
   is sent for a destination that has previously been announced to a BGP
   peer, the receiver "implicitly withdraws" the former route and
   replaces it with the new one.

   As a result, BGP speakers are never able to advertise multiple paths
   for the same destination to their peers.

   Lifting this restriction would benefit at least the following
   scenarios in BGP:
    
 
 
 
 
   o Persistent route-oscillation conditions in BGP [MED] 
    
   o eBGP mesh scaling at Internet Exchanges 
    
   o Interaction between ECMP capable BGP speakers 
    
   The first concerns route-reflectors [RR].  In certain topologies,
   persistent route-oscillation conditions can arise because the clients
   of a route-reflector are never fully informed of each other's best
   paths, particularly where MED values are considered as part of
   best-path selection.  If BGP provided a means for route-reflectors to
   share all the collective best paths with their clients, these
   conditions could be alleviated, as we will show below.
    
   The second concerns scaling of eBGP meshes at Internet Exchanges   
   (referred to as an IX from now on, or IXes in the plural).  IX   
   operators have deployed eBGP route-servers, in a variety of guises,   
   in order to reduce the need for customers to establish direct   
   sessions with other customers.  These route-servers however have   
   severe limitations because of the single-path restriction in BGP. 
   Removing this limitation would allow for efficient deployment of IX   
   route-servers. 
    
   The third concerns BGP implementations which are capable of 
   considering multiple routes for inclusion into their RIB, and hence 
   likely their FIB, but do not have a way to relay the full resulting   
   state of their BGP RIB to their peers. 
    
   This document specifies the mechanism by which Multiple-Hop operates; 
   however it will not attempt to fully describe the usages.  In 
   particular this document anticipates that the ECMP scenario will be 
   described fully in another document, as it would have to be even if 
   documented without consideration of the Multiple-Hop capability.  It 
   is anticipated however that any speaker implementing the 
   functionality described in this document would be able to 
   interoperate with Multiple-Hop capable route-servers and route-
   reflectors, just as BGP speakers interoperate with Route-Reflectors 
   in the absence of the Multiple-Hop capability. 
    
    
2. BGP Multiple Next Hop Scenarios 
 
2.1 Suboptimal Routing in Route Reflector clients 
    
   Route Reflection can result in suboptimal routing because a client
   does not have full visibility of all the BGP paths in the AS.  The RR
   selects the best path and reflects only that path to its clients.  If
   the RR has equal cost BGP routes, it selects one based on the lower
   Router ID.  As a result, the clients do not receive the full view of
   the available paths, or even of the paths that are equidistant from
   the RR.  This can result in suboptimal routing from the client's
   perspective: a client might have selected a different best path had
   more paths been visible to it.  With Multiple-Hop BGP, the RR can
   advertise all of its equal cost BGP routes to its clients, giving
   each client more options to choose from.

   The extensions proposed in this draft allow the RR to reflect all the
   routes to its clients.
 
   Avoiding Persistent Route Oscillations 
    
    
              ---------------------------------- 
           /                            AS X   \ 
          |              -----                  | 
          |            /       \                | 
          |           |         |               | 
          |           |   RR    |               | 
          |            \       /                | 
          |              -/+\-                  | 
          |           c1 /   \ c2               | 
          |     ----    /     \    ----         | 
          |   /      \ /       \ /      \       | 
          |  (  Ra    )         (   Rb   )      | 
          |   \      /           \      /       | 
          |     -/\--             ------        | 
          |     /  \                   \        | 
          |    /    \                   \       | 
          \   /     \                    \      / 
            --/------\--------------------\---- 
             /        \                    \ 
            /          --------------------------- 
            /        /  \                 --\--    \ 
         --/-       |   \               /       \  | 
       //    \\     |    \             |         | | 
      |   R2   |    |    \             |   R3    | | 
      |        |    |    -\--           \       /  | 
       \\    //     |  /      \           -----    | 
         ----       | |        |                   | 
         AS Y       | |   R1   |                   | 
                    |  \      /                    | 
                    |    ----                      | 
                    \                    AS Z      / 
                     ----------------------------- 
    
                          Figure 1 
 
 
 
 
    
   Consider the topology shown in Figure 1.  AS X consists of a Route
   Reflector (RR) and two clients, Ra and Rb.  Ra is connected to R2 in
   AS Y and R1 in AS Z.  Rb is connected to R3 in AS Z.  Assume that the
   Router ID of R1 is lower than that of R2 and that IGP cost c1 < c2.
   The dashed lines between the routers show BGP peering.  Assume that
   the BGP speakers in AS Y and AS Z receive a BGP UPDATE for 10.0.0.0/8
   from AS W, and that they advertise the following path attributes to
   BGP speakers in AS X:
    
   R2: NLRI 10.0.0.0/8, AS_PATH Y W, MED 100, NEXT_HOP R2 
    
   R1: NLRI 10.0.0.0/8, AS_PATH Z W, MED 300, NEXT_HOP R1 
    
   R3: NLRI 10.0.0.0/8, AS_PATH Z W, MED 200, NEXT_HOP R3 
    
   Scenario 1: Traditional BGP in AS X 
    
   The following events happen: 
    
   1. Ra receives UPDATEs from R2 and R1.  Since they are from 
      different ASes, MEDs are not compared and the tie breaks on the        
      lower Router ID.  Since R1 < R2, route from R1 is selected and         
      advertised to the RR.  Ra thus has the following path as the         
      best one for 10.0.0.0/8: 
    
      AS_PATH Z W, MED 300, NEXT_HOP R1 
    
   2. Rb receives the UPDATE from R3, installs this and advertises the 
      same to the RR.  Rb thus has the following path for 10.0.0.0/8: 
    
      AS_PATH Z W, MED 200, NEXT_HOP R3 
    
   3. RR receives two UPDATEs from its clients.  Since the neighboring 
      AS is the same in both of them, the tie breaks on the route 
      having the lower value of MED.  It thus selects the route it 
      learns from Rb as the best one and advertises this to Ra. 
    
   4. Ra now has all the three paths.  Route learnt from Rb wins over 
      the route learnt from R1 (lower MED) and the route learnt from 
      R2 wins over the route learnt from Rb (EBGP > IBGP). 
    
   5. Ra thus sends an implicit WITHDRAW to the RR, replacing the 
      earlier announcement with the route learnt from R2. 
    
   6. RR thus has the following paths for 10.0.0.0/8:

      AS_PATH Y W, MED 100, NEXT_HOP R2
      AS_PATH Z W, MED 200, NEXT_HOP R3

      It selects the first path because the IGP cost to reach its
      NEXT_HOP (R2) is lower.  It thus advertises this path to Rb and
      sends a WITHDRAW message to Ra, removing the path it had
      initially announced (the one learnt from Rb).
    
   7. Ra receives the WITHDRAW message from the RR and removes the path.   
      Nothing is done as it is currently not the best path. 
    
   8. Rb receives the advertisement from RR, but doesn't do anything, as  
      the path learnt from R3 is better (EBGP > IBGP). 
    
   9. Ra at this time has only two routes.  One, learnt from R1 and the  
      other learnt from R2: 
    
      AS_PATH Z W, MED 300, NEXT_HOP R1 
    
      AS_PATH Y W, MED 100, NEXT_HOP R2 
    
      It has selected the route learnt from R2.  After some time, this 
      router runs its scanner process for validating the NEXT_HOPs. 
      There it runs the best path algorithm and finds that the route 
      learnt from R1 is better than the route learnt from R2, because 
      of the lower Router ID. 
    
   10.Ra sends an implicit WITHDRAW to RR, replacing the earlier
      announcement with the route learnt from R1.

   11...

   The cycle then repeats indefinitely.
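
   The oscillation above stems from pairwise preferences at Ra that do
   not form a consistent ordering.  The following is a minimal sketch of
   that observation, not a full BGP decision process: it models only the
   three rules the walk-through relies on (MED within the same
   neighbouring AS, EBGP over IBGP, lower Router ID).  The Path class,
   the prefer() helper and the numeric Router ID values are assumptions
   made for illustration; the text only requires R1 < R2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str          # label used in the text (R1, R2, R3)
    neighbor_as: str   # neighbouring AS, i.e. first AS in the AS_PATH
    med: int
    ebgp: bool         # True if Ra learnt the route over eBGP
    router_id: int     # illustrative values only


def prefer(a: Path, b: Path) -> Path:
    """Return the path Ra prefers when comparing exactly two candidates."""
    # MED is only compared between routes from the same neighbouring AS.
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    # A route learnt over eBGP beats one learnt over iBGP.
    if a.ebgp != b.ebgp:
        return a if a.ebgp else b
    # Final tie-break: lower Router ID.
    return a if a.router_id < b.router_id else b


r1 = Path("R1", "Z", 300, ebgp=True, router_id=1)
r2 = Path("R2", "Y", 100, ebgp=True, router_id=2)
r3 = Path("R3", "Z", 200, ebgp=False, router_id=3)  # reflected by the RR

for x, y in ((r1, r2), (r3, r1), (r2, r3)):
    print(f"{x.name} vs {y.name}: {prefer(x, y).name} wins")
```

   Under these assumptions R1 beats R2, R3 beats R1, and R2 beats R3, so
   Ra's choice depends on which routes it currently holds.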
    
   Scenario 2: Multiple-Hop BGP is implemented in AS X 
    
   1. If everything happens the same as in the preceding example then 
      Ra will have two paths to reach 10.0.0.0/8.  Since everything 
      else is the same, it will advertise both these routes to the RR. 
      Note that Ra will not look at the Router ID, etc. for tie 
      breaking if Multiple-Hop capabilities are implemented. 
    
   2. RR will now have three paths for 10.0.0.0/8: Paths 1 and 2 from
      Ra, and Path 3 from Rb.

      Path 1: AS_PATH Y W, MED 100, NEXT_HOP R2

      Path 2: AS_PATH Z W, MED 300, NEXT_HOP R1

      Path 3: AS_PATH Z W, MED 200, NEXT_HOP R3

      Out of Path 2 and Path 3, it will select Path 3 (lower MED).  From
      Path 1 and Path 3, it will select Path 1, based on the lower IGP
      cost.  RR thus selects Path 1 as the best route.
    
   3. RR will advertise the new path to Rb.  Rb will thus have the
      following two paths:

      Path 1: AS_PATH Y W, MED 100, NEXT_HOP R2

      Path 3: AS_PATH Z W, MED 200, NEXT_HOP R3

      Path 3 will win because of the EBGP > IBGP rule.  There is thus
      no change on Rb, and it continues using R3 as before.
       
   4. The network is stable and there are no route oscillations. 
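
   The RR's selection in Scenario 2 can be sketched as follows.  This is
   an illustration under stated assumptions rather than a complete
   decision process: rr_best() is a hypothetical helper that applies
   only MED elimination within a neighbouring AS followed by lowest IGP
   cost, and the numeric costs are placeholders chosen so that c1 < c2.

```python
from collections import defaultdict

C1, C2 = 10, 20  # hypothetical IGP costs; the text only requires c1 < c2

# (label, neighbouring AS, MED, IGP cost from the RR towards the client)
paths = [
    ("Path 1", "Y", 100, C1),  # learnt via Ra
    ("Path 2", "Z", 300, C1),  # learnt via Ra
    ("Path 3", "Z", 200, C2),  # learnt via Rb
]


def rr_best(candidates):
    """Pick a best path using MED (per neighbouring AS), then IGP cost."""
    # Step 1: within each neighbouring AS keep only the lowest-MED paths.
    by_as = defaultdict(list)
    for p in candidates:
        by_as[p[1]].append(p)
    survivors = []
    for group in by_as.values():
        lowest = min(p[2] for p in group)
        survivors.extend(p for p in group if p[2] == lowest)
    # Step 2: among the survivors pick the lowest IGP cost.
    return min(survivors, key=lambda p: p[3])


print(rr_best(paths)[0])      # all three paths visible to the RR
print(rr_best(paths[1:])[0])  # Path 1 hidden from the RR
```

   With all three paths visible the sketch selects Path 1.  With Path 1
   hidden it selects Path 3, which corresponds to step 3 of Scenario 1.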
    
2.2 eBGP mesh scaling at IXes via Route Servers 
    
   IXes today sometimes offer their customers the facility to peer with 
   a neutral IX route-server as a means to reduce the direct peering   
   requirements for their customers.  The peering overhead may be   
   considerable given the many hundreds of ASes which may be present at   
   some of the larger IXes today, and it is quite plausible that IXes   
   will continue to grow in terms of attached customers and ASes. 
    
   However, the single-path limitation of BGP imposes great operational   
   difficulty in allowing such a route-server to be effective. 
    
   There are typically two kinds of route-server: one which is a normal
   BGP speaker and simply provides a single-best-path-for-all service,
   and one which is configured with each customer's policies and
   calculates the best path separately for each.  Both approaches have
   their limitations:
    
   o  Route-servers which simply advertise the current best known IX
      path according to normal BGP procedures, without applying any
      customer-specific policy, often still require customers to
      establish direct sessions with each other in cases where they
      wish to apply policy.  Many of the scaling benefits are never
      realised.

   o  Route-servers which apply policy on their customers' behalf,
      selecting the best path on a per-customer basis and then
      advertising to each customer a tailor-made best path, require
      extensive co-ordination of policy between the IX operator and
      each of its customers.  Further, it may be difficult for
      customers to keep their policies private due to the operational
      requirements of policy co-ordination between IX and customer.
    
   If there were a mechanism in BGP to allow an IX route-server to pass   
   all other advertisements to a customer peer, without performing any   
   path selection or applying any policy, then this would remove the   
   need for policy co-ordination between each customer and the IX, and   
   address the other shortcomings listed above.  Such a mechanism would   
   be easy for both the IX operator and each customer to deploy and   
   maintain. 
    
2.3 Advertising a subset of routes in BGP 
    
   Providers can tag some selected routes with certain communities 
   [COMM]. An administrator could write a policy that would advertise 
   all the paths carrying a known community within that AS to another 
   router capable of understanding the Multiple-Hop extensions.  This is 
   a form of policy implementation and a detailed study of what could be 
   achieved using such techniques is beyond the scope of this draft. 
 
2.4 Equal Cost Multiple Path BGP 
 
   Currently some implementations, when they receive multiple equal cost
   BGP routes from different peers, are able to insert all of them (or a
   subset, based on local policy) into their forwarding table to split
   the load for the destination locally, while announcing only one
   "best" BGP path to their other peers.  This however has implications
   for those other peers which receive such an announcement from the
   ECMP capable BGP speaker.  As with route aggregation, these other
   peers potentially will not possess the full path information, which
   can lead to loops.  Hence, such an ECMP capable BGP speaker can only
   enable this feature if great care is taken, if at all, or must act as
   if it had aggregated the set of routes concerned.
    
   While this document does not directly address the question of ECMP,   
   the mechanism introduced can be built upon in order to do so.  It   
   would be feasible to introduce additional semantics on top of the   
   Multiple-Hop Capability so as to allow the ECMP BGP speaker to
   fully communicate the details of all the paths it is forwarding on,   
   and hence allow those other peers to have full visibility of path   
   information and be able to avoid selecting paths which would   
   otherwise loop, while still maintaining compatibility with speakers   
   not implementing ECMP and Multiple-Hop. 
    
3. Message Formats 
    
   The encodings given below are, as is normal for BGP, in network or
   big-endian "byte order": the octets of a multiple-octet value are
   defined and encoded in order of significance, from highest order
   first to lowest, and each bit within an octet is similarly defined
   and encoded in order of significance, highest order first to lowest.
   Bit field definitions are specified from left to right, in order of
   significance, from the highest order bit specified left-most to the
   lowest order bit specified right-most.
    
3.1 Multiple-Hop Capability 
    
   To advertise the Multiple-Hop Capability to a peer, a BGP speaker
   uses BGP Capabilities Advertisement [BGP-CAP].  The capability is
   advertised using one or more capability parameters, each with the
   Multiple-Hop Capability code (TBD) and a variable Capability length.
   By advertising the Multiple-Hop Capability to a peer, a BGP speaker
   conveys to the peer that the speaker is capable of receiving and
   properly handling Multiple-Hop updates from that peer.
    
   The capability data consists of the two normal capability attribute   
   fields followed by a triplet of (AFI,SAFI,flags) [1] [2] indicating 
   for which (AFI,SAFI) pairs the speaker supports Multiple-Hop, along 
   with a set of flags specific to the Multiple-Hop capability and the   
   (AFI,SAFI) tuple concerned.  A speaker MUST include a separate   
   capability parameter for each distinct (AFI,SAFI) for which it wishes   
   to negotiate the Multiple-Hop capability, including a distinct   
   (AFI,SAFI,flags) triplet as the capability data for each (AFI,SAFI)   
   concerned.  Multiple-Hop capability is NOT supported for any   
   (AFI,SAFI) tuples for which a Multiple-Hop capability and appropriate   
   triplet of data is not received. 
    
   Each triplet is encoded as: 
    
   +-------+-----------------------------------+-----------------------+
   | Field |              Meaning              |     Size of field     |
   |       |                                   |        (octets)       |
   +-------+-----------------------------------+-----------------------+
   |  AFI  |     Address Family Identifier     |           2           |
   +-------+-----------------------------------+-----------------------+
   |  SAFI |     Subsequent Address Family     |           1           |
   |       |             Identifier            |                       |
   +-------+-----------------------------------+-----------------------+
   | Flags |   (AFI,SAFI) Multiple-Hop flags   |           1           |
   +-------+-----------------------------------+-----------------------+
                                 Table 1
    
    
    
    
    
    
    
 
 
 
 
   The final octet of data in the triplet is a bitmask of flags: 
    
               +-------+---+---+---+---+---+---+---+----+ 
               | Bit:  | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0  | 
               +-------+---+---+---+---+---+---+---+----+ 
               | flag: | R | R | R | R | R | R | R | AE | 
               +-------+---+---+---+---+---+---+---+----+ 
                                 Table 2 
    
   The (AFI,SAFI) flags (Table 2) are defined as:
    
   R  Reserved: 
    
      MUST be 0.  Without further knowledge beyond this document a 
      speaker MUST treat as a capability negotiation error [BGP-CAP] the 
      case where it receives a Multiple-Hop capability advertisement 
      with a reserved flag set. 
    
   AE Advertise-Extra: 
    
      Indicates that the speaker intends to advertise additional paths,
      beyond just its best path.  This capability is asymmetric: any
      speaker asserting this flag MUST treat the case where the remote
      speaker also asserts this flag as a capability negotiation error.
      Further, a speaker MAY at its discretion treat as a capability
      negotiation error the case where neither it nor the remote
      speaker asserts this flag (e.g. because the speaker has no use
      for this capability other than acting as a Multiple-Hop capable
      client of a Route-Server or Route-Reflector, and has no other
      uses for it such as ECMP).
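
   The flag rules above might be checked as in the following sketch.  It
   assumes the bit layout of Table 2 (AE in bit 0, all other bits
   reserved).  NegotiationError and check_flags() are hypothetical names
   that stand in for whatever error handling [BGP-CAP] prescribes; they
   are not defined by this document.

```python
AE = 0x01        # Advertise-Extra, bit 0
RESERVED = 0xFE  # bits 7..1, which MUST be 0


class NegotiationError(Exception):
    """Stands in for a capability negotiation error [BGP-CAP]."""


def check_flags(local: int, remote: int) -> str:
    """Validate the flags octets exchanged for one (AFI,SAFI) pair.

    Returns which side will advertise extra paths: "local", "remote",
    or "none".
    """
    if remote & RESERVED:
        raise NegotiationError("reserved flag set by peer")
    if (local & AE) and (remote & AE):
        raise NegotiationError("both speakers assert Advertise-Extra")
    if local & AE:
        return "local"
    if remote & AE:
        return "remote"
    # A speaker MAY, at its discretion, also treat this case as an error.
    return "none"


print(check_flags(AE, 0x00))
```

   Returning "none" rather than raising reflects the MAY in the text; an
   implementation acting purely as a client could raise here instead.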
    
   Each distinct (AFI,SAFI) specific Multiple-Hop capability parameter  
   is therefore encoded as: 
    
   +---------------+-------------------+---------------------+---------+ 
   |     Field     |   Size of field   |       Meaning       |  Value  | 
   |               |      (octets)     |                     |         | 
   +---------------+-------------------+---------------------+---------+ 
   |   Capability  |         1         |     Multiple-Hop    |   TBD   | 
   |      Code     |                   |      Capability     |         | 
   +---------------+-------------------+---------------------+---------+ 
   |   Capability  |         1         |    Octets of data   |Variable | 
   |     Length    |                   |                     |         | 
   +---------------+-------------------+---------------------+---------+ 
   |    Triplet    |         4         |   Encoded as above  |    As   |   
   |               |                   |                     |  above  |   
   +---------------+-------------------+---------------------+---------+ 
                                  Table 3 
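
   As an illustration of Table 3, the following sketch packs one
   capability parameter in network byte order.  The capability code is
   TBD in this document, so CAP_CODE_MULTIPLE_HOP below is a placeholder
   value chosen only so that the example runs.  The function names are
   likewise hypothetical, and the sketch assumes exactly one 4-octet
   triplet per parameter.

```python
import struct

CAP_CODE_MULTIPLE_HOP = 0xFF  # placeholder only: the real code is TBD


def encode_capability(afi: int, safi: int, flags: int) -> bytes:
    """Encode code, length and one (AFI,SAFI,flags) triplet."""
    triplet = struct.pack("!HBB", afi, safi, flags)  # 2 + 1 + 1 octets
    header = struct.pack("!BB", CAP_CODE_MULTIPLE_HOP, len(triplet))
    return header + triplet


def decode_capability(data: bytes):
    """Return (afi, safi, flags) from one encoded capability parameter."""
    code, length = struct.unpack_from("!BB", data, 0)
    if code != CAP_CODE_MULTIPLE_HOP:
        raise ValueError("not a Multiple-Hop capability")
    if length != 4 or len(data) < 2 + length:
        raise ValueError("unexpected capability length")
    return struct.unpack_from("!HBB", data, 2)


# AFI 1 / SAFI 1 is IPv4 unicast; flags 0x01 asserts Advertise-Extra.
encoded = encode_capability(1, 1, 0x01)
print(encoded.hex())
print(decode_capability(encoded))
```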
    
 
 
 
 
    
3.2 Multiple-Hop attribute - MULTIPLE_HOP 
    
   To provide backward compatibility, as well as to simplify the
   introduction of the Multiple-Hop capability into BGP, a new BGP
   attribute, MULTIPLE_HOP, is introduced.  This attribute is optional
   and non-transitive, and can be used for advertising multiple
   next-hops associated with an NLRI.
    
   The attribute data contains one or more triplets of (AFI,SAFI, List   
   of Next Hop Information), where each triplet is encoded as shown   
   below: 
    
    
               +------------------------------------------------+ 
               |      Address Family Identifier (2 octets)      | 
               +------------------------------------------------+ 
               | Subsequent Address Family Identifier (1 octet) | 
               +------------------------------------------------+ 
               |          Number of Next Hops (1 octet)         | 
               +------------------------------------------------+ 
               |     Length of the First Next Hop (1 octet)     | 
               +------------------------------------------------+ 
               |  Network Address of First Next Hop (variable)  | 
               +------------------------------------------------+ 
               |     Length of the Second Next Hop (1 octet)    | 
               +------------------------------------------------+ 
               |  Network Address of Second Next Hop (variable) | 
               +------------------------------------------------+ 
               |                      . . .                     | 
               |                      . . .                     | 
               +------------------------------------------------+ 
               |      Length of the Nth Next Hop (1 octet)      | 
               +------------------------------------------------+ 
               |   Network Address of Nth Next Hop (variable)   | 
               +------------------------------------------------+ 
    
                                     Table 4 
    
   The MULTIPLE_HOP fields (Table 4) are defined as follows: 
    
   Address Family Identifier: The AFI field carries the identity of      
   the Network Layer protocol associated with the Network Address      
   that follows. 
    
   Subsequent Address Family Identifier: The SAFI field in      
   combination with the Address Family Identifier field identifies      
   the Network Layer context associated with the Network Address of      
   the Next Hop(s). 
 
 
 
 
    
   Number of Next Hops: This field carries the total number of
   Multiple-Hop BGP routes for the given NLRI.
    
   Length of Nth Next Hop Network Address: A 1 octet field whose value      
   expresses the length of the "Network Address of Next Hop" field as      
   measured in octets.  For IPv6 routes the value shall be set to 16,      
   when only a global address is present, or 32 if a link-local      
   address is also included in the Next Hop field [BGP-IPv6]. 
    
   Network Address of Nth Next Hop: This is a variable length field that      
   contains the Network Address of the next router on the path to the      
   destination. 
    
   The N next-hops listed in the MULTIPLE_HOP path attribute define the
   Network Layer addresses of the routers that should be used as
   next-hops to the destinations listed in the UPDATE message.
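
   The layout in Table 4 can be sketched as below.  This is a hedged
   example covering only the attribute data for a single triplet; the
   attribute header (flags, type code, length) is left out because no
   type code is given here.  The function names are hypothetical.
   Decoded next hops are returned as raw octets, since a 32-octet IPv6
   next hop carries two addresses.

```python
import ipaddress
import struct


def encode_multiple_hop(afi: int, safi: int, next_hops) -> bytes:
    """Encode AFI, SAFI, next-hop count, then (length, address) pairs."""
    out = bytearray(struct.pack("!HBB", afi, safi, len(next_hops)))
    for hop in next_hops:
        packed = ipaddress.ip_address(hop).packed  # 4 or 16 octets
        out.append(len(packed))
        out += packed
    return bytes(out)


def decode_multiple_hop(data: bytes):
    """Decode one triplet; return (afi, safi, raw_next_hops, octets_used)."""
    afi, safi, count = struct.unpack_from("!HBB", data, 0)
    offset, hops = 4, []
    for _ in range(count):
        if offset >= len(data):
            raise ValueError("truncated next-hop list")
        length = data[offset]
        offset += 1
        raw = data[offset:offset + length]
        if len(raw) != length:
            raise ValueError("truncated next-hop address")
        hops.append(raw)
        offset += length
    return afi, safi, hops, offset


blob = encode_multiple_hop(1, 1, ["192.0.2.1", "192.0.2.2"])
print(blob.hex())
print(decode_multiple_hop(blob))
```

   Returning the number of octets consumed lets a caller continue with
   the next triplet when the attribute carries more than one.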
    
4. Operation when both peers are Multiple-Hop capable 
 
   In the following sections, "Local Speaker" refers to a router which
   is advertising BGP Multiple-Hop routes, and "Receiving Speaker"
   refers to a router that peers with the former to accept multiple BGP
   routes for a destination.

   Assume that the Multiple-Hop Capability has been exchanged between
   the Local Speaker and the Receiving Speaker, and that a BGP session
   between them is established.  The following sections detail the
   procedures that shall be followed by the Local Speaker and the
   Receiving Speaker when the Local Speaker wants to advertise BGP
   Multiple-Hop routes.
    
   Note that for operation within the confines of this document and BGP,
   the Local Speaker will almost certainly be acting as an eBGP
   Route-Server or iBGP Route-Reflector, asserting the Advertise-Extra
   flag in the Multiple-Hop capability triplet for the (AFI,SAFI) tuples
   concerned, with the Receiving Speaker therefore acting as a client of
   that speaker.

   Other uses, such as ECMP speakers exchanging Multiple-Hop routes,
   will require further consideration.  As stated previously, those
   considerations are not per se related to the Multiple-Hop capability
   itself and are not addressed in this document.
    
4.1 Advertisement of Multiple-Hop BGP routes 
 
   Between Multiple-Hop capable speakers, the MULTIPLE_HOP attribute
   MUST be used in addition to the existing NEXT_HOP in order to
   announce multiple next-hops for the destinations listed in the
   Network Layer Reachability Information of the UPDATE message.  If the
   speaker has installed one of the next-hops concerned in its RIB, then
   that particular next-hop MUST be listed in the NEXT_HOP attribute.
       
   Prefixes announced using this attribute MUST NOT replace previous
   advertisements; multiple BGP paths for a prefix can thus be
   advertised by the Local Speaker.  If the same prefix is later
   announced with ONLY the NEXT_HOP attribute, then this MUST be taken
   as an implicit withdraw of all the previous paths advertised by that
   peer for that destination.
    
   An UPDATE message which contains feasible routes and carries
   MULTIPLE_HOP and no NEXT_HOP attribute MUST NOT be considered as an
   implicit withdrawal.  The Receiving Speaker MUST simply append these
   routes to its Adj-RIB-In [BGP4], as additional paths to that
   destination.
    
   If some attributes (LOCAL_PREF, MED, etc.) change for a previously
   announced BGP Multiple-Hop route, then an explicit withdraw message
   MUST be sent to all the peers to whom this route had earlier been
   announced, and the route MUST then be re-announced in full.
    
   When advertising multiple paths that do not have identical
   attributes, multiple BGP UPDATE messages must be sent, one per set
   of distinct path attributes.  Each message carries the corresponding
   next-hops in a MULTIPLE_HOP attribute, so that the receiver does not
   treat the later messages as route replacements.
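
   As an illustration of the rule above, the following Python sketch
   groups candidate paths so that each UPDATE carries one distinct set
   of path attributes together with its next-hops.  The Path tuple, the
   build_updates function, and the attribute representation are
   hypothetical names chosen for this example; they are not defined by
   this document, and no wire encoding is implied.  For simplicity the
   sketch builds one UPDATE per (prefix, attribute set) pair.

```python
from collections import defaultdict
from typing import NamedTuple


class Path(NamedTuple):
    """One candidate path for a prefix (hypothetical model, not a wire format)."""
    prefix: str
    next_hop: str
    attrs: tuple  # hashable attribute set, e.g. (("local_pref", 100),)


def build_updates(paths):
    """Group paths so that each UPDATE carries one distinct attribute set.

    Returns a list of (attrs, prefix, next_hops) tuples, one per UPDATE.
    """
    groups = defaultdict(list)
    for p in paths:
        # Paths to the same prefix with identical attributes share an UPDATE.
        groups[(p.prefix, p.attrs)].append(p.next_hop)
    return [(attrs, prefix, tuple(hops))
            for (prefix, attrs), hops in groups.items()]


paths = [
    Path("N'", "Nj", (("local_pref", 100),)),
    Path("N'", "Nk", (("local_pref", 100),)),
    Path("N'", "Nl", (("local_pref", 200),)),
]
updates = build_updates(paths)
for attrs, prefix, hops in updates:
    print(prefix, dict(attrs), "MULTIPLE_HOP =", hops)
```

   Here Nj and Nk share one UPDATE because their attributes are
   identical, while Nl, whose attributes differ, is sent in a second
   UPDATE.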
    
4.2 Procedures for the Receiving Speaker 
    
   The Receiving Speaker upon receiving the MULTIPLE_HOP attribute will   
   understand that the Local Speaker has advertised Multiple-Hop BGP   
   routes.  Within a single UPDATE message all the prefixes will have   
   identical attributes, except for the next-hops, which will be carried   
   in the MULTIPLE_HOP attribute. 
    
   A series of further UPDATEs for the same NLRI that contain the
   MULTIPLE_HOP attribute, with or without the same set of attributes,
   will be understood to be additive: each UPDATE appends its feasible
   routes to the appropriate Adj-RIB-In, after which the Receiving
   Speaker may run its normal decision process to select the best path
   to install in its Loc-RIB.
    
   Upon receiving an UPDATE for the same NLRI without a MULTIPLE_HOP
   attribute, the speaker will understand this to be an implicit
   withdraw of any previously received routes for the NLRI concerned,
   and will replace all previous announcements stored in the Adj-RIB-In
   with the new UPDATE.
    
   If the Receiving Speaker receives withdrawn routes along with other
   path attributes and the MULTIPLE_HOP attribute, then it shall
   understand that some of the previously advertised Multiple-Hop BGP
   routes have been removed, and an implementation MUST remove all
   such paths.
    
   If a BGP speaker wants to withdraw all the Multiple-Hop BGP routes
   for a particular destination, then it can send a normal BGP UPDATE
   message listing the NLRI in the Withdrawn Routes field.  An
   implementation on the Receiving Speaker MUST then remove all the
   Multiple-Hop BGP routes for that destination that it heard from the
   Local Speaker.
    
   If the Receiving Speaker receives an UPDATE message with the
   MULTIPLE_HOP attribute that contains both feasible and unfeasible
   routes, then it MUST apply these attributes to the feasible routes
   only.  All the destinations listed in the withdrawn routes shall be
   removed as described above.
    
4.3 Working with Multiple-Hop capable IBGP peers 
 
   This section explains how the Multiple-Hop feature works in a
   typical scenario.
    
   Assume that two IBGP speakers, A and B, have exchanged this
   capability.  Consider a case where A receives multiple updates for
   NLRI N' with next-hops N0, ..., Ni, ..., Nm.  Assume that A wants to
   advertise all these routes to B.  Also assume that Nj and Nk share
   the same path attributes (Origin, AS Path, Local Pref, etc.).
    
   A builds an UPDATE message that uses the MULTIPLE_HOP path
   attribute.  It fills in the AFI, the number of next-hops (2), the
   length and network address of the first next-hop (Nj), and the
   length and network address of the second next-hop (Nk).
    
   When this UPDATE message is received by B, it looks at the   
   MULTIPLE_HOP path attribute and understands that there are multiple   
   routes to reach N'. It inserts two routes for N' with the next-hops   
   as Nj and Nk. 
    
   A also needs to announce N' with a different set of path attributes
   and the next-hop Nl.  It builds an UPDATE message containing those
   path attributes and the MULTIPLE_HOP path attribute, in which it
   fills in the AFI, the number of next-hops (1), and the length and
   network address of the next-hop Nl.  This UPDATE message is sent to
   B.
    
   When B receives this UPDATE message, it knows that this is not an
   implicit withdraw for N', as it comes with the MULTIPLE_HOP path
   attribute.  It simply appends this new route to its BGP database,
   runs the decision process, and proceeds as normal.
    
   Assume that at some point later, A needs to withdraw the route   
   associated with the tuple [N', Nk]. It makes an UPDATE message, puts   
   N' in the unfeasible routes and inserts path attributes and the   
   MULTIPLE_HOP path attribute, keeping the next-hop inside as Nk. 
    
   When B receives this UPDATE message, it understands that A now wants
   to remove a route associated with N'.  It looks at MULTIPLE_HOP and
   finds the next-hop Nk.  It thus removes only the route associated
   with Nk.
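
   The receiver behaviour of Section 4.2, applied to the exchange
   above, can be sketched in Python as follows.  This is a minimal
   model under stated assumptions, not a reference implementation: the
   Update and AdjRibIn classes are hypothetical, each stored path is
   identified by its (prefix, next-hop) pair, and an UPDATE that mixes
   feasible and withdrawn routes is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Update:
    """Simplified UPDATE for a single prefix (hypothetical, not a wire format)."""
    prefix: str
    attrs: tuple = ()                # path attributes other than next-hops
    multiple_hop: tuple = ()         # next-hops in MULTIPLE_HOP; () means absent
    next_hop: Optional[str] = None   # plain NEXT_HOP when MULTIPLE_HOP is absent
    withdraw: bool = False           # True if the prefix is a withdrawn route


class AdjRibIn:
    """Per-peer store; assumes one path per (prefix, next-hop) pair."""

    def __init__(self):
        self.routes = {}  # prefix -> {next_hop: attrs}

    def apply(self, u):
        paths = self.routes.setdefault(u.prefix, {})
        if u.withdraw and u.multiple_hop:
            # Withdraw with MULTIPLE_HOP: remove only the listed next-hops.
            for nh in u.multiple_hop:
                paths.pop(nh, None)
        elif u.withdraw:
            # Plain withdraw: remove every path for the prefix.
            paths.clear()
        elif u.multiple_hop:
            # Announcement with MULTIPLE_HOP: additive, never a replacement.
            for nh in u.multiple_hop:
                paths[nh] = u.attrs
        else:
            # Announcement with only NEXT_HOP: implicit withdraw, then install.
            paths.clear()
            paths[u.next_hop] = u.attrs
        if not paths:
            del self.routes[u.prefix]

    def next_hops(self, prefix):
        return sorted(self.routes.get(prefix, {}))


rib = AdjRibIn()
rib.apply(Update("N'", attrs=(("local_pref", 100),), multiple_hop=("Nj", "Nk")))
rib.apply(Update("N'", attrs=(("local_pref", 200),), multiple_hop=("Nl",)))
after_announce = rib.next_hops("N'")

rib.apply(Update("N'", multiple_hop=("Nk",), withdraw=True))
after_withdraw = rib.next_hops("N'")

rib.apply(Update("N'", next_hop="Nx"))
after_plain = rib.next_hops("N'")

print(after_announce, after_withdraw, after_plain)
```

   The last step goes beyond the exchange described above: it shows
   that a later announcement carrying only NEXT_HOP (here the
   hypothetical next-hop Nx) replaces every path previously stored for
   N'.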
    
5. Multiprotocol Extensions to BGP 
 
   Since the MULTIPLE_HOP attribute includes both the AFI and the SAFI,
   it is possible to advertise MPBGP Multiple-Hop routes.  In this
   case, the MP_REACH_NLRI [MBGP] path attribute shall carry the NLRI
   and the MULTIPLE_HOP attribute shall carry the additional next-hops.
    
    
6. Security Considerations 
    
   This extension to BGP does not change the underlying security issues   
   inherent in the existing BGP. 
    
    
7. Acknowledgements 
    
   The authors would like to thank Tony Li, Arnold Nipper and Curtis 
   Villamizar for their valuable comments and suggestions on the earlier 
   versions of this draft from which the current work has been derived. 
    
8. IANA Considerations 
    
   This document requires the creation and maintenance of a Multiple-Hop
   Capability Flags registry, and the following assignments by IANA
   from this and other existing IANA registries:
    
   +----------------+-----------------------+-----------+--------------+   
   | IANA registry  | Symbol                | Assigned  | Reference    |   
   |                |                       | value     |              |   
   +----------------+-----------------------+-----------+--------------+   
   | BGP Capability | Multiple-Hop          | TBD       | 2842bis      |   
   | Codes          | capability code       |           | [BGP-CAP]    | 
   +----------------+-----------------------+-----------+--------------+ 
   | BGP Path       | MULTIPLE_HOP          | TBD       | 1771bis      |   
   | Attributes     | attribute type code   |           | [BGP4]       | 
   +----------------+-----------------------+-----------+--------------+ 
   | BGP            | Advertise-Extra       | Bit 0     | This         |   
   | Multiple-Hop   | Multiple-Hop Flag     |           | document     |   
   | Flags          |                       |           |              |   
   +----------------+-----------------------+-----------+--------------+ 
                                 Table 5 
    
9. References 
 
   [BGP-CAP]  Chandra, R. and J. Scudder, "Capabilities Advertisement 
              with BGP-4", RFC 3392, November 2002 
    
   [BGP4]     Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.
    
   [MED]      Retana, A., Walton, D., McPherson, D., and V. Gill, 
              "Border Gateway Protocol (BGP) Persistent Route 
              Oscillation Condition", RFC 3345, August 2002. 
    
   [RR]       Chandra, R., Bates, T., and E. Chen, "BGP Route Reflection 
              - An Alternative to Full Mesh IBGP", 
              draft-ietf-idr-rfc2796bis-02 (work in progress), 
              October 2005 
    
   [COMM]     Chandra, R., Traina, P., and T. Li, "BGP Communities
              Attribute", RFC 1997, August 1996.
    
   [BGP-IPv6] Marques, P. and F. Dupont, "Use of BGP-4 Multiprotocol 
              Extensions for IPv6 Inter-Domain Routing", RFC 2545, 
              March 1999. 
    
   [CONFED]   McPherson, D., Scudder, J., and P. Traina, "Autonomous 
              System Confederations for BGP", 
              draft-ietf-idr-rfc3065bis-05 (work in progress), 
              October 2005. 
    
   [MBGP]     Chandra, R., Rekhter, Y., Bates, T., and D. Katz,
              "Multiprotocol Extensions for BGP-4",
              draft-ietf-idr-rfc2858bis-08 (work in progress).
    
   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.
    
   [1]  http://www.iana.org/assignments/address-family-numbers 
    
   [2]  http://www.iana.org/assignments/safi-namespace 
 
 
10. Authors' Addresses
 
   Manav Bhatia 
   Riverstone Networks, Inc. 
    
   Email: manav@riverstonenet.com 
    
   Joel M. Halpern 
    
   Email: joel@stevecrocker.com 
    
   Paul Jakma 
   Sun Microsystems 
    
   Email: paul.jakma@sun.com 
    
   Intellectual Property Statement 
    
   The IETF takes no position regarding the validity or scope of any   
   Intellectual Property Rights or other rights that might be claimed to   
   pertain to the implementation or use of the technology described in   
   this document or the extent to which any license under such rights   
   might or might not be available; nor does it represent that it has   
   made any independent effort to identify any such rights.  Information   
   on the procedures with respect to rights in RFC documents can be   
   found in BCP 78 and BCP 79. 
    
   Copies of IPR disclosures made to the IETF Secretariat and any   
   assurances of licenses to be made available, or the result of an   
   attempt made to obtain a general license or permission for the use of   
   such proprietary rights by implementers or users of this   
   specification can be obtained from the IETF on-line IPR repository at   
   http://www.ietf.org/ipr. 
    
   The IETF invites any interested party to bring to its attention any   
   copyrights, patents or patent applications, or other proprietary   
   rights that may cover technology that may be required to implement   
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.
    
   Disclaimer of Validity 
    
   This document and the information contained herein are provided on an   
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS   
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET   
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,   
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE   
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED   
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
    
   Copyright Statement 
    
   Copyright (C) The Internet Society (2006).  This document is subject   
   to the rights, licenses and restrictions contained in BCP 78, and   
   except as set forth therein, the authors retain all their rights. 
    
   Acknowledgment 
    
   Funding for the RFC Editor function is currently provided by the
   Internet Society.