One document matched: draft-dunbar-trill-directory-assisted-edge-00.txt


TRILL working group                                         L. Dunbar  
Internet Draft						  D. Eastlake
Intended status: Standard Track                                Huawei
Expires: Sept 2012                                      Radia Perlman
                                                                Intel
                                                         I. Gashinsky
                                                                Yahoo
                                                          July 4, 2011


                      Directory Assisted RBridge edge
             draft-dunbar-trill-directory-assisted-edge-00.txt


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 4, 2009.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in




Dunbar                 Expires January 4, 2012                [Page 1]

Internet-Draft     Directory Assisted RBridge edge          March 2011


   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the BSD License.

Abstract

   RBridge edge nodes currently learn the mapping between MAC address
   and its corresponding RBridge edge node address by observing the
   data packets traversed through.

   This document describes why and how directory assisted RBridge edge
   nodes can improve TRILL network scalability in data center
   environment.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 0.

Table of Contents


   1. Introduction ................................................ 3
   2. Terminology ................................................. 3
   3. Impact to RBridge Network by massive number of hosts in Data
   Center(s) ...................................................... 3
   4. Directory Assistance for RBridge Edge node in Data Center
   environment. ................................................... 5
   5. Further optimization in using directory assistance for RBridge in
   data center .................................................... 7
      5.1. TRILL Header encapsulated by non-RBridge nodes RBridge                                                                     .. 10
   6. Conclusion and Recommendation............................... 10
   7. Manageability Considerations................................ 11
   8. Security Considerations..................................... 11
   9. IANA Considerations ........................................ 11
   10. Acknowledgments ........................................... 11
   11. References ................................................ 11
   Authors' Addresses ............................................ 12
   Intellectual Property Statement................................ 12
   Disclaimer of Validity ........................................ 13









Dunbar                  Expires Sept4, 2012                  [Page 2]

Internet-Draft     Directory Assisted RBridge edge          March 2011


1. Introduction

   Data center networks are different from campus networks in several
   ways. Main differences include:

      o Data centers, especially Internet or cloud data centers with
         virtualized servers, tend to have large number of hosts
      o Topology is based on racks, rows.
           -  Hosts assignment to Servers, Racks, and Rows is
              orchestrated by Server/VM Management system, not random.
      o With virtualization, there is an ever increasing trend to
         dynamically create VMs when the application requires more
         resources, and move VMs, either from overloaded servers, or to
         aggregate VMs onto fewer servers to save power when demand is
         light. This may lead to hosts belonging to same subnet being
         placed under different locations (racks or rows).

   This draft describes why and how Data Center TRILL networks can be
   optimized by utilizing directory assisted approach.

2. Terminology

   AF      Appointed Forwarder RBridge port

   Bridge:  IEEE802.1Q compliant device. In this draft, Bridge is used
             interchangeably with Layer 2 switch.

   DC:      Data Center

   EoR:    End of Row switches in data center.

   FDB:    Filtering Database for Bridge or Layer 2 switch

   ToR:    Top of Rack Switch in data center. It is also known as
             access switch.

   VM:     Virtual Machines

3. Impact to RBridge Network by massive number of hosts in Data
   Center(s)

   In a virtualized data center, a VM may be placed on any physical
   server. A variety of algorithms can be applied to select the
   location of a VM. Resource aware algorithms (e.g. energy, bandwidth,
   etc,) tend to use a placement that satisfies the processing



Dunbar                  Expires Sept4, 2012                  [Page 3]

Internet-Draft     Directory Assisted RBridge edge          March 2011


   requirements of each VM but require the minimal number of physical
   servers and switching devices.

   With this, and similar types of assignment algorithm, subnets tend
   to extend throughout the network.  When this happens, the broadcast
   messages within each subnet will be flooded across the RBridge
   domain, which not only consumes bandwidth on links in RBridge
   domain,   but also causes an RBridge edge port to learn all the
   hosts belonging to all the subnets which are enabled on the port.
   Even though an RBridge edge port is only supposed to learn the
   MAC&VLAN <-> RBridge edge mapping for remote hosts which communicate
   with local hosts, the frequent ARP/ND from all hosts within each
   subnet will normally refresh the RBridge edge node's MAC&VLAN<-
   >RBridge-Edge mapping table.

   Consider a data center with 80 rows, 8 racks per row and 40 servers
   per rack.  There can be 80*8*40=25600 servers. Suppose each server
   is virtualized to 20 VMs, there could be 25600*20=512000 hosts in
   this data center.

   Let's consider following two scenarios:

        Scenario #1: RBridge edge starts at TOR switches:

         With 80 rows and 8 racks per row, there will be 80*8 = 640 Edge
         RBridges, with each Edge RBridge supporting 40 RBridge edge
         ports (facing the servers) and 8 RBridge trunk ports facing EoR
         switches. Then there are 40*640 = 25600 RBridge edge ports in
         this data center.

         If each rack and row has two redundant switches, then there
         will be 640*2=1280 RBridge edge nodes and 80*2=160 RBridge core
         nodes. Total number of nodes in this RBridge domain could be
         1440 (1280+160) plus some core switches which interconnect all
         the EoR switches   very large number of nodes in this RBridge
         domain.

        Scenario #2: RBridge edge starts at the End of Row switches:

         With the same assumption as before, there will be 80 Edge
         RBridges in the RBridge domain. Even with redundancy, the
         number of nodes in RBridge domain will be less than 200.
         Therefore, the size of the RBridge IS/IS domain is reasonable.




Dunbar                  Expires Sept4, 2012                  [Page 4]

Internet-Draft     Directory Assisted RBridge edge          March 2011


         However, since there is a Bridged LAN attached to RBridge edge
         port(s), it becomes necessary to have some mechanisms to
         prevent loops within the Bridged LAN attached to RBridge edge
         ports. Running traditional Layer 2 STP/RSTP on the Bridged LAN
         for loop prevention may be overkill because the topology among
         the ToR switches and RBridge edge is very simple. [[ If a
         ''mechanism'' is necessary and STP/RSTP is overkill, what
         mechanism are you proposing? [Linda: proposing to us the
         directory assistance to avoid loops]]

         When multiple Rbridge ports are accessible from a server
         (hosts/VMs), only one of them can forward native traffic (AF
         port) which has the following associated issues:

           o Some ToR switches, even though having direct links to Edge
              RBridge ports, may need to re-route traffic to other ToR
              switches because the directly connected ports are not AF
              ports. This re-route not only waste links between ToR
              switches, but also put heavy load on the AF port.
           o Proper port ID needs to be selected to each port to
              achieve load balancing among multiple RBridge ports for
              each VLAN.  When AF changes, traffic temporarily goes to
              black hole.

         In addition, the number of MAC&VLAN<->RBridge Edge Mappings to
         be learned and managed by RBridge edge node can be very large.
         In the example above, each Edge RBridge has 8 RBridge edge
         ports facing the ToR switches. Since each ToR has 40 downstream
         ports facing servers and each server has 20 VMs, there are
         40*20 = 800 hosts attached to each downstream port of an EoR
         switch and total of 8*800=6400 hosts attached to this EoR
         switch. If all those 6400 hosts belong to 640 VLANs and each
         VLAN has 200 hosts, then, under the worst case scenario, the
         total number of MAC&VLAN entries to be learned by the RBridge
         edge (i.e. EoR) can be 640*200=128000. You can easily see that
         the number of MAC&VLAN<->RBridge Edge mapping entries to be
         learnt by the RBridge edge node can be very large.



4. Directory Assistance for RBridge Edge node in Data Center
   environment.

   In data center environment, the hosts (VMs) placement to servers,
   racks, and rows is orchestrated by Server (or VM) Management


Dunbar                  Expires Sept4, 2012                  [Page 5]

Internet-Draft     Directory Assisted RBridge edge          March 2011


   System(s), i.e. there is a database or multiple ones (distributed
   model) which have the knowledge of where each host (VM) is placed.
   If RBridge edge nodes can utilize the information of where each host
   is located, then the flooding process to learn the mapping between
   MAC&VLAN and corresponding RBridge Edge node can be eliminated. This
   is a great optimization, especially in virtualized data center
   environment where VMs migrate all the time. If migrated VMs send out
   gratuitous ARP (IPv4) or Unsolicited Neighbor Advertisement (IPv6)
   from the new location, those gratuitous broadcast messages have to
   flood to all other RBridge edge nodes. If migrated VMs don't send
   out gratuitous ARP (or ND) from the new location, for packets
   towards those migrated VMs the ingress RBridge edge nodes will send
   them to the wrong egress RBridge edge nodes, which is also waste of
   bandwidth.

   The benefits of using directory assistance include:

        The Directory enforced MAC&VLAN <-> RBridge Edge mapping table
         can determine if a frame needs to be forwarded across RBridge
         domain.
        When multiple Rbridge ports are accessible from a server
         (hosts/VMs), Directory assistance can enable all RBridge
         ingress ports to forward traffic.
        Directory assisted approach can not only eliminate the flooding
         within RBridge domain (unknown learning), but also reduce the
         flooding on the bridged LAN attached to RBridge edge ports.
        Reduce the amount of MAC&VLAN <-> RBridge edge mapping
         maintained by RBridge edge.


   There can be two different models for RBridge edge node to be
   assisted by Directory:

       Push Model:

         Directory Server(s) push down the MAC&VLAN <-> RBridge Edge
         mapping for all the hosts belonging to all the VLANs enabled
         on the RBridge edge node.

         Whenever there is any change in MAC&VLAN <-> RBridge Edge
         mapping, which can be triggered by hosts being moved, de-
         commissioned, or temporarily out of service due to
         maintenance, an incremental update can be sent to the RBridge
         edge nodes which are impacted by the change.



Dunbar                  Expires Sept4, 2012                  [Page 6]

Internet-Draft     Directory Assisted RBridge edge          March 2011


         Under this model, if we can assume that the data center's
         directory service has MAC&VLAN <-> RBridge Edge mapping for
         all legal hosts in the data center, then RBridge edge node
         should not receive any data frame with unknown address. Under
         this scenario, it is recommended for RBridge edge node not to
         flood the data frame to RBridge domain if the destination
         address can't be found in the MAC&VLAN<->RBridge Edge mapping
         table.

       Pull model:

         Under this model, RBridge edge node can simply intercept all
         ARP requests and forward them to the Directory Server(s) which
         has the information of how each MAC&VLAN is mapped to its
         corresponding RBridge edge node.

         The reply from the Directory Server can be the standard ARP
         reply with an extra field showing the RBridge egress node
         address

         RBridge ingress node can cache the mapping

         If RBridge edge node receives an unknown MAC-DA, it could
         choose not to flood the data frame to RBridge domain as in the
         Push Model, or it can query the directory server. If there is
         no response from the directory server, the RBridge edge node
         can choose not to flood the frame to RBridge domain.

5. Further optimization in using directory assistance for RBridge in
   data center

   The topology between EoR switches and ToR switches can be very
   simple in data center environment. Under those simple topology
   environments, having the ToR switches participating in RBridge's
   IS/IS routing domain may not provide much value in topology
   discovery. By eliminating ToR switches from RBridge routing domain,
   the number of nodes in the RBridge routing domain can be greatly
   reduced, which in turns can make the network scale better.

   However, when the RBridge edge starts at the EoR switches and there
   is a bridged LAN with multiple ToR switches attached to RBridge Edge
   port(s), it becomes necessary to have some mechanisms to prevent
   loops. Running traditional Layer 2 STP/RSTP on the Bridged LAN may
   be overkill because the topology among the ToR switches and RBridge
   edge is very simple. [[ Same comment as above. If a loop prevention
   mechanism is necessary and STP/RSTP is overkill, what are you going


Dunbar                  Expires Sept4, 2012                  [Page 7]

Internet-Draft     Directory Assisted RBridge edge          March 2011


   to use? [Linda: suggest using directory assistance to avoid loops
   instead of STP/RSTP]]

   When multiple Rbridge ports are accessible from a server
   (hosts/VMs), only one of them can forward native traffic (AF port).
   Some ToR switches, even though having direct links to Edge RBridge
   ports, may need to re-route traffic from hosts to other ToR switches
   because the directly connected ports are not AF ports. This re-route
   not only waste links between ToR switches, but also put heavy load
   on the AF port.

   In addition, the MAC&VLAN <-> RBridge Edge mapping to be maintained
   by EoR switch can be very huge. As shown in the Scenario #2 of
   Section 3, the number of entries could potentially reach 128000.
   That is a lot of memory required for an EoR switch.

   Therefore, we are proposing further optimization to achieve:

       Enabling all RBridge edge ports to forward traffic across
        RBridge domain by Directory assisting ToR switches TRILL
        encapsulating the traffic and sending it to an RBridge port.
        This mechanism avoids the flooding and potential loops.

       Small number of nodes in RBridge IS/IS domain;

       Not too large table of MAC&VLAN <-> RBridge Egress mapping for
        nodes which encapsulate the TRILL header.

   Here is the basic framework to achieve the optimization:

       RBridge domain is bounded by EoR switches, i.e. only EoR
        switches and core switches participate in RBridge IS/IS
        routing. So the number of entries in RBridge IS/IS Forwarding
        Table is relatively small.

       However, for native Ethernet frames to traverse the RBridge
        domain, the TRILL encapsulation is done one hop (or more)
        before entering the RBridge domain (e.g. by ToR switches or
        virtual switch), instead of the RBridge Ingress edge node (e.g.
        EoR switches). That means that the edge ports of the RBridge
        Ingress node could receive both TRILL-encapsulated data frames
        and native Ethernet frames.   [RBridge] Section 4.6.2 Bullet 8
        specifies that an RBridge port can be configured to accept both
        TRILL encapsulated frames from a neighbor that is not an
        RBridge.



Dunbar                  Expires Sept4, 2012                  [Page 8]

Internet-Draft     Directory Assisted RBridge edge          March 2011


        When data frames do not need to traverse RBridge domain,
        RBridge ingress edge node does its normal native Ethernet data
        frame processing.





       For egress direction on RBridge edge node, the processing is
        exactly same as regular RBridge edge node, i.e. decapsulates
        the TRILL header of the received TRILL frames and forward the
        decapsulated Ethernet frames to hosts attached to its edge
        ports.

       We call a switch which only performs the necessary TRILL
        encapsulation for Ethernet data frames to traverse the RBridge
        domain a ''TRILL Encapsulating node''.

        The TRILL Encapsulating Node gets the MAC&VLAN<->RBridge Edge
        mapping table pushed down or pulled from directory servers.
        Upon receiving a native Ethernet frame, the TRILL Encapsulating
        node checks the MAC&VLAN<->RBridge Edge mapping table, and
        perform the corresponding TRILL encapsulation if the entry is
        found in the mapping table. If the destination address of the
        received Ethernet frame and its VLAN doesn't exist in the
        mapping table, the Ethernet frame is forwarded based on normal
        Ethernet switching function.

       +---------------+
       |Outer Ether hd |
       |---------------|
       |TRILL Header   |
       |---------------|      ^
       | MAC-400       |      |
       |---------------|   Inner Ether Header
       | MAC-1         |      |
       |---------------|      V
       |               |
       |---------------|
       |  Payload      |
       |---------------|
       | Ethernet FCS  |
       +---------------+
               ^
               |      +-------+  TRILL    +------+
               |      |  R1   |-----------|  R2  |  Decapsulate TRILL
               |      +---+---+  domain   +------+  header


Dunbar                  Expires Sept4, 2012                  [Page 9]

Internet-Draft     Directory Assisted RBridge edge          March 2011


               |          |                   |
               +----------|                   |
                          |                   |
                       +-----+             +-----+
      Non-RBridge node:|T12  |             | T22 |
      Encapsulate TRILL+-----+             +-----+
      Header for data
      Frames to traverse
      TRILL domain.



 5.1. TRILL Header encapsulated by non-RBridge nodes RBridge

   TRILL header includes Source RBridge's nickname and Destination
   RBridge's nickname. When a TRILL header is added by a non-RBridge
   node, using the Ingress RBridge edge node's nickname in the source
   address field will make the ingress RBridge node receive TRILL
   frames with its own nickname in the frames' source address field
   which can be confusing.

   To avoid confusion of Edge RBridges receiving TRILL encapsulated
   frames with its own nickname in the frames' source address field
   from neighboring non-RBridge nodes, a new nickname is given to an
   RBridge edge node, which can be called Phantom Nickname, to
   represent all the TRILL encapsulating nodes attached to the edge
   ports of the RBridge edge node.

   When the Phantom Nickname is used in the Source Address field of a
   TRILL frame, it is understood that the TRILL encapsulation is
   actually done by a non-RBridge node which is attached to an edge
   port of an RBridge Ingress node.

6. Conclusion and Recommendation

    The traditional RBridge learning approach of observing data plane
    can no longer keep pace with the ever growing number of hosts in
    Data center.

    Therefore, we suggest TRILL to consider directory assisted
    approach(es). This draft only introduces the basic concept of using
    directory assisted approach for RBridge edge nodes to learn the
    MAC&VLAN to RBridge Edge mapping. We want to get some working group
    consensus before drilling down to detailed steps required for the
    approach.




Dunbar                  Expires Sept4, 2012                 [Page 10]

Internet-Draft     Directory Assisted RBridge edge          March 2011


7. Manageability Considerations

   This document does not add additional manageability considerations.

8. Security Considerations

   TBD.

9. IANA Considerations

   TBD

10. Acknowledgments

   This document was prepared using 2-Word-v2.0.template.dot.

11. References

   [RBridges] Perlman, et, al ''RBridge: Base Protocol Specification'',
   <draft-ietf-trill-rbridge-protocol-16.txt>, March, 2010


   [RBridges-AF]   Perlman, et, al ''RBridges: Appointed Forwarders'',
   <draft-ietf-trill-rbridge-af-02.txt>, April 2011



   [ARMD-Problem] Dunbar, et,al, ''Address Resolution for Large Data
             Center Problem Statement'', Oct 2010.

   [ARP reduction] Shah, et. al., "ARP Broadcast Reduction for Large Data
             Centers", Oct 2010
















Dunbar                  Expires Sept4, 2012                 [Page 11]

Internet-Draft     Directory Assisted RBridge edge          March 2011


Authors' Addresses

   Linda Dunbar
   Huawei Technologies
   1700 Alma Drive, Suite 500
   Plano, TX 75075, USA
   Phone: (972) 543 5849
   Email: ldunbar@huawei.com


   Donald Eastlake
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA
   Phone: 1-508-333-2270
   Email: d3e3e3@gmail.com

   Radia Perlman
   Intel Labs
   2200 Mission College Blvd.
   Santa Clara, CA 95054-1549 USA
   Phone: +1-408-765-8080
   Email: Radia@alum.mit.edu


   Igor Gashinsky
   Yahoo
   45 West 18th Street 6th floor
   New York, NY 10011
   Email: igor@yahoo-inc.com


Intellectual Property Statement

   The IETF Trust takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might be
   claimed to pertain to the implementation or use of the technology
   described in any IETF Document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.

   Copies of Intellectual Property disclosures made to the IETF
   Secretariat and any assurances of licenses to be made available, or
   the result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by implementers or



Dunbar                  Expires Sept4, 2012                 [Page 12]

Internet-Draft     Directory Assisted RBridge edge          March 2011


   users of this specification can be obtained from the IETF on-line
   IPR repository at http://www.ietf.org/ipr

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   any standard or specification contained in an IETF Document. Please
   address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

   All IETF Documents and the information contained therein are
   provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION
   HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY,
   THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.
























Dunbar                  Expires Sept4, 2012                 [Page 13]


PAFTECH AB 2003-20262026-04-24 05:19:44