One document matched: draft-dunbar-trill-directory-assisted-edge-01.txt
Differences from draft-dunbar-trill-directory-assisted-edge-00.txt
TRILL working group L. Dunbar
Internet Draft D. Eastlake
Intended status: Standard Track Huawei
Expires: Sept 2012 Radia Perlman
Intel
I. Gashinsky
Yahoo
July 11, 2011
Directory Assisted RBridge edge
draft-dunbar-trill-directory-assisted-edge-01.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on January 11, 2009.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Dunbar Expires January 11, 2012 [Page 1]
Internet-Draft Directory Assisted RBridge edge March 2011
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the BSD License.
Abstract
RBridge edge nodes currently learn the mapping between MAC address
and its corresponding RBridge edge node address by observing the
data packets traversed through.
This document describes why and how directory assisted RBridge edge
nodes can improve TRILL network scalability in data center
environment.
Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 0.
Table of Contents
1. Introduction ................................................ 3
2. Terminology ................................................. 3
3. Impact to RBridge Network by massive number of hosts in Data
Center(s) ...................................................... 3
4. Directory Assisted RBridge Edge in Data Center................5
5. Further optimization in using directory assistance........... 7
5.1. TRILL Header encapsulated by non-RBridge nodes.......... 9
6. Conclusion and Recommendation............................... 10
7. Manageability Considerations................................ 10
8. Security Considerations..................................... 10
9. IANA Considerations ........................................ 10
10. Acknowledgments ........................................... 11
11. References ................................................ 11
Authors' Addresses ............................................ 11
Intellectual Property Statement................................ 12
Disclaimer of Validity ........................................ 13
Dunbar Expires Sept11, 2012 [Page 2]
Internet-Draft Directory Assisted RBridge edge March 2011
1. Introduction
Data center networks are different from campus networks in several
ways. Main differences include:
o Data centers, especially Internet or cloud data centers with
virtualized servers, tend to have large number of hosts
o Topology is based on racks, rows.
- Hosts assignment to Servers, Racks, and Rows is
orchestrated by Server/VM Management system, not random.
o With virtualization, there is an ever increasing trend to
dynamically create VMs when the application requires more
resources, and move VMs, either from overloaded servers, or to
aggregate VMs onto fewer servers to save power when demand is
light. This may lead to hosts belonging to same subnet being
placed under different locations (racks or rows).
This draft describes why and how Data Center TRILL networks can be
optimized by utilizing directory assisted approach.
2. Terminology
AF Appointed Forwarder RBridge port
Bridge: IEEE802.1Q compliant device. In this draft, Bridge is used
interchangeably with Layer 2 switch.
DC: Data Center
EoR: End of Row switches in data center. Also known as
Aggregation switches in some data centers
FDB: Filtering Database for Bridge or Layer 2 switch
ToR: Top of Rack Switch in data center. It is also known as
access switches in some data centers.
VM: Virtual Machines
3. Impact to RBridge Network by massive number of hosts in Data
Center(s)
In a data center, there are likely to be very large number of hosts
(e.g. hundreds of thousands, or even more) and hosts belonging to
one subnet may be placed under different racks or rows. If TRILL is
deployed in those data centers, hosts belonging to one subnet may be
Dunbar Expires Sept11, 2012 [Page 3]
Internet-Draft Directory Assisted RBridge edge March 2011
placed under multiple edge RBridges which are on different Bridged
LAN. And each edge RBridge needs to enable multiple VLANs. This
creates several problems which this draft is intending to address:
- Unnecessary filling of slots in MAC table of edge RBridges, due
to an edge RBridge, R1 receiving broadcast traffic (ARP/ND
queries or gratuitous APRs) from hosts that are not actually
communicating with any hosts attached to R1.
- The current TRILL protocol requires a MAC address to only be
accessible from one RBridge edge port (AF port). When a data
center has dual uplinks for each rack of servers to two
different Access switches (which is very common), some links
can't be fully utilized.
- Flooding within RBridge domain triggered by ARP/ND.
Consider a data center with 80 rows, 8 racks per row and 40 servers
per rack. There can be 80*8*40=25600 servers. Suppose each server
is virtualized to 20 VMs, there could be 25600*20=512000 hosts in
this data center. A common network design for this kind of data
centers is to have multiple tiers of switches, e.g. one or two
Access Switches for each rack (ToR), Aggregation switches for each
row (or EoR), and some Core switches to interconnect the Aggregation
switches.
If TRILL is to be deployed in this data center, let's consider
following two scenarios of TRILL domain boundary:
- Scenario #1: TRILL domain boundary are Access (TOR) switches,
i.e. ToR being the edge RBridges:
With 80 rows and 8 racks per row, there will be 80*8 = 640 edge
RBridges, with each Edge RBridge supporting 40 RBridge edge
ports (facing the servers) and 8 RBridge trunk ports facing
aggregation (EoR) switches. Then there are 40*640 = 25600
RBridge edge ports in this data center.
If each rack and row has two redundant switches, then there
will be 640*2=1280 RBridge edge nodes and 80*2=160 RBridge core
nodes. Total number of nodes in this RBridge domain could be
1440 (1280+160) plus some core switches which interconnect all
the aggregation (EoR) switches very large number of nodes in
this RBridge IS/IS domain.
Dunbar Expires Sept11, 2012 [Page 4]
Internet-Draft Directory Assisted RBridge edge March 2011
- Scenario #2: TRILL domain boundary are the aggregation (EoR)
switches:
With the same assumption as before, there will be 80 Edge
RBridges in the RBridge IS/IS domain. Even with redundancy, the
number of nodes in RBridge domain will be less than 200.
Therefore, the size of the RBridge IS/IS domain is reasonable.
But, this scenario creates a Bridged LAN attached to RBridge
edge ports. It becomes necessary to designate only one port (AF
port) to forward native traffic to avoid loops among multiple
RBridge edge ports and to have some mechanisms to prevent loops
within the Bridged LAN attached to RBridge edge ports.
Designating one AF port for forwarding native traffic not only
makes some links unusable but also put extra heavy load on the
AF port. In addition, when AF changes, traffic temporarily goes
to black hole. Running traditional Layer 2 STP/RSTP on the
Bridged LAN for loop prevention may be overkill because the
topology among the ToR switches and RBridge edge is very
simple. This draft proposes a simple directory assisted
approach to avoid loops.
In addition, the number of MAC&VLAN<->RBridge Edge Mappings to
be learned and managed by RBridge edge node can be very large.
In the example above, each Edge RBridge has 8 RBridge edge
ports facing the ToR switches. Since each ToR has 40 downstream
ports facing servers and each server has 20 VMs, there are
40*20 = 800 hosts attached to each downstream port of an EoR
switch and total of 8*800=6400 hosts attached to this EoR
switch. If all those 6400 hosts belong to 640 VLANs and each
VLAN has 200 hosts, then, under the worst case scenario, the
total number of MAC&VLAN entries to be learned by the RBridge
edge (i.e. EoR) can be 640*200=128000. You can easily see that
the number of MAC&VLAN<->RBridge Edge mapping entries to be
learnt by the RBridge edge node can be very large.
4. Directory Assisted RBridge Edge in Data Center environment
In data center environment, the hosts (VMs) placement to servers,
racks, and rows is orchestrated by Server (or VM) Management
System(s), i.e. there is a database or multiple ones (distributed
model) which have the knowledge of where each host (VM) is placed.
If RBridge edge nodes can utilize the information of where each host
is located, then the flooding process to learn the mapping between
MAC&VLAN and corresponding RBridge Edge node can be eliminated. This
is a great optimization, especially in virtualized data center
Dunbar Expires Sept11, 2012 [Page 5]
Internet-Draft Directory Assisted RBridge edge March 2011
environment where VMs migrate all the time. If migrated VMs send out
gratuitous ARP (IPv4) or Unsolicited Neighbor Advertisement (IPv6)
from the new location, those gratuitous broadcast messages have to
flood to all other RBridge edge nodes. If migrated VMs don't send
out gratuitous ARP (or ND) from the new location, for packets
towards those migrated VMs the ingress RBridge edge nodes will send
them to the wrong egress RBridge edge nodes, which is also waste of
bandwidth.
The benefits of using directory assistance include:
- The Directory enforced MAC&VLAN <-> RBridge Edge mapping table
can determine if a frame needs to be forwarded across RBridge
domain.
o When multiple Rbridge edge ports are accessible from a
server (hosts/VMs), a directory assisted RBridge edge
won't flood frames with an unknown DA to all to other
RBridge ports. Therefore, there is no need to designate an
Appointed Forwarder among all the RBridge Edge ports
connected to a Bridge LAN, which enables all RBridge
ingress ports to forward traffic.
- Directory assisted approach can not only eliminate the flooding
within RBridge domain (unknown learning), but also reduce the
flooding on the bridged LAN attached to RBridge edge ports.
- Reduce the amount of MAC&VLAN <-> RBridge edge mapping
maintained by RBridge edge. No need for an RBridge edge to keep
the MAC entries for hosts which don't communicate with hosts
attached to an RBridge edge.
There can be two different models for RBridge edge node to be
assisted by Directory:
- Push Model:
Directory Server(s) push down the MAC&VLAN <-> RBridge Edge
mapping for all the hosts which might communicate with hosts
attached to RBridge edge node.
[Editor's note: there are multiple ways to narrow down the
smallest set of remote hosts which communicate with hosts
attached to an RBridge edge. A very simple approach: For VLAN
#i enabled on one of RBridge Edge port(s), MAC entries for
hosts in VLAN #i will not be pushed down to RBridge Edge if
Dunbar Expires Sept11, 2012 [Page 6]
Internet-Draft Directory Assisted RBridge edge March 2011
there is no hosts belonging to VLAN #i attached to the RBridge
edge. Detailed approaches will be described in a separate
draft.]
Whenever there is any change in MAC&VLAN <-> RBridge Edge
mapping, which can be triggered by hosts being moved, de-
commissioned, or temporarily out of service due to
maintenance, an incremental update can be sent to the RBridge
edge nodes which are impacted by the change.
Under this model, it is recommended for RBridge edge node to
simply drop the data frame (instead of flooding to RBridge
domain) if the destination address can't be found in the
MAC&VLAN<->RBridge Edge mapping table.
- Pull model:
Under this model, RBridge edge node can simply intercept all
ARP requests and forward them to the Directory Server(s) which
has the information of how each MAC&VLAN is mapped to its
corresponding RBridge edge node.
The reply from the Directory Server can be the standard ARP
reply with an extra field showing the RBridge egress node
address
RBridge ingress node can cache the mapping
If RBridge edge node receives an unknown MAC-DA, it could
choose drop the data frame as in the Push Model, or it can
query the directory server. If there is no response from the
directory server, the RBridge edge node can drop the frame.
5. Further optimization in using directory assistance for RBridge in
data center
The topology between aggregation (or EoR) switches and access (or
ToR) switches can be very simple in data center environment. Under
those simple topology environments, having the ToR switches
participating in RBridge's IS/IS routing domain does not provide
much value in topology discovery. By eliminating ToR switches from
RBridge routing domain (i.e. let the aggregation (EoR) switches be
the boundary of RBridge domain), the number of nodes in the RBridge
routing domain can be greatly reduced, which in turns can make the
network scale better.
Dunbar Expires Sept11, 2012 [Page 7]
Internet-Draft Directory Assisted RBridge edge March 2011
However, two new problems are introduced by letting the aggregation
(EoR) switches be the RBridge Domain boundary in the data center
environment:
- the number of MAC&VLAN<->RBridge Edge mapping entries to be
maintained by the RBridge edge node can be very large (See
Scenario #2 in Section 3)
- there is a bridged LAN with multiple ToR switches attached to
RBridge Edge port(s), it becomes necessary to have some
mechanisms to prevent loops.
The Directory Assistance introduced in this draft (Section 4)
provides a solution to the second problem above, i.e. avoid loops
among RBridge Edge ports connected to one Bridged LAN so that all
RBridge edge ports connected by a Bridged LAN can forward native
traffic. However, there is still the problem of too large table of
MAC&VLAN <-> RBridge Egress mapping on the edge RBridges.
Therefore, we are proposing further optimization:
- for native Ethernet frames to traverse the RBridge domain, the
TRILL encapsulation is done on a node before entering the
RBridge domain (e.g. by ToR switches or virtual switch on
server), instead of the RBridge Ingress edge node (e.g. EoR
switches). That means that the edge ports of the RBridge
Ingress node could receive both TRILL-encapsulated data frames
and native Ethernet frames. [RBridge] Section 4.6.2 Bullet 8
specifies that an RBridge port can be configured to accept both
TRILL encapsulated frames from a neighbor that is not an
RBridge.
When data frames do not need to traverse RBridge domain,
RBridge ingress edge node does its normal native Ethernet data
frame processing.
- For egress direction on RBridge edge node, the processing is
exactly same as regular RBridge edge node, i.e. decapsulates
the TRILL header of the received TRILL frames and forward the
decapsulated Ethernet frames to hosts attached to its edge
ports.
We call a switch which only performs the necessary TRILL
encapsulation for Ethernet data frames to traverse the RBridge
domain a ''TRILL Encapsulating node'' or ''Simplified RBridge''.
Dunbar Expires Sept11, 2012 [Page 8]
Internet-Draft Directory Assisted RBridge edge March 2011
The TRILL Encapsulating Node gets the MAC&VLAN<->RBridge Edge
mapping table pushed down or pulled from directory servers.
Upon receiving a native Ethernet frame, the TRILL Encapsulating
node checks the MAC&VLAN<->RBridge Edge mapping table, and
perform the corresponding TRILL encapsulation if the entry is
found in the mapping table. If the destination address of the
received Ethernet frame and its VLAN doesn't exist in the
mapping table, the Ethernet frame is forwarded based on normal
Ethernet switching function.
+---------------+
|Outer Ether hd |
|---------------|
|TRILL Header |
|---------------| ^
| MAC-400 | |
|---------------| Inner Ether Header
| MAC-1 | |
|---------------| V
| |
|---------------|
| Payload |
|---------------|
| Ethernet FCS |
+---------------+
^
| +-------+ TRILL +------+
| | R1 |-----------| R2 | Decapsulate TRILL
| +---+---+ domain +------+ header
| | |
+----------| |
| |
+-----+ +-----+
Non-RBridge node:|T12 | | T22 |
Encapsulate TRILL+-----+ +-----+
Header for data
Frames to traverse
TRILL domain.
5.1. TRILL Header encapsulated by non-RBridge nodes
TRILL header includes Source RBridge's nickname and Destination
RBridge's nickname. When a TRILL header is added by a non-RBridge
node, using the Ingress RBridge edge node's nickname in the source
address field will make the ingress RBridge node receive TRILL
Dunbar Expires Sept11, 2012 [Page 9]
Internet-Draft Directory Assisted RBridge edge March 2011
frames with its own nickname in the frames' source address field
which can be confusing.
To avoid confusion of Edge RBridges receiving TRILL encapsulated
frames with its own nickname in the frames' source address field
from neighboring non-RBridge nodes, a new nickname is given to an
RBridge edge node, which can be called Phantom Nickname, to
represent all the TRILL encapsulating nodes attached to the edge
ports of the RBridge edge node.
When the Phantom Nickname is used in the Source Address field of a
TRILL frame, it is understood that the TRILL encapsulation is
actually done by a non-RBridge node which is attached to an edge
port of an RBridge Ingress node.
[Editor's note: a separate draft will be submitted to describe the
comprehensive behavior of the ''simplified RBridge'' node]
6. Conclusion and Recommendation
The traditional RBridge learning approach of observing data plane
can no longer keep pace with the ever growing number of hosts in
Data center.
Therefore, we suggest TRILL to consider directory assisted
approach(es). This draft only introduces the basic concept of using
directory assisted approach for RBridge edge nodes to learn the
MAC&VLAN to RBridge Edge mapping. We want to get some working group
consensus before drilling down to detailed steps required for the
approach.
7. Manageability Considerations
This document does not add additional manageability considerations.
8. Security Considerations
TBD.
9. IANA Considerations
TBD
Dunbar Expires Sept11, 2012 [Page 10]
Internet-Draft Directory Assisted RBridge edge March 2011
10. Acknowledgments
This document was prepared using 2-Word-v2.0.template.dot.
11. References
[RBridges] Perlman, et, al ''RBridge: Base Protocol Specification'',
<draft-ietf-trill-rbridge-protocol-16.txt>, March, 2010
[RBridges-AF] Perlman, et, al ''RBridges: Appointed Forwarders'',
<draft-ietf-trill-rbridge-af-02.txt>, April 2011
[ARMD-Problem] Dunbar, et,al, ''Address Resolution for Large Data
Center Problem Statement'', Oct 2010.
[ARP reduction] Shah, et. al., "ARP Broadcast Reduction for Large Data
Centers", Oct 2010
Authors' Addresses
Linda Dunbar
Huawei Technologies
1700 Alma Drive, Suite 500
Plano, TX 75075, USA
Phone: (972) 543 5849
Email: ldunbar@huawei.com
Dunbar Expires Sept11, 2012 [Page 11]
Internet-Draft Directory Assisted RBridge edge March 2011
Donald Eastlake
Huawei Technologies
155 Beaver Street
Milford, MA 01757 USA
Phone: 1-508-333-2270
Email: d3e3e3@gmail.com
Radia Perlman
Intel Labs
2200 Mission College Blvd.
Santa Clara, CA 95054-1549 USA
Phone: +1-408-765-8080
Email: Radia@alum.mit.edu
Igor Gashinsky
Yahoo
45 West 18th Street 6th floor
New York, NY 10011
Email: igor@yahoo-inc.com
Intellectual Property Statement
The IETF Trust takes no position regarding the validity or scope of
any Intellectual Property Rights or other rights that might be
claimed to pertain to the implementation or use of the technology
described in any IETF Document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights.
Copies of Intellectual Property disclosures made to the IETF
Secretariat and any assurances of licenses to be made available, or
the result of an attempt made to obtain a general license or
permission for the use of such proprietary rights by implementers or
users of this specification can be obtained from the IETF on-line
IPR repository at http://www.ietf.org/ipr
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
any standard or specification contained in an IETF Document. Please
address the information to the IETF at ietf-ipr@ietf.org.
Dunbar Expires Sept11, 2012 [Page 12]
Internet-Draft Directory Assisted RBridge edge March 2011
Disclaimer of Validity
All IETF Documents and the information contained therein are
provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION
HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY,
THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.
Dunbar Expires Sept11, 2012 [Page 13]
| PAFTECH AB 2003-2026 | 2026-04-24 05:18:41 |