TRILL working group                                        L. Dunbar
Internet Draft                                           D. Eastlake
Intended status: Standards Track                              Huawei
Expires: April 2012                                    Radia Perlman
                                                               Intel
                                                        I. Gashinsky
                                                               Yahoo
                                                    October 24, 2011
Directory Assisted RBridge edge
draft-dunbar-trill-directory-assisted-edge-02.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on April 24, 2012.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Dunbar Expires April 24, 2012 [Page 1]
Internet-Draft Directory Assisted RBridge edge March 2011
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the BSD License.
Abstract
RBridge edge nodes currently learn the mapping between MAC addresses
and their corresponding RBridge edge nodes by observing the data
packets that traverse them. When an ingress RBridge receives a data
packet whose destination address (MAC&VLAN) is unknown, the data
packet is flooded across the RBridge domain. When more than one
RBridge port is connected to one bridged LAN, only one of them can
be designated as the AF port for forwarding/receiving traffic for
the LAN; the rest have to be blocked.
This draft describes why and how a directory assisted RBridge edge
can improve TRILL network scalability in a data center environment.
Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
Table of Contents
1. Introduction ................................................ 3
2. Terminology ................................................. 3
3. Impact to RBridge domain by massive number of hosts in Data
Center ......................................................... 4
4. Directory Assisted RBridge Edge in Data Center environment .... 6
4.1. Push Model ............................................. 7
4.2. Pull Model ............................................. 9
5. Conclusion and Recommendation .............................. 10
6. Manageability Considerations ............................... 11
7. Security Considerations..................................... 11
8. IANA Considerations ........................................ 11
9. Acknowledgments ............................................ 11
10. References ................................................ 11
Authors' Addresses ............................................ 12
Intellectual Property Statement................................ 12
Disclaimer of Validity ........................................ 13
1. Introduction
Data center networks are different from campus networks in several
ways, in particular:
1. Data centers, especially Internet or multi-tenant data centers,
tend to have a large number of hosts running a variety of
applications.
2. Topology is based on racks and rows.
The assignment of hosts to servers, racks, and rows is orchestrated
by a Server/VM Management system, not random.
3. Rapid workload shifting in data centers can increase the
frequency with which one physical server is re-loaded with
different applications. Sometimes, applications re-loaded onto one
physical server at different times belong to different subnets.
4. With virtualization, there is an ever increasing trend to
dynamically create or delete VMs when demand for resources changes,
to move VMs away from overloaded servers, or to aggregate VMs onto
fewer servers when demand is light.
Both 3) and 4) above can lead to hosts in one subnet being placed
in different locations (racks or rows) or to one rack having hosts
that belong to different subnets.
This draft describes why and how Data Center TRILL networks can be
optimized by utilizing a directory assisted approach.
2. Terminology
AF: Appointed Forwarder RBridge port
Bridge: IEEE 802.1Q compliant device. In this draft, Bridge is used
interchangeably with Layer 2 switch.
DA: Destination Address
DC: Data Center
EoR: End of Row switches in data center. Also known as
Aggregation switches in some data centers
FDB: Filtering Database for Bridge or Layer 2 switch
Host: Application running on a physical server or a virtual
machine. A host usually has at least one IP address and at
least one MAC address.
SA: Source Address
ToR: Top of Rack switch in data center. Also known as
access switch in some data centers.
VM: Virtual Machine
3. Impact to RBridge domain by massive number of hosts in Data Center
It is common for Data Center networks to have multiple tiers of
switches, e.g. one or two Access Switches for each server rack
(ToR), aggregation switches for some rows (or EoR switches), and
some core switches to interconnect the aggregation switches. Many
aggregation switches deployed in data centers are high port density
switches. It is not uncommon to see aggregation switches
interconnecting hundreds of ToR switches.
      +-------+         +------+
    +/------+ |       +/-----+ |
    | Aggr11| + ----- |AggrN1| +      EoR Switches
    +---+---+/        +------+/
     /     \            /      \
    /       \          /        \
 +---+    +---+      +---+     +---+
 |T11|... |T1x|      |T21| ..  |T2y|  ToR switches
 +---+    +---+      +---+     +---+
   |        |          |         |
 +-|-+    +-|-+      +-|-+     +-|-+
 |   |... |   |      |   | ..  |   |
 +---+    +---+      +---+     +---+  Server racks
 |   |... |   |      |   |     |   |
 +---+    +---+      +---+     +---+
 |   |... |   |      |   |     |   |
 +---+    +---+      +---+     +---+
Figure 1: Typical Data Center Network Design
When TRILL is deployed in a data center with a large number of
hosts, where hosts in one subnet may be placed under multiple edge
RBridges and each edge RBridge may have hosts from different
subnets, the following problems will occur:
- Unnecessary filling of slots in the MAC table of edge RBridges,
because an edge RBridge receives broadcast traffic (ARP/ND
broadcast/multicast) from hosts under other edge RBridges that
are not actually communicating with any of its attached hosts.
- Some edge RBridge ports being blocked when more than one RBridge
port is connected to one bridged LAN. When multiple RBridge ports
are connected to a bridged LAN, only one, i.e. the AF port, can
forward/receive traffic for the LAN; the rest have to be blocked.
When a rack has dual uplinks to two different ToR switches, i.e.
RBridge edges (which is very common), some links can't be fully
utilized.
- Packets being flooded across the RBridge domain when their DAs
are not in the ingress RBridge's cache.
Consider a data center with 1600 server racks. Each server rack has
at least one ToR switch. The ToR switches are further divided into 8
groups, with each group being connected by a group of aggregation
switches. There could be 4 to 8 aggregation switches in each group
to achieve load sharing for traffic to/from server racks. If TRILL
is to be deployed in this data center environment, consider the
following two scenarios for the TRILL domain boundary:
- Scenario #1: TRILL domain boundary starts at ToR switches:
If each server rack has one uplink to one ToR, there are 1600
edge RBridges. If each rack has dual uplinks to two ToR
switches, then there will be 3200 edge RBridges.
In this scenario, the RBridge domain will have more than 1600
(or 3200) + 8*4 (or 8*8) nodes, which is a quite large IS-IS
domain. Even though a mesh IS-IS domain can scale up to
thousands of nodes, it is very challenging for aggregation
switches to handle IS-IS link state advertisements among
hundreds of ports.
- Scenario #2: TRILL domain boundary starts at the aggregation
switches:
With the same assumptions as before, the number of nodes in the
RBridge domain will be less than 100, and aggregation switches
don't have to handle IS-IS link state advertisements among
hundreds of ports.
But in this scenario, there will be multiple RBridge edge ports
connected to one bridged LAN, which requires that only one of
them be designated as Appointed Forwarder (AF port) for
forwarding native traffic across the RBridge domain, while the
other ports/links are blocked. There is also a possibility of loops
on the bridged LAN attached to RBridge edge ports. Running
traditional Layer 2 STP/RSTP on the bridged LAN in this
environment may be overkill because the topology among the ToR
switches and aggregation switches is very simple.
In addition, the number of MAC&VLAN<->RBridgeEdge mapping
entries to be learned and managed by an RBridge edge node can be
very large. In the example above, each edge RBridge has 200
edge ports facing the ToR switches. If each ToR has 40
downstream ports facing servers and each server has 10 VMs,
there could be 200*40*10 = 80000 hosts attached. If those
hosts are spread across 1600 VLANs (i.e. 50 attached hosts per
VLAN) and each VLAN has 200 hosts in total, then under the worst
case scenario, the total number of MAC&VLAN entries to be learned
by the RBridge edge can be 1600*200 = 320000, which is very large.
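The node counts and table sizes quoted for the two scenarios can be
reproduced with a few lines of arithmetic. The sketch below uses only
the figures given in this section; the constant and function names are
illustrative and not part of this draft, and core switches are left out
of the node counts because the text gives no number for them.

```python
# Figures quoted in this section.
RACKS = 1600
AGG_GROUPS = 8

def scenario1_nodes(tors_per_rack, aggs_per_group):
    """RBridge count when the TRILL boundary starts at the ToR switches.

    Core switches are not counted, so this is a lower bound.
    """
    return RACKS * tors_per_rack + AGG_GROUPS * aggs_per_group

def scenario2_nodes(aggs_per_group):
    """RBridge count when the boundary starts at the aggregation switches.

    Core switches are not counted here either.
    """
    return AGG_GROUPS * aggs_per_group

# Scenario #2 table sizing for one edge RBridge.
TOR_FACING_PORTS = 200   # edge ports facing ToR switches
SERVERS_PER_TOR = 40     # downstream ports per ToR
VMS_PER_SERVER = 10
VLANS = 1600
HOSTS_PER_VLAN = 200     # hosts in each VLAN across the data center

attached_hosts = TOR_FACING_PORTS * SERVERS_PER_TOR * VMS_PER_SERVER
attached_hosts_per_vlan = attached_hosts // VLANS
worst_case_entries = VLANS * HOSTS_PER_VLAN

if __name__ == "__main__":
    print("scenario 1:", scenario1_nodes(1, 4), "to", scenario1_nodes(2, 8))
    print("scenario 2:", scenario2_nodes(4), "to", scenario2_nodes(8))
    print("attached hosts:", attached_hosts)
    print("worst-case MAC&VLAN entries:", worst_case_entries)
```

With these inputs, Scenario #1 gives 1632 to 3264 nodes, Scenario #2
gives 32 to 64 nodes before core switches are added, and the worst-case
table holds 320000 entries for 80000 attached hosts.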
4. Directory Assisted RBridge Edge in Data Center environment
In a data center environment, the placement of applications onto
servers, racks, and rows is orchestrated by Server (or VM)
Management System(s), i.e. there is a database, or multiple ones
(distributed model), that knows where each host is located. If
that host location information can be fed into RBridge edge nodes
in some form of Directory Service, then RBridge edge nodes won't
need to flood data frames across the RBridge domain.
Avoiding unknown DA flooding across the RBridge domain is especially
valuable in a data center environment because there is a higher
chance of an RBridge edge receiving packets with unknown DAs and
broadcast/multicast messages due to VM migration and servers being
loaded with different applications. When a VM is moved to a new
location or a server is loaded with a new application with different
IP/MAC addresses, it is more likely that the DAs of data packets sent
out from those hosts are unknown to their attached RBridge edges.
In addition, gratuitous ARP (IPv4) or Unsolicited Neighbor
Advertisement (IPv6) messages sent out from those newly migrated or
activated hosts have to be flooded to other RBridge edges which have
hosts in the same subnets.
The benefits of using directory assistance include:
- Avoid flooding unknown DAs across the RBridge domain. The
directory enforced MAC&VLAN <-> RBridgeEdge mapping table can
determine whether a data packet needs to be forwarded across the
RBridge domain.
When multiple RBridge edge ports are connected via a bridged LAN
to hosts (servers/VMs), a directory assisted RBridge edge can
simply drop frames with an unknown DA. It won't need to flood
data frames to all other RBridge ports. Therefore, there is no
need to designate one Appointed Forwarder among all the RBridge
edge ports connected to a bridged LAN, which means that all
RBridge ports can forward/receive traffic.
- Reduce flooding of decapsulated Ethernet frames with unknown
MAC-DA to a bridged LAN connected to RBridge edge ports.
When an RBridge receives a TRILL frame whose destination
Nickname matches its own, the normal procedure is for the
RBridge to remove the TRILL header and forward the
decapsulated Ethernet frame to its directly attached bridged
LAN. If the destination MAC is unknown, the decapsulated
Ethernet frame is flooded in the LAN. With directory
assistance, the RBridge edge can determine whether the DA in a
frame matches any host attached via the bridged LAN. Therefore,
frames can be discarded if their DAs do not match.
- Reduce the number of MAC&VLAN <-> RBridgeEdge mapping entries
maintained by an RBridge edge. There is no need for an RBridge
edge to keep the MAC entries for hosts which don't communicate
with hosts attached to that RBridge edge.
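The egress-side benefit in the list above can be illustrated with a
small decision function. This is a minimal sketch, not a mechanism
defined by this draft: the function name, the shape of the local table,
and the action labels are all assumptions made for illustration.

```python
def egress_action(local_fdb, vlan, mac_da, directory_assisted):
    """Return the action for a frame decapsulated at the egress RBridge.

    local_fdb maps (vlan, mac) -> local port for hosts attached via the
    bridged LAN. With directory assistance the table is assumed to be
    complete, so a miss means the host is not attached here.
    """
    port = local_fdb.get((vlan, mac_da))
    if port is not None:
        return ("forward", port)
    # Without a directory an unknown MAC-DA is flooded into the LAN;
    # with a directory the frame can be discarded instead.
    return ("discard", None) if directory_assisted else ("flood", None)
```

The only difference between the two modes is the handling of a table
miss, which is where the flooding reduction comes from.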
There are two different models for an RBridge edge node to be
assisted by a Directory Service: Push Model and Pull Model.
4.1. Push Model
Under this model, Directory Server(s) push down the MAC&VLAN <->
RBridgeEdge mapping for all the hosts which might communicate with
hosts attached to an RBridge edge node. The mapping entry to be
pushed down could leverage the gratuitous ARP reply with extended
fields showing the edge RBridge's Nickname, as shown in Table 2.
Using Table 2 requires one entry per host. When the directory pushes
down the entire mapping to an edge RBridge for the very first time,
there usually are many entries. To minimize the number of entries
pushed down, summarization should be considered, e.g. with one edge
RBridge Nickname being associated with all attached hosts' MAC
addresses and VLANs as shown below:
+-------------+-------+------------------------------+
| Nickname1   | VID-1 | MAC1, MAC2, ... MACn         |
|             +-------+------------------------------+
|             | VID-2 | MAC1, MAC2, ... MACn         |
|             +-------+------------------------------+
|             | ...   | MAC1, MAC2, ... MACn         |
+-------------+-------+------------------------------+
| Nickname2   | VID-1 | MAC1, MAC2, ... MACn         |
|             +-------+------------------------------+
|             | VID-2 | MAC1, MAC2, ... MACn         |
|             +-------+------------------------------+
|             | ...   | MAC1, MAC2, ... MACn         |
+-------------+-------+------------------------------+
| ...         | ...   | MAC1, MAC2, ... MACn         |
+-------------+-------+------------------------------+
Table 1: Summarized table pushed down from directory
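One way to picture the summarization in Table 1 is as a grouping of
per-host entries by Nickname and then by VLAN ID. The sketch below
assumes each per-host entry is a (nickname, vid, mac) tuple; that
representation and the function name are illustrative only, since this
draft does not define an encoding for the pushed table.

```python
from collections import defaultdict

def summarize(entries):
    """Group (nickname, vid, mac) host entries as nickname -> vid -> MACs.

    Duplicate entries collapse into one, and each MAC list is sorted so
    the result is deterministic.
    """
    table = defaultdict(lambda: defaultdict(set))
    for nickname, vid, mac in entries:
        table[nickname][vid].add(mac)
    return {
        nickname: {vid: sorted(macs) for vid, macs in vids.items()}
        for nickname, vids in table.items()
    }
```

The summarized form carries each Nickname and VLAN ID once instead of
once per host, which is the saving the text above refers to.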
Whenever there is any change in the MAC&VLAN <-> RBridgeEdge
mapping, which can be triggered by hosts being added, moved, or
decommissioned, an incremental update can be sent to the RBridge
edges which are impacted by the change.
Under this model, it is recommended that an RBridge edge simply drop
a data packet (instead of flooding it to the RBridge domain) if the
packet's destination address can't be found in the
MAC&VLAN<->RBridgeEdge mapping table.
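The Push Model behavior described so far (an initial full table,
incremental updates, and dropping on a lookup miss) can be sketched as
a small class. The class name, the update operations "add", "move" and
"remove", and the action labels are assumptions for illustration; this
draft does not specify a message format for updates.

```python
class PushModelEdge:
    """Ingress-side view of a directory-pushed mapping table."""

    def __init__(self):
        self.mapping = {}  # (vid, mac) -> egress RBridge nickname

    def load_full(self, summarized):
        """Install a full table given as nickname -> vid -> list of MACs."""
        self.mapping = {
            (vid, mac): nickname
            for nickname, vids in summarized.items()
            for vid, macs in vids.items()
            for mac in macs
        }

    def apply_update(self, op, vid, mac, nickname=None):
        """Apply one incremental update for a single host."""
        if op in ("add", "move"):
            if nickname is None:
                raise ValueError("add/move needs a nickname")
            self.mapping[(vid, mac)] = nickname
        elif op == "remove":
            self.mapping.pop((vid, mac), None)
        else:
            raise ValueError("unknown update operation: %r" % (op,))

    def ingress_action(self, vid, mac_da):
        """Encapsulate toward the known egress RBridge, or drop on a miss."""
        nickname = self.mapping.get((vid, mac_da))
        if nickname is None:
            return ("drop", None)
        return ("encapsulate", nickname)
```

A host that moves is handled by overwriting its entry, so frames sent
after the update are encapsulated toward the new edge RBridge.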
It may not be necessary for every RBridge edge to get the entire
mapping table for all the hosts in a data center. There are many
ways to narrow the table down to the smaller set of remote hosts
which communicate with hosts attached to an RBridge edge. A simple
approach of only pushing down the mapping for the VLANs which have
active hosts under an RBridge edge can reduce the number of mapping
entries pushed down.
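The VLAN-based narrowing can be sketched as a filter that a Directory
Server might apply before pushing. The function name and the
(nickname, vid, mac) tuple layout are assumptions for illustration, and
the sketch returns only remote entries, which is one possible reading
of the approach.

```python
def entries_for_edge(directory, edge_nickname):
    """Select the remote entries worth pushing to one edge RBridge.

    directory is an iterable of (nickname, vid, mac) tuples covering the
    whole data center. Only entries on VLANs that have at least one host
    under edge_nickname are kept, and the edge's own hosts are skipped.
    """
    directory = list(directory)
    active_vlans = {vid for nick, vid, _ in directory if nick == edge_nickname}
    return [
        (nick, vid, mac)
        for nick, vid, mac in directory
        if vid in active_vlans and nick != edge_nickname
    ]
```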
However, it is inevitable that an RBridge edge's
MAC&VLAN<->RBridgeEdge mapping table will have more entries than it
really needs under the Push Model. When hosts attached to one
RBridge edge rarely communicate with hosts attached to different
RBridge edges even though they are on the same VLAN, the normal
process of unknown DA flooding, learning, and cache aging would have
removed those MAC&VLAN entries from the RBridge's cache. But it can
be difficult for Directory Servers to predict the communication
patterns among hosts within one VLAN. Therefore, the Directory
Servers will more likely push down all the MAC&VLAN entries of a
VLAN if any hosts in that VLAN are attached to the RBridge edge.
4.2. Pull Model
Under this model, an RBridge edge node can simply intercept all
ARP/ND requests and frames with unknown DA, and forward them to the
Directory Server(s) which have the information on where each host is
located.
The reply from the Directory Server can be the standard ARP/ND reply
with an extra field showing the RBridge egress node's Nickname, as
depicted in Table 2. The RBridge ingress node can cache the mapping.
If an RBridge edge node receives a data packet with unknown MAC-DA,
it can query the directory server. If there is no response from the
directory server, the RBridge edge node can drop the packet.
One advantage of the Pull Model is that an RBridge edge can age out
MAC&VLAN entries if they haven't been used for a certain period of
time. Therefore, each RBridge edge will only keep the entries which
are frequently used, i.e. the mapping table size can be smaller. An
RBridge edge would query the Directory Server(s) for unknown DAs in
data frames or ARP/ND and cache the response. When hosts attached to
one RBridge edge rarely communicate with hosts attached to different
RBridge edges even though they are on the same VLAN, the
corresponding MAC&VLAN entries would be aged out from the RBridge's
cache.
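The query, cache, and age-out cycle of the Pull Model can be sketched
as follows. The class name, the query callback, and the idle timeout
are assumptions for illustration; this draft does not define the query
protocol or an aging interval. Time is passed in explicitly so the
behavior is easy to follow.

```python
class PullModelCache:
    """Ingress cache that is filled on demand from a directory."""

    def __init__(self, query_directory, max_idle_seconds=300.0):
        # query_directory(vid, mac) returns a nickname, or None when the
        # directory has no answer.
        self.query_directory = query_directory
        self.max_idle = max_idle_seconds
        self.entries = {}  # (vid, mac) -> [nickname, last_used]

    def lookup(self, vid, mac, now):
        """Return the egress nickname, or None if the frame should be dropped."""
        key = (vid, mac)
        entry = self.entries.get(key)
        if entry is not None:
            entry[1] = now  # refresh the idle timer on every use
            return entry[0]
        nickname = self.query_directory(vid, mac)
        if nickname is None:
            return None
        self.entries[key] = [nickname, now]
        return nickname

    def age_out(self, now):
        """Remove entries idle for longer than max_idle; return how many."""
        stale = [
            key for key, (_, last_used) in self.entries.items()
            if now - last_used > self.max_idle
        ]
        for key in stale:
            del self.entries[key]
        return len(stale)
```

Entries for hosts that rarely communicate stop being refreshed and are
removed by age_out, so the table tracks only active conversations.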
The following table shows how the target RBridge Nickname can be
attached to a standard ARP reply when replying to an ARP request
forwarded by the ingress RBridge edge.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Hardware Type | protocol Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| HLEN | PLEN | Operation |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sender Hardware Address (MAC) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Sender Hardware Address' cont | Sender Protocol Address (IP) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Sender Protocol Address' cont | Target Hardware Address (MAC)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Target Hardware Address' cont (MAC) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Target Protocol Address (IP) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
->| Ingress RBridge's Nickname |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
->|Ingress RBridge's Nickname ext | Egress RBridge's Nickname |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
->| Egress RBridge's Nickname extension |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Table 2: Extended fields added to standard ARP reply
The original ARP reply format consists of the first 28 octets shown
in this table. The last 12 octets in this table, marked by "->", are
extended fields that indicate the Ingress RBridge to which the
originating host is attached and the Egress RBridge to which the
target host is attached. More bits are reserved for RBridge
Nicknames in case multiple levels of Nicknames are needed in the
future for large data centers.
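The layout in Table 2 can be exercised with a short pack/unpack sketch.
It assumes Ethernet hardware addresses and IPv4 protocol addresses
(hardware type 1, protocol type 0x0800, operation 2 for a reply) and
treats each 6-octet Nickname field as a big-endian integer. How a
16-bit Nickname is placed within the 6 octets is not stated in this
draft, so that choice, like the function names, is an assumption.

```python
import struct

ARP_FMT = "!HHBBH6s4s6s4s"   # standard 28-octet ARP body (Ethernet/IPv4)
EXT_FMT = "!6s6s"            # 12 octets: ingress and egress Nickname fields
ARP_LEN = struct.calcsize(ARP_FMT)
EXT_LEN = struct.calcsize(EXT_FMT)

def pack_extended_reply(sender_mac, sender_ip, target_mac, target_ip,
                        ingress_nickname, egress_nickname):
    """Build an ARP reply followed by the two extended Nickname fields."""
    arp = struct.pack(ARP_FMT, 1, 0x0800, 6, 4, 2,
                      sender_mac, sender_ip, target_mac, target_ip)
    ext = struct.pack(EXT_FMT,
                      ingress_nickname.to_bytes(6, "big"),
                      egress_nickname.to_bytes(6, "big"))
    return arp + ext

def unpack_extended_reply(data):
    """Parse the 40-octet extended reply into a dictionary."""
    if len(data) != ARP_LEN + EXT_LEN:
        raise ValueError("expected %d octets" % (ARP_LEN + EXT_LEN))
    (htype, ptype, hlen, plen, oper,
     sender_mac, sender_ip, target_mac, target_ip) = struct.unpack(
        ARP_FMT, data[:ARP_LEN])
    ingress, egress = struct.unpack(EXT_FMT, data[ARP_LEN:])
    return {
        "operation": oper,
        "sender_mac": sender_mac,
        "sender_ip": sender_ip,
        "target_mac": target_mac,
        "target_ip": target_ip,
        "ingress_nickname": int.from_bytes(ingress, "big"),
        "egress_nickname": int.from_bytes(egress, "big"),
    }
```

A receiver that does not understand the extension can still parse the
first 28 octets as an ordinary ARP reply.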
5. Conclusion and Recommendation
The traditional RBridge approach of learning by observing the data
plane can no longer keep pace with the ever growing number of hosts
in data centers.
Therefore, we suggest that TRILL consider directory assisted
approach(es). This draft only introduces the basic concept of using
a directory assisted approach for RBridge edge nodes to learn the
MAC&VLAN<->RBridgeEdge mapping. More complete mechanisms will be
developed after the working group reaches some level of consensus.
6. Manageability Considerations
TBD.
7. Security Considerations
TBD.
8. IANA Considerations
TBD
9. Acknowledgments
This document was prepared using 2-Word-v2.0.template.dot.
10. References
[RBridges] Perlman, et al., "RBridges: Base Protocol Specification",
<draft-ietf-trill-rbridge-protocol-16.txt>, March 2010.
[RBridges-AF] Perlman, et al., "RBridges: Appointed Forwarders",
<draft-ietf-trill-rbridge-af-02.txt>, April 2011.
[ARMD-Problem] Dunbar, et al., "Address Resolution for Large Data
Center Problem Statement", October 2010.
[ARP reduction] Shah, et al., "ARP Broadcast Reduction for Large Data
Centers", October 2010.
Authors' Addresses
Linda Dunbar
Huawei Technologies
5430 Legacy Drive, Suite #175
Plano, TX 75024, USA
Phone: (469) 277 5840
Email: ldunbar@huawei.com
Donald Eastlake
Huawei Technologies
155 Beaver Street
Milford, MA 01757 USA
Phone: 1-508-333-2270
Email: d3e3e3@gmail.com
Radia Perlman
Intel Labs
2200 Mission College Blvd.
Santa Clara, CA 95054-1549 USA
Phone: +1-408-765-8080
Email: Radia@alum.mit.edu
Igor Gashinsky
Yahoo
45 West 18th Street 6th floor
New York, NY 10011
Email: igor@yahoo-inc.com
Intellectual Property Statement
The IETF Trust takes no position regarding the validity or scope of
any Intellectual Property Rights or other rights that might be
claimed to pertain to the implementation or use of the technology
described in any IETF Document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights.
Copies of Intellectual Property disclosures made to the IETF
Secretariat and any assurances of licenses to be made available, or
the result of an attempt made to obtain a general license or
permission for the use of such proprietary rights by implementers or
users of this specification can be obtained from the IETF on-line
IPR repository at http://www.ietf.org/ipr
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
any standard or specification contained in an IETF Document. Please
address the information to the IETF at ietf-ipr@ietf.org.
Disclaimer of Validity
All IETF Documents and the information contained therein are
provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION
HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY,
THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.