One document matched: draft-zhang-alto-traceroute-00.txt


Network Working Group                                               Liufei. Wen 
Internet Draft                           Huawei Technologies Network Working Group 
Intended status: Informational                              Yunfei. Zhang 
Expires: February 1, 2009                                         China Mobile 
 
                                                            July 4, 2008 
                                   
 
                                      
               P2P Traffic Localization by Traceroute and 2-Means Classification 
                       draft-zhang-alto-traceroute-00.txt 


Status of this Memo 

   By submitting this Internet-Draft, each author represents that       
   any applicable patent or other IPR claims of which he or she is       
   aware have been or will be disclosed, and any of which he or she       
   becomes aware will be disclosed, in accordance with Section 6 of       
   BCP 79. 

   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-Drafts. 

   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time.  It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 

   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt 

   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html 

   This Internet-Draft will expire on February 1, 2009. 

Copyright Notice 

   Copyright (C) The IETF Trust (2008) 

Abstract 

   Most P2P system performance suffers from the mismatch between the randomly  
   constructed overlays topology and the underlying physical network topology, 
   causing a large burden in the ISP and a long RTT time. This document  
 
 
 
Wen & Zhang                 Expires February 1, 2009                  [Page 1] 

Internet-Draft       P2P Traffic Localization by Traceroute          July 20082008 
    

   describes how DHT overlay peers can interact with the routers by traceroute to  
   get the path information, and execute 2-Means Classification, thereafter peers  
   leverage the DHT itself to build efficient "closer" cluster. This scheme only  
      requires the infrastructure to enable traceroute queries.  

Conventions used in this document 

   In examples, "C:" and "S:" indicate lines sent by the client and 
   server respectively. 

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in RFC-2119. 

Table of Contents 

    
   1. Introduction................................................2 
      1.1. Terminology..............................................3 
   2. Overview......................................................4 
   3. Traceroute and Clustering........................................4 
      3.1. Peer Traceroute...........................................4 
      3.2. 2-Means Classification....................................5 
      3.3. Form the Cluster...........................................6 
      3.4. Update...................................................7 
   4. Enhancement Examples............................................7 
      4.1. Find the proximate candidates...........................7 
      4.2. More Efficient Overlay Routing..........................7 
      4.3. Placement of Cache......................................7 
   5. Security Considerations......................................8 
   6. IANA Considerations.........................................8 
   References.....................................................9 
   Author's Addresses.............................................9 
   Intellectual Property Statement.................................9 
   Disclaimer of Validity........................................10 
    
1. Introduction 

   This document describes how DHT overlay peers get the topology  
   information and reduce the mismatch by traceroute and 2-Means  
   classification. In particular, an assumption is made about the 
   infrastructure routers support peers' traceroute requests, no  
   matter what specific means(ICMP traceroute or TCP traceroute). 
 
   In a P2P system, each end node provides services to other  
   participating nodes as well as receives services from them. An  
 
 
Wen & Zhang                 Expires February 1, 2009                  [Page 2] 

Internet-Draft       P2P Traffic Localization by Traceroute          July 20082008 
    

   attractive feature of P2P is that peers do not need to directly  
   interact with the underlying physical network, providing many  
   new opportunities for user-level development and applications.  
   Nevertheless, the mechanism for a peer to randomly choose logical  
   neighbors, without any knowledge about the physical topology,  
   causes a serious topology mismatch between the P2P overlay networks 
   and the physical networks. 
 
   The mismatch between physical topologies and logical overlays is a  
   major factor that delays the lookup response time, which is determined 
   by the product of the routing hops and the link latencies. Mismatch  
   problem also causes a large volume of redundant traffic in  
   inter-domain between the every ISP. These has constituted the  
   motivation to the topology-aware P2P, which implies to mitigate such  
   drawbacks. 
    
   The purpose of this document is to specify a way to efficient topology 
   matching technique. The DHT overlay peers' Traceroute result are used 
   to get "near" clusters and Edge Gateway by execute 2-Means  
   classification. This information will be put into the DHT. Two peers  
   will be considered as to hava a close neighbor relationship, if they  
   have at least one coomon router among their "near" clusters and Edge  
      Gateways. 

1.1. Terminology 

   In this document, the key words "MUST", "MUST NOT", "REQUIRED", 
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT 
   RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as 
   described RFC 2119 [RFC2119]. 
     
   This section defines some key concepts using in this document. 
    
   DHT  - distributed hash table (DHT). 
    
    DHT Overlay :  An overlay network is a computer network which is 
      built on top of another network. The peer-to-peer networks are 
      overlay networks because they run on top of the Internet. And the  
  peer-to-peer network which build with DHT is DHT Overlay.  
    
   2-means classification: k-means classification is an algorithm to  
      classify objects based on their attributes into K number of group. 
      2-means classification algorithm is a special example of k-means 
 
 
Wen & Zhang                 Expires February 1, 2009                  [Page 3] 

Internet-Draft       P2P Traffic Localization by Traceroute          July 20082008 
    

      classification algorithm with k=2. 
2. Overview 

  Usually, to solve the mismatch problem, it needs three steps, first  
   to estimate the physical network distance between two overlay peers 
   through network probing or prediction. then,  based on this proximity 
   information , to cluster "near" peers, so as to let peers can find a 
   better candidate than the randomly chosen result. At last, such  
   results are utilized to optimize P2P alogrithm. The "near" peers are  
   the preferential choice in P2P lookup and maintenance, and these will 
   impel the access and traffic localization and reduce delay. 
        
   +-------------------------------------------------------------+    
   |                    |                     |                  | 
   |      Measure&      |     Clustering      |  Impelled local  | 
   |      Predict       |                     |  access pattern  | 
   |                    |                     |                  | 
   +-------------------------------------------------------------+   
        
    Figure 1 Basic Steps of P2P Traffic Localization 
     
    In this document, an efficient topology matching technique is  
    specified. First, before a peer joining the P2P networks, it  
    randomly picks an Internet IP address and probes it using the  
    traceroute tools. According to the measured data, the peer tracked 
    the return information to a vector data. Then 2-means classification 
    algorithm is used to classify the Internet routers into "near" and 
    "remote" routers. Finally, peer chose the router with maximum Hops 
    item in "near" set as a the Edge Gateway. The peer registers into  
    the DHT overlays with the Edge Gateway the Key, then do same to the 
    "near" set. Through shared vector cluster information such as "near" 
    routers cluster and Edge Gateway, two peers were considered as a  
    close neighbor relationship when their "near" routers cluster both  
    had a same router's IP address at least and then gathered together  
       to form a "close" peer clusters. 

3. Traceroute and Clustering   

3.1. Peer Traceroute  

   As a rapid developed network, the Internet presents two kinds of 
    basic characteristic. On one hand, with many end user hosts randomly 
    join and leave the network, the topology of Internet is dynamic and  
    variable,  but the routers constitute a much more stable  
 
 
Wen & Zhang                 Expires February 1, 2009                  [Page 4] 

Internet-Draft       P2P Traffic Localization by Traceroute          July 20082008 
    

    infrastructure topology. On the other hand, regarding Internet as a 
    graph (V=routers, E=direct link between routers), we found that the 
    edges between ASes constitutes a very small portion among the total 
    edges, and usually the delay between ISP routers is more large than 
    the delay between routers in the same AS domains. 
    

   A peer need randomly picked an Internet IP address and probed it 
    using the traceroute tools.  The peer tracked the return information 
    to a vector data, with the data structure <IP, Hops, Latency>. 
     
       5ms  10ms    100ms    6ms    150ms   20ms  8ms 
      R1--R2---R3----------R4--R5----------R6---R7--R8 
           Fig.2 The Traceroute Result 
     
    Fig.2 is an example of path information by R1 traceroute to R8. As 
    is clear from Fig.2, between the R3 and R4, R5 and R6, there are  
    some huge latency leaps than the others. It possibly means that 
    traceroute message across the different AS domains or different ISP 
    ranges.      
3.2. 2-Means Classification 

   When the overlay peer gets the traceroute result through randomly 
    probe, a 2-means classification algorithm is used to classify the 
    Internet routers based on the Latency attribute in these traceroute 
    result. The 2-means classification algorithm includes four steps as 
    following: 
     
    step1. Peer chooses the minimum latency item and maximum latency item 
    in whole vectors as centroids for two initialization sets "first"   
    and "second".  
     
    As a example in Fig.2, <R1, 1, 5ms> and <R5, 5, 150ms> is selected  
    as centroids for sets "first" and "second". 
     
    step2. Peer takes the latency item in vector to make an absolute 
    distance value with two centroids in turn, and then separately 
    associated the corresponding vector to one vector cluster that has 
    smaller absolute distance value.  
     
    So for the Fig2. example, the vector of R2 is <R2, 2, 10ms>, and the  
    distance between R2 and R1 is 10-5 = 5, and the distance between R2  
    and R5 is 150-10 = 140. So <R2, 2, 10ms> belongs to the "first" set. 
 
 
Wen & Zhang                 Expires February 1, 2009                  [Page 5] 

Internet-Draft       P2P Traffic Localization by Traceroute          July 20082008 
    

    
    step3. Peer calculates the latency mean and variance value of two 
    vector clusters. As in in Fig.2, the R1,R2,R4,R6,R7 belong to the  
    "first" set, and R3,R5 belong to the "second" set and the mean and  
    variance can be calculated. 
     
    step4. If the variance value was larger than the threshold, peer 
    picks two latency mean values as new centroids of "first" and 
    "second" sets, then goto step2, otherwise finishes the classification 
   and gets two "first" and "second" sets. 
    
   Finally, peer chooses the router with minimum Hops item in "second"  
    set as a hop threshold. This router and the other routers whose Hops       
    item are larger than the hop threshold all divided into a "remote"  
    router cluster. And then the remaining routers are gathered into  
    another "near" cluster. and the router with maximum hops of the 
    "near" cluster is regarded as overlay peer's Edge Gateway. 
     
    So for the Fig.2 example, R1,R2,R3 are the "near" routers of overlay 
    peer and others routers are the "remoter" routers of the overlay  
    peer. R3 will be the Edge Gateway in this case. 
    

3.3. Form the Cluster 

   The peer registers into the P2P overlays with their Edge Gateway  
    and "near" routers as the Key, and the DHT ID and IP of itself as 
    the value.Due to the essence of DHT, if two item have the same Key, 
    they will be routed to the same DHT peer. Thus it makes possible to 
    let several peers to know each other, if they have some common  
    elements of their "near" router set. Edge Gateway is more useful,  
    so if the hosting DHT peers become overloaded, it will firstly  
    deletes such non Edge Gateway routers item. The hosting DHT peer 
    will notify those peers to form a "close" cluster, or join an  
    existed cluster. 
         
    Each peer clusters has a ClusterId generated by consistent hashing 
    when the first two peers decided to form the cluster. The ClusterId 
    and its member peer Ids will be PUT into the DHT, and the peer can 
    find the other members within the same cluster by DHT GET with its 
    ClusterId remembered during its last online life.  
    


 
 
Wen & Zhang                 Expires February 1, 2009                  [Page 6] 

Internet-Draft       P2P Traffic Localization by Traceroute          July 20082008 
    

3.4. Update 

   During normal peer-to-peer interactions such as DHT lookup or  
    maintenance, if peers belonging to different clusters found the 
    delay between them were relatively low, then these two clusters  
    should decide to combine a new bigger cluster. The mapping between 
    those original ClusterIds and the new generated ClusterIds should 
    also be registered into the DHT so as to let those peers belonging 
    to old clusters could find and join the new cluster. This technique 
    alleviates the problem occurred when peers belonging to same  
    cluster get different Edge Gateways from their traceroute response, 
       thus they can not form the ideal bigger cluster. 

4. Enhancement Examples 

4.1. Find the proximate candidates 

   Peers register their resources to be shared as <Key, Value> pairs  
    into the DHT. Usually the Key is generated by consistent hashing  
   some information like the file name, and the Value is the IP address 
   of the sharing peer. When use our scheme, we will also include the  
    sharing peer's ClusterID in the Value. 
     
    When a overlay peer wants to find a resource, it will raise a DHT  
    get with the hashed Key and piggyback its ClusterID.  When getting 
    the request, the peer hosting the Key k will check the list of <Key, 
    Value> pairs registered by all the sharing candidates, then it will  
    choose the sharing peer with the same ClusterId to be the preference  
   result. 

4.2. More Efficient Overlay Routing 

   The delay between the ISP is larger more than the inner-domain delay. 
    And if AS domain is big enough many resource can be found in the  
    same AS domains. So a more efficient hierarchical P2P network is  
    feasible. Each low layer (local) DHT is composed by the peers with  
    same ClusterID. All the peers or some candidate peers from each  
    local DHT will join the global DHT. Every peer firstly search in  
    its local DHT for its desired resource, then it may switched to the 
   glocal DHT only if the resource not available locally.   

4.3. Placement of Cache   

   In order to reduce the inter-domain traffic and delay, cache is  
 
 
Wen & Zhang                 Expires February 1, 2009                  [Page 7] 

Internet-Draft       P2P Traffic Localization by Traceroute          July 20082008 
    

    always considered in the P2P network. The placement strategy takes 
    the cluster info into account. It will place caches to cover the  
    peer clusters in the order of its population, the number of peers 
    participating the cluster. 
                 
5. Security Considerations 

   This document does not currently introduce security considerations. 
6. IANA Considerations 

   This document does not specify IANA considerations. 


































 
 
Wen & Zhang                 Expires February 1, 2009                  [Page 8] 

Internet-Draft       P2P Traffic Localization by Traceroute          July 20082008 
    

References 

   [1]RFC2119, Bradner, S., "Key words for use in RFCs to Indicate 
              Requirement Levels", BCP 14, RFC 2119, March 1997. 
             
   [2]IFIP Networking 2008, Guangyu Shi, Youshui Long. "T2MC: A Peer-to 
                           -Peer Mismatch Reduce Technique by Traceroute 
                          and 2-Means Classification Algorithm." 

Author's Addresses 

   Yunfei Zhang  
   China Mobile Communications Corporation 
      
   Phone: +86 10 66006688 
   Email: zhangyunfei@chinamobile.com 
    

   Wen liufei 
   Huawei Technologies Co. Ltd. 
      
   Phone: +86 755 28977571 
   Email: wenliufei@huawei.com 
    

Intellectual Property Statement 

   The IETF takes no position regarding the validity or scope of any 
   Intellectual Property Rights or other rights that might be claimed to 
   pertain to the implementation or use of the technology described in 
   this document or the extent to which any license under such rights 
   might or might not be available; nor does it represent that it has 
   made any independent effort to identify any such rights.  Information 
   on the procedures with respect to rights in RFC documents can be 
   found in BCP 78 and BCP 79. 

   Copies of IPR disclosures made to the IETF Secretariat and any 
   assurances of licenses to be made available, or the result of an 
   attempt made to obtain a general license or permission for the use of 
   such proprietary rights by implementers or users of this 
   specification can be obtained from the IETF on-line IPR repository at 
   http://www.ietf.org/ipr. 

   The IETF invites any interested party to bring to its attention any 
   copyrights, patents or patent applications, or other proprietary 
   rights that may cover technology that may be required to implement 
 
 
Wen & Zhang                 Expires February 1, 2009                  [Page 9] 

Internet-Draft       P2P Traffic Localization by Traceroute          July 20082008 
    

   this standard.  Please address the information to the IETF at 
   ietf-ipr@ietf.org. 

Disclaimer of Validity 

   This document and the information contained herein are provided on an 
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 

Copyright Statement 

   Copyright (C) The IETF Trust (2008). 

   This document is subject to the rights, licenses and restrictions 
   contained in BCP 78, and except as set forth therein, the authors 
   retain all their rights. 

Acknowledgment 

   Funding for the RFC Editor function is currently provided by the 
   Internet Society. 

    



















 
 
Wen & Zhang                 Expires February 1, 2009                 [Page 10] 


PAFTECH AB 2003-20262026-04-23 09:51:13