draft-vinod-icp-traffic-dist-00.txt
Hierarchical HTTP Routing Protocol
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).
Abstract
Recent interest in finding solutions for traffic problems stemming
from HTTP has centered on the use of cooperating proxy-caches.
We contend that by using a deterministic, hash-based approach for
routing URLs within an "array" of proxy servers, many of the benefits
of alternative cache cooperation protocols (such as ICP) may be
realized.
As an example of such an implementation we propose the use of
"Proxy Client Configuration Files" between proxy servers in order
to exchange routing information. This implementation is motivated
in part by the adoption of this file by existing, popular web
browsers to provide intelligent URL request routing.
This draft discusses adopting this well-understood, widely
implemented browser protocol by web proxies in order to facilitate
intelligent routing of requests within a network of proxy servers.
Valloppillil & Cohen [Page 1]
INTERNET-DRAFT Hierarchical HTTP Routing Protocol 21 April 1997
1. Introduction
There is significant interest in the Internet community, and in the
ICP working group in particular, in finding mechanisms by which the
public caches on individual proxy servers can be further aggregated
and shared by as many browsers as possible.
Philosophically, protocols such as ICPv2 are based on dynamic
"pinging" of neighboring proxy servers in an attempt to locate
copies of cached objects.
We propose an alternate approach based on hash-based routing of
URLs. The hash-based routing approach documented here uses a known
"request resolution path" through a network of proxies that is
determined by the URL of the request. An interesting side effect of
this deterministic mechanism is that cache duplication is avoided.
Hashing distributes the URL space among several proxies which are
assumed to be relatively equidistant from each other. Additionally,
this hash-based approach is more tuned for "hierarchical" deployments
of proxy servers. One example of this might be a departmental level
proxy which routes into an "array" of top level proxies in a
corporation which provide the gateway to an ISP. The ISP, in turn,
might operate another "array" of proxies at its POP.
By contrast, ICP networks typically involve peered caches which
may operate at the top level of many ISP hierarchies.
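As a minimal sketch of the idea, the mapping from a URL to one member
of an "array" might look like the following JavaScript (the language
of the configuration file described in [1]). The hash function and the
proxy names are assumptions made for illustration; this draft does not
specify either.

```javascript
// Illustrative 32-bit string hash (an assumption; the draft names none).
function hashUrl(url) {
  var h = 0;
  for (var i = 0; i < url.length; i++) {
    // Multiply-and-add, truncated to an unsigned 32-bit value.
    h = (h * 31 + url.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Every URL maps to exactly one member of the array.
function pickProxy(url, proxies) {
  return proxies[hashUrl(url) % proxies.length];
}

// Hypothetical array members.
var ARRAY = [
  "proxy1.example.net:8080",
  "proxy2.example.net:8080",
  "proxy3.example.net:8080"
];
```

Because the result depends only on the URL, every client or downstream
proxy that evaluates the same function against the same list forwards
a given URL to the same member.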
As an example of an implementation of hash-based routing, we propose
extending the existing "Proxy Client Configuration File" protocol used
by browsers to intelligently route HTTP requests.
Our proposal would implement this protocol on proxy servers in order
to provide a vendor independent mechanism for specifying sophisticated
hop-by-hop HTTP routing between groups of proxy servers.
We also demonstrate that intelligent utilization of this routing
protocol can yield almost all of the benefits of alternative cache
cooperation protocols.
We do NOT propose any specific routing scripts and instead leave
determination of such scripts up to individual vendor
implementations.
Although there are clear advantages to the use of the
Proxy Client Configuration File as the vehicle for transporting
routing information, there may be interest in the working group
in exploring other vehicles (e.g. publishing a static data table
containing proxies in an "array" implementing a well-known hash
function within proxies).
2. Proxy Client Configuration File
The Proxy Client Configuration File is described in [1] and [2].
Additionally, multiple interoperable implementations of this protocol
are available in popular client browsers.
As originally constructed, this file is intended for consumption by
client programs (web browsers) and is evaluated per URL to be
retrieved by the browser. The output of this script provides an
ordered series of proxy servers to be used by the browser to retrieve
the object specified by the URL.
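The following sketch shows what such a script could look like when the
ordered series is derived from a hash of the URL. FindProxyForURL and
the "PROXY host:port; ..." return format come from the file format
described in [1]; the hash function, the proxy names, and the choice to
list the remaining members as fallbacks are assumptions made for
illustration only.

```javascript
// Hypothetical array members; not taken from the draft.
var PROXIES = [
  "cache-a.example.net:8080",
  "cache-b.example.net:8080",
  "cache-c.example.net:8080"
];

// Illustrative 32-bit string hash (an assumption; the draft names none).
function hashUrl(url) {
  var h = 0;
  for (var i = 0; i < url.length; i++) {
    h = (h * 31 + url.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Called once per URL. Returns a semicolon-separated list that the
// caller tries from left to right.
function FindProxyForURL(url, host) {
  var start = hashUrl(url) % PROXIES.length;
  var parts = [];
  for (var i = 0; i < PROXIES.length; i++) {
    // The hashed member comes first; the others follow as fallbacks.
    parts.push("PROXY " + PROXIES[(start + i) % PROXIES.length]);
  }
  return parts.join("; ");
}
```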
One of the excellent properties of the HTTP proxy protocol [5] is that
it exposes proxy servers to upstream servers and upstream proxies as
regular clients. Because the administrator of a group of proxies may
wish to make assumptions about a downstream client's ability
to interpret a script, we wish to extend the metaphor to include
use of the configuration file by proxies as well as "classical"
clients.
3. Example implementation
Researchers have documented the concept of using client-side
hash-based routing to spread load across multiple proxy servers.
The deterministic nature of many of these algorithms has the
additional benefit of improving cache hit rates by creating the
image of a single logical cache spread over many proxies. [4]
In this proposal, the administrator of an "array" of proxies at an
ISP may wish to construct a script that hashes URLs and distributes
the hash space across each of his/her proxy servers. Using the same
downstream script, the administrator should be able to service both
dial-in clients (whose browsers already support the protocol) and
corporate proxies connected over leased lines.
The hop-by-hop nature of the routing provides additional flexibility
in this example. The corporation may wish to use one particular
routing script internally (one which tells clients to directly access
intranet content, for example) whereas the ISP may wish for the
corporation's proxy servers to use a different script to route into
the ISP's proxies (one which routes all requests through the caches
for maximum hit rates).
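The two scripts in this example might be sketched as follows. In a
real deployment each would be a separate file whose entry point is
named FindProxyForURL; they are given distinct names here so that both
fit in one listing. The domain suffix, host names, and hash function
are hypothetical, and plain string operations stand in for the helper
functions that the file format in [1] makes available.

```javascript
// Illustrative 32-bit string hash (an assumption; the draft names none).
function hashUrl(url) {
  var h = 0;
  for (var i = 0; i < url.length; i++) {
    h = (h * 31 + url.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Script served to browsers inside the corporation: intranet hosts
// are fetched directly, everything else goes to the corporate proxy.
function corporateFindProxy(url, host) {
  var suffix = ".corp.example";
  var isPlainName = host.indexOf(".") === -1;
  var isIntranet = host.slice(-suffix.length) === suffix;
  if (isPlainName || isIntranet) {
    return "DIRECT";
  }
  return "PROXY dept-proxy.corp.example:8080";
}

// Script served by the ISP to the corporation's proxies: every
// request is hashed into the ISP's array, with no direct fallback.
var ISP_ARRAY = [
  "pop-cache1.isp.example:8080",
  "pop-cache2.isp.example:8080"
];

function ispFindProxy(url, host) {
  return "PROXY " + ISP_ARRAY[hashUrl(url) % ISP_ARRAY.length];
}
```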
4. Security Considerations
Security issues are not directly addressed in this document. Any
security functionality is derived from the underlying HTTP layer.
Some consideration may need to be given to ensuring the integrity and
security of the initial script transfer. More specifically, this
draft doesn't address issues that may stem from the possibility that
malicious scripts may be constructed.
5. Advantages of script-based routing vs. ICP v2
We now provide a comparison of this proposal vs. the current Internet
Cache Protocol draft [3].
a. Symmetric protocol between client -> proxy and proxy -> proxy
This preserves the symmetry of HTTP's presentation of proxy servers
as "mega clients" to upstream servers / proxies.
ICP is not currently processed / generated by client browsers.
b. Eliminate messages for cache 'miss' events.
A very significant percentage of all ICP messages exchanged in the
field are cache "misses." [NLANR's field experience indicates that
85-90% of all ICP transactions are "misses".]
Because this protocol eliminates querying, miss messages no longer
occur (the outcome of every forward is now either "cache hit" or
"continue resolving upstream").
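The difference in message volume can be illustrated with a rough
counting model. It assumes that, on each local miss, an ICP cache
sends one query to every neighbor and receives one reply from each,
and it ignores timeouts and lost datagrams. The function names are
hypothetical.

```javascript
// Simplified model (assumption): one query to, and one reply from,
// each neighbor for every local miss.
function icpMessages(localMisses, neighbors) {
  return {
    queries: localMisses * neighbors,
    replies: localMisses * neighbors
  };
}

// With hash routing the request itself is forwarded, so no query or
// reply messages are exchanged.
function hashRoutingMessages(localMisses) {
  return { queries: 0, replies: 0, httpForwards: localMisses };
}
```

Under this model, 1000 local misses with three neighbors produce 3000
queries and 3000 replies, against none for hash routing.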
c. Takes advantage of all HTTP work including options, cache-control,
authentication, etc.
HTTP already provides protocol options to perform functions such as
proxy to proxy authentication, etc. These functions don't have to
be re-invented.
Additionally, much of the new behavior in the HTTP 1.1 cache-control
headers is not expressible in ICPv2. Forwarding the entire HTTP
request to the next upstream/neighboring proxy allows it to be
privy to these options.
d. Already implemented on the browser
Eases compliance testing and demonstrates soundness of the protocol
(in a limited case).
e. Sorted requests between proxies = single logical cache
Over time, assuming that URL requests are randomly routed (e.g.
round robin DNS) to a set of peer ICP neighbors (e.g. on a LAN
within an ISP's head-end), the contents of these neighboring
caches will eventually become roughly identical.
A deterministic hash-based routing scheme, however, provides for a
single logical cache image across 'n' proxies instead of 'n'
identical caches.
ICP's peer to peer queries are replaced by intelligent request
routing in the previous level of the hierarchy.
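The contrast between 'n' identical caches and a single logical cache
can be seen in a small simulation. This is a sketch under stated
assumptions: round robin stands in for random routing so that the run
is repeatable, caches never evict, and the hash function and URLs are
illustrative only.

```javascript
// Illustrative 32-bit string hash (an assumption; the draft names none).
function hashUrl(url) {
  var h = 0;
  for (var i = 0; i < url.length; i++) {
    h = (h * 31 + url.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Replay `passes` passes over `urls` against `n` proxies and return
// the total number of cached copies held across the array.
// route(url, i, n) gives the index of the proxy serving request i.
function totalCopies(urls, passes, n, route) {
  var caches = [];
  for (var p = 0; p < n; p++) {
    caches.push({});
  }
  var i = 0;
  for (var pass = 0; pass < passes; pass++) {
    for (var u = 0; u < urls.length; u++) {
      caches[route(urls[u], i, n)][urls[u]] = true;
      i++;
    }
  }
  var copies = 0;
  for (var c = 0; c < n; c++) {
    copies += Object.keys(caches[c]).length;
  }
  return copies;
}

// Round robin: the proxy depends on the request number, not the URL.
function roundRobin(url, i, n) { return i % n; }

// Hash routing: the proxy depends on the URL alone.
function hashed(url, i, n) { return hashUrl(url) % n; }

var URLS = [
  "http://example.org/a",
  "http://example.org/b",
  "http://example.org/c",
  "http://example.org/d"
];
```

With four URLs, three proxies, and three passes, round robin ends with
every URL on every proxy (12 copies in total), while hash routing
holds each URL exactly once (4 copies).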
f. No new transport protocols
The behavior of HTTP is already well understood by system
administrators and passed through firewalls, etc. By contrast,
ICP is relatively unknown in the vast majority of intranets,
which may slow deployment.
In general, the development and deployment of new wire protocols
should be a carefully evaluated endeavor due to huge support
costs and "entropy" effects on corporate networks.
6. Advantages of ICP v2 vs. script-based routing
a. Exchange of messages over WAN
ICP is sometimes used across very wide area links to perform
cache look-ups. An example of this might be peered top-level
caches between two overseas ISPs. The protocol proposed here is
intended more for use by proxies that are in relative proximity to
each other.
One critical question is whether these transoceanic cache
look-ups are worth their cost. This is especially a concern given
the opportunity to build larger caches within a traditional cache
hierarchy. Do large local caches "skim" most of the potential
cache hits? This question could be answered with some idea of the
hit rate for ICP over WAN links between very large peer caches.
b. Exchange of messages across peer administrative domains
Correct implementation of the proxy configuration script is in part
dependent on having a series of proxies within the same
administrative domain which share their logical cache.
Because ICP maintains a very loose relationship between neighbors,
it is easier to implement across such domains. However, once
again, the question of whether anything more than 2 or 3 levels of
cache look-ups is valuable becomes pertinent. If not, then a 2-3
level hierarchical array of proxies within corporations & ISPs
might be sufficient for maximum cache hit rates.
c. Binary protocol
ICP is clearly faster and easier to parse than HTTP due to its
binary nature. However, the construction of efficient HTTP engines
is already at a premium due to the wide deployment of the protocol.
d. Connectionless transport
ICP can be, and often is, transported over UDP, which is lighter
weight than HTTP's TCP connection. Many of the disadvantages of TCP
may be mitigated by performance optimizations such as keep-alives and
pipelining.
Additionally, notice that in the case of a cache hit, ICP may
require construction of a TCP connection to transport the requested
object.
Furthermore, the lack of congestion control on ICP messages is
the obvious downside of connectionless transport. In the scheme
proposed here, connections between proxy servers would almost
certainly be HTTP Keep-Alive sessions.
e. Failure case benefit.
If, for some reason, the ICP cache that holds a URL is too slow to
respond or is down, an alternate cache will be used to fulfill
the request. It is likely that this alternate cache will store the
result. At any later point in time, it will respond
with a HIT message when queried about the URL. This allows
very busy URLs to be spread among multiple caches and stems from
the non-deterministic nature of the protocol.
In the hashing scheme, if a busy set of URLs is assigned to one
cache via the hash, and that server is too slow or down, another
cache will handle and cache that request. Unfortunately, that
cached copy is of no further use to any client or proxy, since
they will never go to that proxy again for a URL that does not
hash to it.
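The stranded copy can be illustrated with a short sketch. The fallback
rule used here (try the next member of the list, wrapping around) is
an assumption; this draft does not define one. The hash function is
likewise illustrative.

```javascript
// Illustrative 32-bit string hash (an assumption; the draft names none).
function hashUrl(url) {
  var h = 0;
  for (var i = 0; i < url.length; i++) {
    h = (h * 31 + url.charCodeAt(i)) >>> 0;
  }
  return h;
}

// `up` is an array of booleans, one per proxy. Returns the index of
// the first live proxy, starting at the hashed position, or -1 if
// every proxy is down.
function route(url, up) {
  var start = hashUrl(url) % up.length;
  for (var i = 0; i < up.length; i++) {
    var idx = (start + i) % up.length;
    if (up[idx]) {
      return idx;
    }
  }
  return -1;
}
```

For a URL that hashes to proxy 1 of three, route() returns 1 while all
proxies are up and 2 while proxy 1 is down. Once proxy 1 recovers, the
result is 1 again, so the copy that proxy 2 cached during the outage is
never requested.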
f. Server distance determination
In the field, a secondary benefit of ICP has been use of its
UDP round-trip times as a means of gauging relative distance
between peer caches. Because hash-based routing relies on TCP
and implies hierarchies known a priori, this feature of ICP
isn't realized.
g. Current installed base
ICP currently has an installed base of ~3000 proxies.
7. Open Issues
As specified via Proxy Client Configuration files, there are
two primary open issues associated with this protocol:
1) Standardization of the Proxy Client Configuration File.
Currently, this protocol is only a de facto standard and has not
been formally accepted or endorsed by the IETF.
2) Performance of script evaluation on proxy servers.
There are potentially significant issues with evaluating proxy
configuration scripts per URL processed by a proxy server.
Requiring an interpreter for JavaScript [1] may be outside of
the bounds of the working group.
Additionally, performance of the script + script interpreter may
be a significant cost for proxy servers which need to handle high
transaction volumes.
8. Acknowledgements
The authors would like to thank Brian Smith, Kip Compton, Ari
Luotonen, and Kerry Schwartz for their assistance in preparing
this document.
9. References
[1] Luotonen, Ari., "Navigator Proxy Auto-Config File Format",
Netscape Corporation,
http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html,
March 1996.
[2] Microsoft Corporation., "Automatic Proxy Configuration",
http://www.microsoft.com/ie/ieak/autosys.htm, March 21, 1997.
[3] Wessels, Duane., "Internet Cache Protocol Version 2",
http://ds.internic.net/internet-drafts/draft-wessels-icp-v2-00.txt,
March 21, 1997.
[4] Sharp Corporation., "Super Proxy Script",
http://naragw.sharp.co.jp/sps/, August 9, 1996.
[5] Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1",
RFC 2068, UC Irvine, January 1997.
10. Author Information
Vinod Valloppillil
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
Phone: 1.206.703.3460
Email: VinodV@Microsoft.Com
Josh Cohen
Netscape Communications Corporation
501 E. Middlefield Rd.
Mountain View, CA 94043
Phone: 1.415.937.4157
Email: Josh@Netscape.Com
Expires October 1997