Simple Hit-Metering for HTTP
Preliminary Draft
draft-ietf-http-hit-metering-00.txt
STATUS OF THIS MEMO
This document is an Internet-Draft. Internet-Drafts are
working documents of the Internet Engineering Task Force
(IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by
other documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other
than as "work in progress."
To learn the current status of any Internet-Draft, please
check the "1id-abstracts.txt" listing contained in the
Internet-Drafts Shadow Directories on ftp.is.co.za
(Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific
Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US
West Coast).
Distribution of this document is unlimited. Please send
comments to the HTTP working group at
<http-wg@cuckoo.hpl.hp.com>. Discussions of the working
group are archived at
<URL:http://www.ics.uci.edu/pub/ietf/http/>. General
discussions about HTTP and the applications which use HTTP
should take place on the <www-talk@w3.org> mailing list.
ABSTRACT
This draft proposes a simple extension to HTTP, using a new
``Meter'' header, to permit a limited form of demographic
information (colloquially called ``hit-counts'') to be
reported by caches to origin servers, in a more efficient
manner than the ``cache-busting'' techniques currently
used. It also permits an origin server to control the
number of times a cache uses a cached response, and
outlines a technique that origin servers can use to capture
referral information without ``cache-busting.''
Mogul, Leach [Page 1]
Internet-Draft Hit-Metering for HTTP (DRAFT) 21 January 1997 12:06
TABLE OF CONTENTS
1 Introduction
1.1 Goals, non-goals, and limitations
1.2 Brief summary of the design
2 Overview
2.1 Discussion
3 Design concepts
3.1 Implementation of the "metering subtree"
3.2 Format of the Meter header
3.3 Negotiation of hit-metering and usage-limiting
3.4 Transmission of usage reports
3.5 When to send usage reports
3.6 Subdivision of usage-limits
4 Analysis
4.1 What about "Network Computers"?
4.2 Why max-uses is not a Cache-control directive
5 Specification
5.1 Specification of Meter header and directives
5.2 Abbreviations for Meter directives
5.3 Counting rules
5.3.1 Counting rules for hit-metering
5.3.2 Counting rules for usage-limiting
5.3.3 Equivalent algorithms are allowed
5.4 Counting rules: interaction with Range requests
5.5 Implementation by non-caching proxies
6 Expressing or approximating the "proxy-mustcheck" directive
7 Examples
7.1 Example of a complete set of exchanges
7.2 Protecting against HTTP/1.0 proxies
7.3 More elaborate examples
8 Interactions with varying resources
9 A Note on Capturing Referrals
10 Security Considerations
11 Revision history
11.1 draft-mogul-http-hit-metering-01.txt
11.2 draft-mogul-http-hit-metering-00.txt
12 Acknowledgements
13 References
14 Authors' addresses
1 Introduction
For a variety of reasons, content providers want to be able to
collect information on the frequency with which their content is
accessed. This desire leads to some of the "cache-busting" done by
existing servers (exactly how much is unknown). This kind of
cache-busting is done not for the purpose of maintaining transparency
or security properties, but simply to collect demographic
information. It has also been pointed out that some cache-busting is
done so that different advertising images appear on the same page
(i.e., each retrieval of the page sees a different ad).
One model that this proposal tries to support is one reasonably
similar to that of publishers of hard-copy publications: such
publishers (try to) report to their advertisers how many people read
an issue of a publication at least once; they don't (try to) report
how many times a reader re-reads an issue. They do this by counting
copies published, and then trying to estimate, for their publication,
on average how many people read a single copy at least once. The key
point is that the results aren't exact, but are still useful. Another
model is that of coding inquiries in such a way that the advertiser
can tell which publication produced the inquiry.
1.1 Goals, non-goals, and limitations
HTTP/1.1 already allows origin servers to prevent caching of
responses, and we have evidence that at least some of the time, this
is being done for the sole purpose of collecting counts of the number
of accesses of specific pages. Some of this evidence is inferred
from the study of proxy traces; some is based on explicit statements
of the intention of the operators of Web servers. We take no
position on whether the information collected this way is of use to
the people who collect it; the fact is that they want to collect it,
or already do so.
Our goal in this proposal is to provide an optional performance
optimization for this use of HTTP/1.1.
Our proposal is:
- Optional: no server or proxy is required to implement it.
- Proxy-centered: there is no involvement on the part of
end-client implementations.
- Solely a performance optimization: it provides no
information or functionality that is not already available
in HTTP/1.1. Our intention is to improve performance
overall, and reduce latency for almost all interactions; we
do not purport to reduce latency for every single HTTP
interaction.
- Best-efforts: it does not guarantee the accuracy of the
reported information, although it does provide accurate
results in the absence of persistent network failures or
host crashes.
- Neutral with respect to privacy: it reveals to servers no
information about clients that is not already available
through the existing features of HTTP/1.1.
To the extent that any part of this specification conflicts with
these criteria, we would consider that to be a bug, and will
undertake to resolve this when it is brought to our attention.
Our goals do not include:
- Solving the entire problem of efficiently obtaining
extensive information about requests made via proxies.
- Improving the protection of user privacy (although our
proposal may reduce the transfer of user-specific
information to servers, it does not prevent it).
- Preventing or encouraging the use of log-exchange
mechanisms.
- Avoiding all forms of "cache-busting", or even all
cache-busting done for gathering counts.
We recognize certain potential limitations of our design:
- If it is not deployed widely in both proxies and servers,
it will provide little benefit.
- It may, by partially solving the hit-counting problem,
reduce the pressure to adopt (hypothetical) more complete
solutions.
- Even if widely deployed, it might not be widely used, and
so might not significantly improve performance.
We do not believe that these potential limitations are problems in
reality.
1.2 Brief summary of the design
This section is included for people not wishing to read the entire
document; it is not a specification for the proposed design, and
over-simplifies many aspects of the design.
Our goal is to eliminate the need for origin servers to use
"cache-busting" techniques, when this is done just for the purpose of
counting the number of users of a resource. (Cache-busting includes
techniques such as setting immediate Expiration dates, or sending
"Cache-control: private" in each response.)
We add a new "Meter" header to HTTP; the header is always protected
by the "Connection" header, and so is always hop-by-hop. This
mechanism allows us to construct a "metering subtree", which is a
connected subtree of proxies, rooted at an origin server. Only those
proxies that explicitly volunteer to join in the metering subtree for
a resource participate in hit-metering, but those proxies that do
volunteer are required to make their best effort to provide accurate
counts. When a hit-metered response is forwarded outside of the
metering subtree, the forwarding proxy adds "Cache-control:
proxy-mustcheck", so that other proxies (outside the metering
subtree) are forced to forward all requests to a server in the
metering subtree.
---------
NOTE: the HTTP/1.1 specification does NOT define a
"proxy-mustcheck" Cache-control directive. We use this name as
a placeholder for a directive meaning "proxies must revalidate
this response even if fresh," which is not currently defined in
HTTP/1.1. In section 6 we describe several alternatives for
expressing or approximating this placeholder; see also [2].
---------
The Meter header carries zero or more directives, similar to the way
that the Cache-control header carries directives. Proxies may use
certain Meter directives to volunteer to do hit-metering for a
resource. If a proxy does volunteer, the server may use certain
directives to require that a response be hit-metered. Finally,
proxies use a "count" Meter directive to report the accumulated hit
counts.
The Meter mechanism can also be used by a server to limit the number
of uses that a cache may make of a cached response, before
revalidating it.
The full specification includes complete rules for counting "uses" of
a response (e.g., non-conditional GETs) and "reuses" (conditional
GETs). These rules ensure that the results are entirely consistent
in all cases, except when systems or networks fail.
2 Overview
The design described in this document introduces several new features
to HTTP:
- Hit-metering: allows an origin server to obtain reasonably
accurate counts of the number of clients using a resource
instance via a proxy cache, or a hierarchy of proxy caches.
- Usage-limiting: allows an origin server to control the
number of times a cached response may be used by a proxy
cache, or a hierarchy of proxy caches, before revalidation
with the origin server.
These new non-mandatory features require minimal new protocol
support, no change in protocol version, relatively little overhead in
message headers, and no additional network round-trips in any
critical path.
The primary goal of hit-metering and usage-limiting is to obviate the
need for an origin server to send "Cache-control: proxy-mustcheck"
with responses for resources whose value is not likely to change
immediately. In other words, in cases where the only reason for
contacting the origin server on every request that might otherwise be
satisfied by a proxy cache entry is to allow the server to collect
demographic information or to control the number of times a cache
entry is used, the extension proposed here will avoid a significant
amount of unnecessary network traffic and latency.
This design introduces one new ``Meter'' header, which is used both
in HTTP request messages and HTTP response messages. The Meter
header is used to transmit a number of directives and reports. In
particular, all negotiation of the use of hit-metering and usage
limits is done using this header. No other changes to the existing
HTTP/1.1 specification [1] are proposed in this document.
This design also introduces several new concepts:
1. The concepts of a "use" of a cache entry, which is when a
proxy returns its entity-body in response to a conditional
or non-conditional request, and the "reuse" of a cache
entry, which is when a proxy returns a 304 (Not Modified)
response to a conditional request which is satisfied by
that cache entry.
2. The concept of a hit-metered resource, for which proxy
caches make a best-effort attempt to report accurate
counts of uses and/or reuses to the origin server.
3. The concept of a usage-limited resource, for which the
origin server expects proxy caches to limit the number of
uses and/or reuses.
The new Meter directives and reports interact to allow proxy caches
and servers to cooperate in the collection of demographic data. The
goal is a best-efforts approximation of the true number of uses
and/or reuses, not a guaranteed exact count.
The new Meter directives also allow a server to bound the inaccuracy
of a particular hit-count, by bounding the number of uses between
reports. It can also, for example, bound the number of times the
same ad is shown because of caching.
We also identify a way to use server-driven content negotiation (the
Vary header) that allows an HTTP origin server to flexibly separate
requests into categories and count requests by category (see section
8). Implementation of such a categorized hit counting is likely to
be a very small modification to most implementations of Vary; some
implementations may not require any modification at all.
2.1 Discussion
Mapping this onto the publishing model, a proxy cache would increment
the use-count for a cache entry once for each unconditional GET done
for the entry, and once for each conditional GET that results in
sending a copy of the entry to update a client's invalid cached copy.
Conditional GETs that result in 304 (Not Modified) are not included
in the use-count, because they do not result in a new user seeing the
page, but instead signify a repeat view by a user that had seen it
before. However, 304 responses are counted in the reuse-count.
HEADs are not counted at all, because their responses do not contain
an entity-body.
The Meter directives apply only to shared proxy caches, not to
end-client (or other single-user) caches. Single user caches should
not use Meter, because their hits will be automatically counted as a
result of the unconditional GET with which they first fetch the page,
from either the origin-server or from a proxy cache. Their
subsequent conditional GETs do not result in a new user seeing the
page.
---------
Note: this means that the reuse-count does not include reuses
done locally to the end-client. While there are some reasons
to want to collect such information, especially for research
into user behavior patterns, we believe that the reasons
against doing so (network overheads, additional client
complexity, and possible privacy issues) are stronger.
However, we encourage further discussion of this issue.
---------
The mechanism specified here counts GETs; other methods either do not
result in a page for the user to read, aren't cached, or are
"written-through" and so can be directly counted by the origin
server. (If, in the future, a "cachable POST" came into existence,
whereby the entity-body in the POST request was used to select a
cached response, then such POSTs would have to be treated just like
GETs.)
In the case of multiple caches along a path, a proxy cache does the
obvious summation when it receives a use-count or reuse-count in a
request from another cache.
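The counting behavior discussed above can be sketched as follows.
This is a minimal illustration, not the normative counting rules
(those appear in section 5.3). The class name HitCounts and its
methods are hypothetical, and the sketch assumes that a "use"
corresponds to a 200 response to a GET and a "reuse" to a 304
response to a GET; Range requests (section 5.4) are ignored.

```python
from dataclasses import dataclass


@dataclass
class HitCounts:
    """Per-cache-entry counters kept by a metering proxy (hypothetical)."""
    uses: int = 0
    reuses: int = 0

    def record(self, method: str, status: int) -> None:
        """Count one response served to a client from this cache entry."""
        if method != "GET":
            return  # HEADs (and other methods) are not counted at all
        if status == 304:
            self.reuses += 1  # conditional GET; no entity-body was sent
        elif status == 200:
            self.uses += 1    # entity-body was sent to the client

    def absorb(self, child_uses: int, child_reuses: int) -> None:
        """Add the counts reported by a cache further down the path."""
        self.uses += child_uses
        self.reuses += child_reuses


counts = HitCounts()
counts.record("GET", 200)   # unconditional GET: one use
counts.record("GET", 304)   # conditional GET, copy still valid: one reuse
counts.record("HEAD", 200)  # ignored
counts.absorb(4, 2)         # usage report received from a child cache
```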
3 Design concepts
In order to allow the introduction of hit-metering and usage-limiting
without requiring a protocol revision, and to ensure a reasonably
close approximation of accurate counts, the negotiation of metering
and usage-limiting is done hop-by-hop, not end-to-end. If one
considers the "tree" of proxies that receive, store, and forward a
specific response, the intent of this design is that within some
(possibly null) "metering subtree", rooted at the origin server, all
proxies are using the hit-metering and/or usage-limiting requested by
the origin server.
Proxies at the leaves of this subtree will insert a "Cache-control:
proxy-mustcheck" directive, which forces all other proxies (below
this subtree) to check with a leaf of the metering subtree on every
request. However, it does not prevent them from storing and using
the response, if the revalidation succeeds.
No proxy is required to implement hit-metering or usage-limiting.
However, any proxy that transmits the Meter header in a request MUST
implement every requirement of this specification, without exception
or amendment.
This is a conservative design, which may sometimes fail to take
advantage of hit-metering support in proxies outside the metering
subtree. However, we believe that without a conservative design,
managers of origin servers with requirements for accurate information
will not take advantage of any hit-metering proposal.
The hit-metering/usage-limiting mechanism is designed to avoid any
extra network round-trips in the critical path of any client request,
and (as much as possible) to avoid excessively lengthening HTTP
messages.
The Meter header is used to transmit both negotiation information and
numeric information.
A formal specification for the Meter header appears in section 5; the
following discussion uses an informal approach to improve clarity.
3.1 Implementation of the "metering subtree"
The "metering subtree" approach is implemented in a simple,
straightforward way by defining the new "Meter" header as one that
MUST always be protected by a Connection header in every request or
response. I.e., if the Meter header is present in an HTTP message,
that message:
1. MUST contain "Connection: meter", and MUST be handled
according to the HTTP/1.1 specification of the Connection
header.
2. MUST NOT be sent in response to a request from a client
whose version number is less than HTTP/1.1.
3. MUST NOT be accepted from a client whose version number is
less than HTTP/1.1.
The reason for the latter two restrictions is to protect against
proxies that might not properly implement the Connection header.
Otherwise, a subtree that includes an HTTP/1.0 proxy might
erroneously appear to be a metering subtree.
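The version restriction can be illustrated with a small check. This
is only a sketch: the function name meter_allowed is hypothetical,
and it assumes the peer's version is available as a string of the
form "HTTP/major.minor".

```python
def meter_allowed(http_version: str) -> bool:
    """Return True if Meter may be exchanged with a peer of this version."""
    # Assumes a well-formed version string such as "HTTP/1.1".
    major, minor = http_version.split("/", 1)[1].split(".")
    return (int(major), int(minor)) >= (1, 1)


old_peer = meter_allowed("HTTP/1.0")  # Meter must be neither sent nor accepted
new_peer = meter_allowed("HTTP/1.1")
```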
---------
Note: We believe that in order for the Connection header
mechanism to function correctly, a system receiving an HTTP/1.0
(or lower-version) message that includes a Connection header
must act as if this header, and all of the headers it protects,
ought to have been removed from the message by an intermediate
proxy.
Although the current draft of the HTTP/1.1 specification does
not specifically require this behavior, we believe that it is
implied. Otherwise, one could not depend on the stated
property (section 14.10) that the protected options ``MUST NOT
be communicated by proxies over further connections.'' We
suggest that this be clarified in a subsequent draft of the
HTTP/1.1 specification.
We do not, in any way, propose a modification of the
specification of the Connection header.
---------
From the point of view of an origin server, the proxies in a metering
subtree work together to obey usage limits and to maintain accurate
usage counts. When an origin server specifies a usage limit, a proxy
in the metering subtree may subdivide this limit among its children
in the subtree as it sees fit. Similarly, when a proxy in the
subtree receives a usage report, it ensures that the hits represented
by this report are summed properly and reported to the origin server.
When a proxy forwards a hit-metered or usage-limited response to a
client (proxy or end-client) not in the metering subtree, it MUST
omit the Meter header, and it MUST add "Cache-control:
proxy-mustcheck" to the response.
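A proxy at a leaf of the metering subtree might rewrite the response
headers along the following lines. This is a hedged sketch: the
function name forward_outside_subtree is hypothetical, headers are
modeled as a simple dictionary, and "proxy-mustcheck" is the
placeholder directive used throughout this document rather than a
directive defined by HTTP/1.1.

```python
def forward_outside_subtree(headers: dict) -> dict:
    """Return response headers for a client outside the metering subtree."""
    out = {}
    for name, value in headers.items():
        lname = name.lower()
        if lname == "meter":
            continue  # the Meter header must be omitted
        if lname == "connection":
            # Drop the "meter" token; keep any other connection tokens.
            tokens = [t.strip() for t in value.split(",")]
            tokens = [t for t in tokens if t and t.lower() != "meter"]
            if not tokens:
                continue
            value = ", ".join(tokens)
        out[name] = value
    # Append the placeholder directive to any existing Cache-control value.
    cc_name = next((n for n in out if n.lower() == "cache-control"),
                   "Cache-control")
    existing = out.get(cc_name, "")
    out[cc_name] = (existing + ", proxy-mustcheck") if existing else "proxy-mustcheck"
    return out


response = {
    "Content-Type": "text/html",
    "Connection": "meter",
    "Meter": "max-uses=3, do-report",
    "Cache-control": "max-age=3600",
}
forwarded = forward_outside_subtree(response)
```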
---------
Design question: alternatively, we could specify that the
origin server is responsible for adding "Cache-control:
proxy-mustcheck" to the response, and that a proxy in the
metering subtree should ignore this directive, unless it has
exhausted one of the usage limits. This would get the proxies
out of the business of adding headers to responses, but it
would increase the number of bytes in the response from the
origin server.
---------
3.2 Format of the Meter header
The Meter header is used to carry zero or more directives. Multiple
Meter headers may occur in an HTTP message, but according to the
rules in section 4.2 of the HTTP/1.1 specification [1], they may be
combined into a single header (and should be so combined, to reduce
overhead).
For example, the following sequence of Meter headers
Meter: max-uses=3
Meter: max-reuses=10
Meter: do-report
may be expressed as
Meter: max-uses=3, max-reuses=10, do-report
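The combination shown above amounts to joining the directive lists
with commas. A minimal sketch follows; the function name
combine_meter_headers is hypothetical, and the input is assumed to
be the list of Meter header values in the order they were received.

```python
def combine_meter_headers(values: list) -> str:
    """Merge several Meter header values into one comma-separated value."""
    directives = []
    for value in values:
        for part in value.split(","):
            part = part.strip()
            if part:  # skip empty elements, e.g. from an empty Meter header
                directives.append(part)
    return ", ".join(directives)


combined = combine_meter_headers(["max-uses=3", "max-reuses=10", "do-report"])
```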
3.3 Negotiation of hit-metering and usage-limiting
An origin server that wants to collect hit counts for a resource, by
simply forcing all requests to bypass any proxy caches, would respond
to requests on the resource with "Cache-control: proxy-mustcheck".
(An origin server wishing to prevent HTTP/1.0 proxies from improperly
caching the response could also send both "Expires: <now>", to
prevent such caching, and "Cache-control: max-age=NNNN", to allow
newer proxies to cache the response).
The purpose of the Meter header is to obviate the need for
"Cache-control: proxy-mustcheck" within a metering subtree. Thus,
any proxy may negotiate the use of hit-metering and/or usage-limiting
with the next-hop server. If this server is the origin server, or is
already part of a metering subtree (rooted at the origin server),
then it may complete the negotiation, thereby extending the metering
subtree to include the new proxy.
To start the negotiation, a proxy sends its request with one of the
following Meter directives:
will-report-and-limit
indicates that the proxy is willing and able to
return usage reports and will obey any usage-limits.
wont-report indicates that the proxy will obey usage-limits but
will not send usage reports.
wont-limit indicates that the proxy will not obey usage-limits
but will send usage reports.
A proxy willing to neither obey usage-limits nor send usage reports
MUST NOT transmit a Meter header in the request.
By definition, an empty Meter header:
Meter:
is equivalent to "Meter: will-report-and-limit", and so, by the
definition of the Connection header (see section 14.10 of the
HTTP/1.1 specification [1]), a request that contains
Connection: Meter
and no explicit Meter header is equivalent to a request that contains
Connection: Meter
Meter: will-report-and-limit
This makes the default case more efficient.
These request directives ("wont-report", "wont-limit", and
"will-report-and-limit" in both its explicit and implicit forms)
apply to all subsequent requests made on the given transport
connection.
---------
Note: one way for a server to implement the ``connection-long''
nature of these three directives is to associate two flag bits
(``will report'' and ``will limit'') with each transport
connection from a client, which are initially cleared when the
connection is established. Receipt of the "wont-limit"
directive sets only the report bit; receipt of the "wont-report"
directive sets only the limit bit; receipt of the
"will-report-and-limit" directive or of an empty Meter request
header sets both bits.
---------
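The mapping from a proxy's offer to a pair of per-connection flags
can be sketched as below. The function name parse_offer is
hypothetical. The sketch follows the definitions of
"will-report-and-limit", "wont-report", and "wont-limit" given
earlier in this section, and treats an empty Meter value as
"will-report-and-limit".

```python
def parse_offer(meter_value: str) -> tuple:
    """Map a Meter request header value to (will_report, will_limit)."""
    names = {d.strip().split("=")[0].lower()
             for d in meter_value.split(",") if d.strip()}
    will_report = "wont-report" not in names
    will_limit = "wont-limit" not in names
    return will_report, will_limit


default_offer = parse_offer("")              # empty Meter header
limit_only = parse_offer("wont-report")      # obeys limits, sends no reports
report_only = parse_offer("wont-limit")      # sends reports, ignores limits
```

A proxy willing to do neither must not send a Meter header at all,
so a value carrying both "wont-report" and "wont-limit" is not
expected; the sketch would simply return both flags cleared.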
An origin server that is not interested in metering or usage-limiting
the requested resource simply ignores the Meter header.
If the server wants the proxy to do hit-metering and/or
usage-limiting, its response should include one or more of the
following Meter directives:
For hit-metering:
do-report specifies that the proxy MUST send usage reports to
the server.
dont-report specifies that the proxy SHOULD NOT send usage
reports to the server.
By definition, an empty Meter header in a response, or any Meter
header that does not contain "dont-report", means "Meter: do-report";
this makes a common case more efficient.
For usage-limiting:
max-uses=NNN sets an upper limit of NNN "uses" of the response,
not counting its immediate forwarding to the
requesting end-client, for all proxies in the
following subtree taken together.
max-reuses=NNN sets an upper limit of NNN "reuses" of the response
for all proxies in the following subtree taken
together.
When a proxy has exhausted its allocation of "uses" or "reuses" for a
cache entry, it MUST revalidate the cache entry (using a conditional
request) before returning it in a response. (The proxy SHOULD use
this revalidation message to send a usage report, if one was
requested and it is time to send it. See sections 3.4 and 3.5.)
These Meter response-directives apply only to the specific response
that they are attached to.
---------
Note that the limit on "uses" set by the max-uses directive
does not include the use of the response to satisfy the
end-client request that caused the proxy's request to the
server. This counting rule supports the notion of a
cache-initiated prefetch: a cache may issue a prefetch request,
receive a max-uses=0 response, store that response, and then
return that response (without revalidation) when a client makes
an actual request for the resource. However, each such
response may be used at most once in this way, so the origin
server maintains precise control over the number of actual
uses.
---------
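One way a proxy might track its remaining allocation for a cache
entry is sketched below. The class UsageLimit and its methods are
hypothetical. The sketch assumes that the immediate forwarding of
the response (or the single delivery following a prefetch) happens
outside these checks, since it is not counted against max-uses.

```python
from typing import Optional


class UsageLimit:
    """Remaining allocation for one cache entry (hypothetical helper)."""

    def __init__(self, max_uses: Optional[int] = None,
                 max_reuses: Optional[int] = None) -> None:
        self.uses_left = max_uses      # None: no max-uses directive received
        self.reuses_left = max_reuses  # None: no max-reuses directive received

    def try_use(self) -> bool:
        """Consume one "use"; False means the entry must be revalidated."""
        if self.uses_left is None:
            return True
        if self.uses_left == 0:
            return False
        self.uses_left -= 1
        return True

    def try_reuse(self) -> bool:
        """Consume one "reuse"; False means the entry must be revalidated."""
        if self.reuses_left is None:
            return True
        if self.reuses_left == 0:
            return False
        self.reuses_left -= 1
        return True


limit = UsageLimit(max_uses=2, max_reuses=1)
served = [limit.try_use() for _ in range(3)]
reused = [limit.try_reuse() for _ in range(2)]
```

In this example the third "use" and the second "reuse" are refused,
which is the point at which the proxy would issue a conditional
request to revalidate the entry.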
A proxy receiving a Meter header in a response MUST either obey it,
or it MUST revalidate the corresponding cache entry on every access.
(I.e., if it chooses not to obey the Meter header in a response, it
MUST act as if the response included "Cache-control:
proxy-mustcheck".)
---------
Note: a proxy that has not sent the Meter header in a request
during the given transport connection, and which has therefore
not volunteered to honor Meter directives in a response, is not
required to honor them. If, in this situation, the server does
send a Meter header in a response, this is a protocol error.
However, based on the robustness principle, the proxy may
choose to interpret the Meter header as an implicit request to
include "Cache-control: proxy-mustcheck" when it forwards the
response, since this preserves the apparent intention of the
server.
---------
A proxy that receives the Meter header in a request may ignore it
only to the extent that this is consistent with its own duty to the
next-hop server. If the received Meter header is inconsistent, or no
Meter header is received and the next-hop server has requested any
metering or limiting, then the proxy MUST add "Cache-control:
proxy-mustcheck" to all responses it sends for the resource. (A
proxy SHOULD NOT add or change the Expires header or max-age
Cache-control directive.)
---------
For example, if proxy A receives a GET request from proxy B for
URL X with "Connection: Meter", but proxy A's cached response
for URL X does not include any Meter directives, then proxy A may
ignore the metering offer from proxy B.
However, if proxy A has previously told the origin server
"Meter: wont-limit" (implying will-report), and the cached
response contains "Meter: do-report", and proxy B's request
includes "Meter: wont-report", then proxy B's offer is
inconsistent with proxy A's duty to the origin server.
Therefore, in this case proxy A must add "Cache-control:
proxy-mustcheck" when it returns the cached response to proxy
B, and must not include a Meter header in this response.
---------
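The consistency check in the example above can be sketched as
follows. This is an illustration under stated assumptions, not part
of the specification: the function names are hypothetical, None
stands for "no Meter header", a cached response carrying Meter
directives is taken to require reports unless it contains
"dont-report" (or "wont-ask"), and it is taken to require limiting
if it contains "max-uses" or "max-reuses".

```python
def directive_names(value: str) -> set:
    """Return the lower-cased directive names in a Meter header value."""
    return {d.strip().split("=")[0].lower()
            for d in value.split(",") if d.strip()}


def needs_mustcheck(cached_meter, offered_meter) -> bool:
    """True if proxy A must add the placeholder proxy-mustcheck directive."""
    if cached_meter is None:
        return False  # nothing was requested upstream; the offer may be ignored
    if offered_meter is None:
        return True   # metering or limiting requested, but no offer received
    cached = directive_names(cached_meter)
    offered = directive_names(offered_meter)
    needs_report = not (cached & {"dont-report", "wont-ask"})
    needs_limit = bool(cached & {"max-uses", "max-reuses"})
    will_report = "wont-report" not in offered
    will_limit = "wont-limit" not in offered
    sufficient = ((will_report or not needs_report)
                  and (will_limit or not needs_limit))
    return not sufficient


# The case from the text: cached "do-report", proxy B offers "wont-report".
inconsistent = needs_mustcheck("do-report", "wont-report")
```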
If a server does not want to use the Meter mechanism, and will not
want to use it any time soon, it may send this directive:
wont-ask recommends that the proxy SHOULD NOT send any Meter
directives to this server.
The proxy SHOULD remember this fact for up to 24 hours. This avoids
virtually all unnecessary overheads for servers that do not wish to
use or support the Meter header. (This directive also implies
``dont-report''.)
3.4 Transmission of usage reports
To transmit a usage report, a proxy sends the following Meter header
in a request on the appropriate resource:
Meter: count=NNN/MMM
The first integer indicates the count of uses of the cache entry
since the last report; the second integer indicates the count of
reuses of the entry (see section 5.3 for rules on counting uses and
reuses). The transmission of a "count" directive in a request with
no other Meter directive is also defined as an implicit transmission
of a "will-report-and-limit" directive, to optimize the common case.
(A proxy not willing to honor usage-limits would send "Meter:
count=NNN/MMM, wont-limit" for its reports.)
Note that when a proxy forwards a client's request and receives a
response, the response that the proxy sends immediately to the
requesting client is not counted as a "use". I.e., the reported
count is the number of times the cache entry was used, and not the
number of times that the response was used.
A proxy SHOULD NOT transmit "Meter: count=0/0", since this conveys no
useful information.
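Building and reading the "count" directive might look like the
sketch below. The function names format_report and parse_report are
hypothetical; the syntax assumed is exactly the informal
"count=NNN/MMM" form shown above, not the formal grammar of
section 5.

```python
import re

_COUNT_RE = re.compile(r"^count=(\d+)/(\d+)$")


def format_report(uses: int, reuses: int, wont_limit: bool = False):
    """Return a Meter header value for a usage report, or None if empty."""
    if uses == 0 and reuses == 0:
        return None  # "count=0/0" conveys nothing and should not be sent
    value = "count=%d/%d" % (uses, reuses)
    if wont_limit:
        value += ", wont-limit"
    return value


def parse_report(meter_value: str):
    """Return (uses, reuses) from a Meter header value, or None if absent."""
    for directive in meter_value.split(","):
        match = _COUNT_RE.match(directive.strip())
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


report = format_report(3, 1)
parsed = parse_report("count=3/1, wont-limit")
```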
Usage reports MUST always be transmitted as part of a conditional
request (such as a GET or HEAD), since the information in the
conditional header (e.g., If-Modified-Since or If-None-Match) is
required for the origin server to know which instance of a resource
is being counted. Proxies forwarding usage reports up the metering
subtree MUST NOT change the contents of the conditional header, since
otherwise this would result in incorrect counting.
A usage report MUST NOT be transmitted as part of a forwarded request
that includes multiple entity tags in an If-None-Match or If-Match
header.
---------
Note: a proxy that offers its willingness to do hit-metering
(report usage) must count both uses and reuses. It is not
possible to negotiate the reporting of one but not the other.
---------
3.5 When to send usage reports
A proxy that has offered to send usage reports to its parent in the
metering subtree MUST send a usage report in each of these
situations:
1. When it forwards a conditional GET on the resource
instance on behalf of one of its clients (if the GET is
conditional on at most one entity-tag).
2. When it forwards a conditional HEAD on the resource
instance on behalf of one of its clients.
3. When it must generate a conditional GET to satisfy a
client request because the max-uses limit has been
exceeded.
4. When it removes the corresponding non-zero hit-count entry
from its storage for any reason including:
- the proxy needs the storage space for another
hit-count entry.
- the proxy is not able to store more than one response
per resource, and a request forwarded on behalf of a
client has resulted in the receipt of a new response
(one with a different entity-tag or last-modified
time).
Note that a cache might continue to store hit-count
information even after having deleted the body of the
response, so it is not necessary to report the hit-count
when deleting the body; it is only necessary to report it
if the proxy is about to "forget" a non-zero value.
(Section 5.3 explains how hit-counts become zero or non-zero.)
If the usage report is being sent because the proxy is about to
remove the hit-count entry from its storage:
- The proxy MUST send the report as part of a conditional
HEAD request on the resource instance.
- The proxy is not required to retry the HEAD request if it
fails (this is a best-efforts design).
- The proxy is not required to serialize any other operation
on the completion of this request.
---------
Note: proxy implementors are strongly encouraged to batch
several HEAD-based reports to the same server, when possible,
over a single persistent connection, to reduce network overhead
as much as possible. This may involve a non-naive algorithm
for scheduling the deletion of hit-count entries.
---------
If the usage count is sent because of an arriving request that also
carries a "count" directive, the proxy MUST combine its own (possibly
zero) use and reuse counts with the arriving counts, and then attempt
to forward the request.
---------
Discussion point: a previous version of this design made the
final HEAD-based report optional for the proxy, and included a
way for the proxy to notify the server that it intended to
provide this report.
In this design, a proxy that offers its willingness to
hit-meter a resource must make the final HEAD-based report, if
the unreported count is non-zero; there is no option.
Doing so commits a hit-metering proxy to send a fraction of one
extra request per "cache entry and removal" cycle. This is not
exactly one request, because it is possible that the stored
count is zero and does not need to be reported. (Trace-based
studies should be done to estimate the actual fraction.) We
believe that this cost is minimal except for proxies whose
network connection is severely bandwidth-limited, and since
origin servers may not be willing to allow proxy caching except
when the final hit-count report is provided, even
bandwidth-limited proxies may come out ahead by offering to
hit-meter.
However, it is feasible to change this protocol design to allow
a proxy to offer to hit-meter without committing to send a
final HEAD-based report. This would involve the addition of
two more Meter directives, "wont-final-report" and
"dont-final-report". An origin server receiving a "Meter:
wont-final-report" may, at its option, either reply with
"Meter: dont-final-report" and allow the proxy to cache the
response, or with a "Cache-control: proxy-mustcheck" (if it
wants fully accurate hit counts). If the protocol is amended
to include this feature, proxy administrators would need to
choose between the small extra overhead of doing this final
HEAD, and the possibly much larger cost of not being permitted
to cache certain resources at all.
We do not believe that this option is likely to result in
improved performance, but we are willing to include it in the
specification if strong arguments are made in its favor.
---------
---------
Discussion point: one reviewer suggested that it would be
useful, in some cases, for the origin server to be able to
insist on receiving a final HEAD-based report even when the
use-count and reuse-count are both zero. The justification for
this proposal was that it would resolve the ambiguity between a
cache that has removed the entry because it has not incremented
the counters, and one that has non-zero counters for the entry
but which has not yet removed it (perhaps because a lot of
storage is available and/or no usage-limit has been reached).
We are considering adding another Meter response-directive that
would allow the origin server to specify either a boolean flag
(``you should send a final usage-report even if the counts are
both zero'') or a timeout (``you should send a usage-report
when the cache entry reaches age N''). It is not clear,
however, if the benefits would offset the additional
complexity; more comment is invited.
---------
3.6 Subdivision of usage-limits
When an origin server specifies a usage limit, a proxy in the
metering subtree may subdivide this limit among its children in the
subtree as it sees fit.
For example, consider the situation with two proxies P1 and P2, each
of which uses proxy P3 as a way to reach origin server S. Imagine
that S sends P3 a response with
Meter: max-uses=10
The proxies use that response to satisfy the current requesting
end-client. The max-uses directive in this example allows the
combination of P1, P2, and P3 together to satisfy 10 additional
end-client uses (unconditional GETs) for the resource.
This specification does not constrain how P3 divides up that
allocation among itself and the other proxies. For example, P3 could
retain all of the max-uses allocation for itself. In that case, it would
forward the response to P1 and/or P2 with
Meter: max-uses=0
P3 might also divide the allocation equally among P1 and P2,
retaining none for itself (which may be the right choice if P3 has
few or no other clients). In this case, it could send
Meter: max-uses=5
to the proxy (P1 or P2) that made the initial request, and then
record in some internal data structure that it "owes" the other proxy
the rest of the allocation.
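The equal-split choice described above might be sketched in Python as
follows. This is one possible policy under stated assumptions, not a
required one: the function name, the keep_for_self parameter, and the
returned mapping are all hypothetical.

```python
def split_allocation(max_uses, children, keep_for_self=0):
    """Divide a max-uses allocation among child proxies.

    Returns (own_share, shares), where shares maps each child to the
    max-uses value the parent would eventually send it.
    """
    if keep_for_self > max_uses:
        raise ValueError("cannot keep more than the allocation")
    if not children:
        return max_uses, {}
    remaining = max_uses - keep_for_self
    base, extra = divmod(remaining, len(children))
    shares = {}
    for i, child in enumerate(children):
        # Hand any remainder to the first few children so that the
        # shares never add up to more than the original allocation.
        shares[child] = base + (1 if i < extra else 0)
    return keep_for_self, shares


own, shares = split_allocation(10, ["P1", "P2"])
```

With the values from the example, P3 keeps nothing and each of P1 and
P2 is allotted 5 uses; passing keep_for_self=10 models the case where
P3 sends "max-uses=0" to both.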
Note that this freedom to choose the max-uses value applies to the
origin server, as well. There is no requirement that an origin
server send the same max-uses value to all caches. For example, it
might make sense to send "max-uses=2" the first time one hears from a
cache, and then double the value (up to some maximum limit) each time
one gets a "use-count" from that cache. The idea is that the faster
a cache is using up its max-uses quota, the more likely it will be to
report a use-count value before removing the cache entry. Also, high
and frequent use-counts imply a corresponding high efficiency benefit
from allowing caching.
Again, the details of such heuristics would be outside the scope of
this specification.
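Purely as an illustration of the doubling heuristic mentioned above,
an origin server might choose its max-uses value as in this Python
sketch. The function name, the initial value of 2, and the ceiling of
64 are assumptions made for the example.

```python
def next_max_uses(previous, got_use_count, initial=2, ceiling=64):
    """Pick the max-uses value to send to one particular cache.

    previous: value last sent to this cache, or None on first contact.
    got_use_count: True if the cache's request carried a use-count.
    """
    if previous is None:
        return initial
    if got_use_count:
        # Reward a cache that reports by doubling its quota, up to a cap.
        return min(previous * 2, ceiling)
    return previous
```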
4 Analysis
We recognize that, for many service operators, the single most
important aspect of the request stream is the number of distinct
users who have retrieved a particular entity. We believe that our
design provides adequate support for user-counting, based on the
following analysis.
We start with the observation that almost all Web users have client
software that maintains local caches, and that the state of the art
of local-caching technology is quite effective. Therefore, to a first
approximation, each individual user who retrieves an entity does
exactly one GET request that results in a 200 or 203 response, or a
206 response that includes the first byte of the entity. If a proxy
cache maintains an accurate use-count of such retrievals, then its
use-count will approximate the number of distinct users who have
retrieved the entity.
There are some circumstances under which this approximation can break
down. For example, if an entity stays in a proxy cache for much
longer than it persists in the typical client cache, and users often
re-reference the entity, then this scheme will tend to over-count the
number of users. Or, if the cache-management policy implemented in
typical client caches is biased against retaining certain kinds of
frequently re-referenced entities (such as very large images), the
use-counts reported will tend to overestimate the user-counts for
such entities. For the most part, however, we do not believe this
will be a source of significant error.
We also note that the existing "cache-busting" mechanisms for
counting distinct users will certainly overestimate the number of
users behind a proxy, since they provide no reliable way to
distinguish between a user's initial request and subsequent repeat
requests caused by insufficient space in the end-client cache.
4.1 What about "Network Computers"?
Our analysis assumes that "almost all Web users" have client caches.
If the Network Computers (NC) model becomes popular, however, then
this assumption may be faulty: most proposed NCs have no disk
storage, and relatively little RAM. Such systems may do little or no
caching of HTTP responses. This means that a single user might well
generate many unconditional GETs that yield the same response from a
proxy cache.
We first note that the hit-metering design in this document operates
correctly, even with such clients: the counts that a proxy would
return to an origin server would represent exactly the number of
requests that the proxy would forward to the server, if the server
simply specifies "Cache-control: proxy-mustcheck".
However, it may be possible to improve the accuracy of these
hit-counts by use of some heuristics at the proxy. For example, the
proxy might note the IP address of the client, and count only one GET
per client address per response. This is not perfect: for example,
it fails to distinguish between NCs and certain other kinds of hosts.
The proxy might also use the heuristic that only those clients that
never send a conditional GET should be treated this way, although we
are not at all certain that NCs will never send conditional GETs.
Since the solution to this problem appears to require heuristics
based on the actual behavior of NCs (or perhaps a new HTTP protocol
feature that allows unambiguous detection of cacheless clients), we
believe it is premature to specify a solution.
4.2 Why max-uses is not a Cache-control directive
Our first proposal was that the "max-uses" directive should be
carried by the Cache-control header, since it is superficially
similar to the "max-age" Cache-control directive. However, we
believe that the HTTP community will not accept a specification that
makes the implementation of "max-uses" mandatory for proxy caches,
and in any event we could not force older implementations to honor
it.
Because the Cache-control mechanism has no means for a proxy to
explicitly promise to honor "max-uses", it would not be possible (in
general) for a server to depend on such a Cache-control header. The
"metering subtree" mechanism implemented by the Meter header,
however, does allow a server to rely on a precise interpretation of
"max-uses," when used as a Meter directive.
5 Specification
5.1 Specification of Meter header and directives
The Meter general-header field is used to:
- Negotiate the use of hit-metering and usage-limiting among
origin servers and proxy caches.
- Report use counts and reuse counts.
Implementation of the Meter header is optional for both proxies and
origin servers. However, any proxy that transmits the Meter header
in a request MUST implement every requirement of this specification,
without exception or amendment.
The Meter header MUST always be protected by a Connection header. A
proxy that does not implement the Meter header MUST NOT pass it
through to another system (see section 5.5 for how a non-caching
proxy may comply with this specification). If a Meter header is
received in a message whose version is less than HTTP/1.1, it MUST be
ignored (because it has clearly flowed through a proxy that does not
implement Meter).
A proxy that has received a response with a version less than
HTTP/1.1, and therefore from a server (or another proxy) that does
not implement the Meter header, SHOULD NOT send Meter request
directives to that server, because these would simply waste
bandwidth. This recommendation does not apply if the proxy is
currently hit-metering or usage-limiting any responses from that
server. If the proxy receives an HTTP/1.1 or higher response from
such a server, it should cease its suppression of the Meter
directives.
All proxies sending the Meter header MUST adhere to the "metering
subtree" design described in section 3.
Meter = "Meter" ":" 0#meter-directive
meter-directive = meter-request-directive
| meter-response-directive
| meter-report-directive
meter-request-directive =
"will-report-and-limit"
| "wont-report"
| "wont-limit"
meter-report-directive =
"count" "=" 1*DIGIT "/" 1*DIGIT
meter-response-directive =
"max-uses" "=" 1*DIGIT
| "max-reuses" "=" 1*DIGIT
| "do-report"
| "dont-report"
| "wont-ask"
A meter-request-directive or meter-report-directive may only appear
in an HTTP request message. A meter-response-directive may only
appear in an HTTP response message.
A meter-request-directive applies to all subsequent requests made on
the given transport connection. All other Meter directives apply
only to the specific request or response that they are attached to.
An empty Meter header in a request means "Meter:
will-report-and-limit" (and so applies to all subsequent requests on
the given transport connection). An empty Meter header in a
response, or any other response including one or more Meter headers
without the "dont-report" or "wont-ask" directive, implies "Meter:
do-report".
The meanings of the meter-request-directives are as follows:
will-report-and-limit
indicates that the proxy is willing and able to
return usage reports and will obey any usage-limits.
wont-report indicates that the proxy will obey usage-limits but
will not send usage reports.
wont-limit indicates that the proxy will not obey usage-limits
but will send usage reports.
A proxy willing neither to obey usage-limits nor to send usage
reports MUST NOT transmit a Meter header in the request.
The meaning of the meter-report-directive is as follows:
count "=" 1*DIGIT "/" 1*DIGIT
Both digit strings encode decimal integers. The
first integer indicates the count of uses of the
cache entry since the last report; the second integer
indicates the count of reuses of the entry.
Section 5.3 specifies the counting rules.
The meanings of the meter-response-directives are as follows:
max-uses "=" 1*DIGIT
sets an upper limit on the number of "uses" of the
response, not counting its immediate forwarding to
the requesting end-client, for all proxies in the
following subtree taken together.
max-reuses "=" 1*DIGIT
sets an upper limit on the number of "reuses" of the
response for all proxies in the following subtree
taken together.
do-report specifies that the proxy MUST send usage reports to
the server.
dont-report specifies that the proxy SHOULD NOT send usage
reports to the server.
wont-ask specifies that the proxy SHOULD NOT send any Meter
headers to the server. The proxy should forget this
advice after a period of no more than 24 hours.
Section 5.3 specifies the counting rules, and in particular specifies
a somewhat non-obvious interpretation of the max-uses value.
5.2 Abbreviations for Meter directives
To allow for the most efficient possible encoding of Meter headers,
we define abbreviated forms of all Meter directives. These are
exactly semantically equivalent to their non-abbreviated
counterparts. All systems implementing the Meter header MUST
implement both the abbreviated and non-abbreviated forms.
Implementations SHOULD use the abbreviated forms in normal use.
The abbreviated forms of Meter directive are shown below, with the
corresponding non-abbreviated literals in the comments:
Abb-Meter = "Meter" ":" 0#abb-meter-directive
abb-meter-directive = abb-meter-request-directive
| abb-meter-response-directive
| abb-meter-report-directive
abb-meter-request-directive =
"w" ; "will-report-and-limit"
| "x" ; "wont-report"
| "y" ; "wont-limit"
abb-meter-report-directive =
"c" "=" 1*DIGIT "/" 1*DIGIT ; "count"
abb-meter-response-directive =
"u" "=" 1*DIGIT ; "max-uses"
| "r" "=" 1*DIGIT ; "max-reuses"
| "d" ; "do-report"
| "e" ; "dont-report"
| "n" ; "wont-ask"
---------
Note: although the Abb-Meter BNF rule is defined separately
from the Meter rule, one may freely mix abbreviated and
non-abbreviated Meter directives in the same header.
---------
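As a rough illustration of how a receiver might accept both forms, the
following Python sketch parses a Meter header value into a dictionary
keyed by the non-abbreviated directive names. The function name and
the returned structure are hypothetical. The sketch does no
validation, treats unrecognized directives as flags, and does not
apply the rules for empty Meter headers.

```python
ABBREVIATIONS = {
    "w": "will-report-and-limit",
    "x": "wont-report",
    "y": "wont-limit",
    "c": "count",
    "u": "max-uses",
    "r": "max-reuses",
    "d": "do-report",
    "e": "dont-report",
    "n": "wont-ask",
}


def parse_meter(value):
    """Parse a Meter header value (the text after "Meter:").

    Flag directives map to True, max-uses and max-reuses map to an
    int, and count maps to a (uses, reuses) tuple.
    """
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # 0#meter-directive permits an empty list
        name, _, arg = item.partition("=")
        name = name.strip()
        name = ABBREVIATIONS.get(name, name)  # expand abbreviated forms
        arg = arg.strip()
        if name == "count":
            uses, _, reuses = arg.partition("/")
            directives[name] = (int(uses), int(reuses))
        elif name in ("max-uses", "max-reuses"):
            directives[name] = int(arg)
        else:
            directives[name] = True
    return directives
```

Because each directive is expanded independently, a header that mixes
abbreviated and non-abbreviated directives is handled the same way as
one that uses a single form throughout.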
5.3 Counting rules
---------
Note: please remember that hit-counts and usage-counts are
associated with individual responses, not with resources. A
cache entry that, over its lifetime, holds more than one
response is also not a "response", in this particular sense.
---------
Let R be a cached response, and V be the value of the Request-URI and
selecting request-headers (if any, see section 14.43 of the HTTP/1.1
specification [1]) that would select R if contained in a request. We
define a "use" of R as occurring when the proxy returns its stored
copy of R in a response with any of the following status codes: a 200
(OK) status; a 203 (Non-Authoritative Information) status; or a 206
(Partial Content) status when the response contains byte #0 of the
entity (see section 5.4 for a discussion of Range requests).
---------
Note: when a proxy forwards a client's request and receives a
response, the response that the proxy sends immediately to the
requesting client is not counted as a "use". I.e., the
reported count is the number of times the cache entry was used,
and not the number of times that the response was used.
---------
We define a "reuse" of R as occurring when the proxy responds to a
request selecting R with a 304 (Not Modified) status, unless that
request is a Range request that does not specify byte #0 of the
entity.
5.3.1 Counting rules for hit-metering
A proxy participating in hit-metering for a cached response R
maintains two counters, CU and CR, associated with R. When a proxy
first stores R in its cache, it sets both CU and CR to 0 (zero).
When a subsequent client request results in a "use" of R, the proxy
increments CU. When a subsequent client request results in a "reuse"
of R, the proxy increments CR. When a subsequent client request
selecting R (i.e., including V) includes a "count" Meter directive,
the proxy increments CU and CR using the corresponding values in the
directive.
When the proxy sends a request selecting R (i.e., including V) to the
inbound server, it includes a "count" Meter directive with the
current CU and CR as the parameter values. If this request was
caused by the proxy's receipt of a request from a client, upon
receipt of the server's response, the proxy sets CU and CR to the
number of uses and reuses, respectively, that may have occurred while
the request was in progress. (These numbers are likely, but not
certain, to be zero.) If the proxy's request was a final HEAD-based
report, it need no longer maintain the CU and CR values, but it may
also set them to the number of intervening uses and reuses and retain
them.
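The bookkeeping described in this section can be sketched in Python as
follows. The class and method names are hypothetical. As a
simplification, the sketch zeroes CU and CR at the moment the report
is sent, so that uses and reuses occurring while the request is in
progress accumulate from zero; it does not model a failed request.

```python
class HitCounters:
    """CU/CR bookkeeping for one cached response R (illustrative)."""

    def __init__(self):
        # Both counters start at zero when R is first stored.
        self.cu = 0
        self.cr = 0

    def record_use(self):
        self.cu += 1

    def record_reuse(self):
        self.cr += 1

    def absorb(self, uses, reuses):
        """Add the values of a "count" directive received from a client."""
        self.cu += uses
        self.cr += reuses

    def report(self):
        """Return the directive to send inbound, then reset the counters."""
        directive = "count=%d/%d" % (self.cu, self.cr)
        self.cu = 0
        self.cr = 0
        return directive
```

For example, a proxy that has served one response from its cache
since the last report would send "count=1/0" with its next request
selecting R.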
5.3.2 Counting rules for usage-limiting
A proxy participating in usage-limiting for a response R maintains
either or both of two counters TU and TR, as appropriate, for that
response. TU and TR are incremented in just the same way as CU and
CR, respectively. However, TU is zeroed only upon receipt of a
"max-uses" Meter directive for that response (including the initial
receipt). Similarly, TR is zeroed only upon receipt of a
"max-reuses" Meter directive for that response.
A proxy participating in usage-limiting for a response R also stores
values MU and/or MR associated with R. When it receives a response
including only a max-uses value, it sets MU to that value and MR to
infinity. When it receives a response including only a max-reuses
value, it sets MR to that value and MU to infinity. When it receives
a response including both max-uses and max-reuses values, it sets
MU and MR to those values, respectively. When it receives a
subsequent response including neither max-uses nor max-reuses
values, it sets both MU and MR to infinity.
If a proxy participating in usage-limiting for a response R receives
a request that would cause a "use" of R, and TU >= MU, it MUST
forward the request to the server. If it receives a request that
would cause a "reuse" of R, and TR >= MR, it MUST forward the request
to the server. If (in either case) the proxy has already forwarded a
previous request to the server and is waiting for the response, it
should delay further handling of the new request until the response
arrives (or times out); it SHOULD NOT have two revalidation requests
pending at once that select the same response, unless these are Range
requests selecting different subranges.
There is a special case of this rule for the "max-uses" directive: if
the proxy receives a response with "max-uses=0" and does not forward
it to a requesting client, the proxy should set a flag PF associated
with R. If PF is set, then when a request arrives while TU >= MU, the
request need not be forwarded to the server (provided that forwarding
is not required by other caching rules). However, the PF flag MUST be
cleared on any use of the response.
---------
Note: the "PF" flag is so named because this feature is useful
only for caches that could issue a "prefetch" request before an
actual client request for the response. A proxy not
implementing prefetching need not implement the PF flag.
---------
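A minimal Python sketch of the "uses" half of these rules, including
the PF flag, is given below. The class and method names are
hypothetical, TR and MR would be handled analogously, and the choice
to recompute PF on every response that carries max-uses is an
assumption of the sketch.

```python
import math


class UsageLimiter:
    """TU/MU bookkeeping for uses of one cached response R."""

    def __init__(self):
        self.tu = 0
        self.mu = math.inf
        self.pf = False

    def on_response(self, max_uses=None, forwarded_to_client=True):
        """Apply the Meter directives of a response from the server."""
        if max_uses is None:
            self.mu = math.inf   # no max-uses given: MU becomes infinity
            return
        self.mu = max_uses
        self.tu = 0              # TU is zeroed on receipt of max-uses
        # Prefetch special case: max-uses=0 and nothing forwarded yet.
        self.pf = (max_uses == 0 and not forwarded_to_client)

    def must_forward_use(self):
        """True if a request causing a "use" must go to the server."""
        if self.pf:
            return False
        return self.tu >= self.mu

    def record_use(self):
        self.tu += 1
        self.pf = False          # PF is cleared on any use
```

In this sketch a prefetched response received with "max-uses=0" can
be served once from the cache, after which the next request that
would cause a use is forwarded to the server.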
5.3.3 Equivalent algorithms are allowed
Any other algorithm that exhibits the same external behavior (i.e.,
generates exactly the same requests from the proxy to the server) as
the one in this section is explicitly allowed.
---------
Note: in most cases, TU will be equal to CU, and TR will be
equal to CR. The only two cases where they could differ are:
1. The proxy issues a non-conditional request for the
resource using V, while TU and/or TR are non-zero, and
the server's response includes a new "max-uses" and/or
"max-reuses" directive (thus zeroing TU and/or TR, but
not CU and CR).
2. The proxy issues a conditional request reporting the
hit-counts (and thus zeroing CU and CR, but not TU or
TR), but the server's response does not include a new
"max-uses" and/or "max-reuses" directive.
To solve the first case, the proxy has several implementation
options:
- Always store TU and TR separately from CU and CR.
- Create "shadow" copies of TU and TR when this situation
arises (analogous to "copy on write").
- Generate a HEAD-based usage report when the
non-conditional request is sent (or when the
"max-uses=0" is received), causing CU and CR to be
zeroed (analogous in some ways to a "memory barrier"
instruction).
In the second case, the server implicitly has removed the
usage-limit(s) on the response (by setting MU and/or MR to
infinity), and so the fact that, say, TU is different from CU
is not significant.
---------
---------
Note: It may also be possible to eliminate the PF flag by
sending extra HEAD-based usage-report requests, but we
recommend against this; it is better to allocate an extra bit
per entry than to transmit extra requests.
---------
5.4 Counting rules: interaction with Range requests
HTTP/1.1 allows a client to request sub-ranges of a resource. A
client might end up issuing several requests with the net effect of
receiving one copy of the resource. We need to establish a rule for
counting these references, although it is not clear that one rule
generates accurate results in every case.
The rule established in this specification is that proxies count as a
"use" or "reuse" only those Range requests that result in the return
of byte #0 of the resource. The rationale for this rule is that in
almost every case, an end-client will retrieve the beginning of any
resource that it references at all, and that it will seldom retrieve
any portion more than once. Therefore, this rule appears to meet our
goal of a "best-efforts" approach to accuracy.
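The classification of cache hits under this rule, together with the
definitions in section 5.3, might be sketched in Python as follows.
The function name and the first_byte parameter are hypothetical; for
a request with several ranges, the sketch assumes the caller passes
the lowest byte offset requested.

```python
def classify(status, first_byte=0):
    """Classify a response served from the cache.

    status: status code the proxy returns to its client.
    first_byte: offset of the first byte covered by the request's
                range (0 for a request without a Range header).
    Returns "use", "reuse", or None if the hit is not counted.
    """
    if status in (200, 203):
        return "use"
    if status == 206:
        # Partial content counts only if it includes byte #0.
        return "use" if first_byte == 0 else None
    if status == 304:
        # A Range request that skips byte #0 is not a reuse.
        return "reuse" if first_byte == 0 else None
    return None
```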
5.5 Implementation by non-caching proxies
A non-caching proxy may participate in the metering subtree; in fact,
we strongly recommend this.
A non-caching proxy (HTTP/1.1 or higher) that participates in the
metering subtree SHOULD forward Meter headers on both requests and
responses, with the appropriate Connection headers.
If a non-caching proxy forwards Meter headers, it MUST comply with
these restrictions:
1. If the proxy forwards Meter headers in requests, it MUST
NOT reorder the requests from a given client to a given
server.
2. If the proxy forwards Meter headers in requests, it must
not do so while merging requests from multiple incoming
connections (i.e., connections from one or more of its
clients) onto one outgoing connection.
3. If the proxy forwards Meter headers in requests, if its
connection to the server is closed or aborted, then it
should close or abort the corresponding connection to the
client. (Alternatively, the proxy may act in any way that
precisely maintains the ``connection-long'' nature of the
meter-request-directives.)
4. If the proxy forwards Meter headers in responses, such a
response MUST NOT be returned to any request except the
one that elicited it.
5. Once a non-caching proxy starts forwarding Meter headers,
it should not arbitrarily stop forwarding them (or else
reports may be lost).
---------
Note: the intent of these restrictions is to make the
non-caching proxy appear functionally transparent with respect
to the Meter header. It may be possible to relax these
restrictions after a more careful analysis, while still meeting
this intent. We do not believe that these restrictions will
add much complexity to a straightforward implementation of a
non-caching HTTP proxy.
---------
A proxy that caches some responses and not others, for whatever
reason, may choose to implement the Meter header as a caching proxy
for the responses that it caches, and as a non-caching proxy for the
responses that it does not cache, as long as its external behavior
with respect to any particular response is fully consistent with
this specification.
6 Expressing or approximating the "proxy-mustcheck" directive
As we pointed out in section 1.2, this hit-metering design depends on
HTTP/1.1 support for a way for an origin server to say "proxies must
revalidate this response even if fresh." The existing HTTP/1.1
specification does not provide exactly such a mechanism. In this
section, we discuss the alternatives for resolving this problem.
---------
Note: much of the discussion in this section is better covered
by the proposal made in [2], which proposes a new
``proxy-maxage'' Cache-control directive. If that proposal is
adopted, this document would be modified by removing this
section, and replacing ``proxy-mustcheck'' everywhere else with
``proxy-maxage=0''.
---------
One possibility would simply be to modify the HTTP/1.1 specification
to include a Cache-control cache-response-directive with precisely
the required semantics. The meaning of the "proxy-mustcheck"
directive would be identical to "proxy-revalidate", except that it
would require revalidation whether or not the entry is fresh.
In order for this approach to be reliable, it would have to be
supported by all HTTP/1.1-compliant proxies. This means that the
specification change would have to be adopted quickly, before any
significant operational deployment of HTTP/1.1 proxies.
If it is not feasible to modify the HTTP/1.1 specification, there are
several ways to approximate a "proxy-mustcheck" directive using the
existing specification:
- Use of "Cache-control: private": because this prevents
shared caches from storing the response, it forces as
many requests as "proxy-mustcheck" would, and so the
origin server will receive accurate counts.
However, because "private" prevents a shared cache from
even storing the response, it cannot do a conditional
request for subsequent references, and hence this approach
would lead to unnecessary transmission of entity bodies
(instead of 304 Not Modified responses).
- Use of "Cache-control: proxy-revalidate, max-age=0": this
allows proxies to store the response and forces them to
revalidate it on every reference. However, it also implies
that end-user caches should revalidate on every reference
as well, which is not necessary for most hit-metering
applications (see section 4).
Because of the way Cache-control is specified, it would also be
possible to phase in the use of a new "proxy-mustcheck" directive
without compromising counting accuracy in the interim, by using
"Cache-control: private, proxy-mustcheck". (This means that the
specification of "proxy-mustcheck" would explicitly have it override
"private".) The risk of taking this phased approach is that, until
most proxies support "proxy-mustcheck", a lot of unnecessary
full-body responses would be sent.
7 Examples
7.1 Example of a complete set of exchanges
This example shows how the protocol is intended to be used most of
the time: for hit-metering without usage-limiting. Entity bodies are
omitted.
A client sends a request to a proxy:
GET http://foo.com/bar.html HTTP/1.1
The proxy forwards the request to the origin server:
GET /bar.html HTTP/1.1
Host: foo.com
Connection: Meter
thus offering (implicitly) "will-report-and-limit".
The server responds to the proxy:
HTTP/1.1 200 OK
Cache-control: max-age=3600
Connection: meter
Etag: "abcde"
thus (implicitly) requiring "do-report" (but not requiring
usage-limiting).
The proxy responds to the client:
HTTP/1.1 200 OK
Etag: "abcde"
Cache-control: max-age=3600, proxy-mustcheck
Age: 1
since the proxy does not know if its client is an end-system, or a
proxy that doesn't do metering, it adds the "proxy-mustcheck"
directive.
Another client soon asks for the resource:
GET http://foo.com/bar.html HTTP/1.1
and the proxy sends the same response as it sent to the other client,
except (perhaps) for the Age value.
After an hour has passed, a third client asks for the response:
GET http://foo.com/bar.html HTTP/1.1
But now the response's max-age has been exceeded, so the proxy
revalidates the response with the origin server:
GET /bar.html HTTP/1.1
If-None-Match: "abcde"
Host: foo.com
Connection: Meter
Meter: count=1/0
thus simultaneously fulfilling its duties to validate the response
and to report the one "use" that wasn't forwarded.
The origin server responds:
HTTP/1.1 304 Not Modified
Cache-control: max-age=3600
Etag: "abcde"
so the proxy can use the original response to reply to the new
client; the proxy also zeros the use-count it associates with that
response.
Another client soon asks for the resource:
GET http://foo.com/bar.html HTTP/1.1
and the proxy sends the appropriate response.
After another few hours, the proxy decides to remove the cache entry.
When it does so, it sends to the origin server:
HEAD /bar.html HTTP/1.1
If-None-Match: "abcde"
Host: foo.com
Connection: Meter
Meter: count=1/0
reporting that one more use of the response was satisfied from the
cache.
7.2 Protecting against HTTP/1.0 proxies
An origin server that does not want HTTP/1.0 caches to store the
response at all, and is willing to have HTTP/1.0 end-system clients
generate excess GETs (which will be handled by the proxy, of course)
could send this for its reply:
HTTP/1.1 200 OK
Cache-control: max-age=3600
Connection: meter
Etag: "abcde"
Expires: Sun, 06 Nov 1994 08:49:37 GMT
HTTP/1.0 caches will see the ancient Expires header, but HTTP/1.1
caches will see the max-age directive and will ignore Expires.
7.3 More elaborate examples
Here is a request from a proxy that is willing to hit-meter but is
not willing to usage-limit:
GET /bar.html HTTP/1.1
Host: foo.com
Connection: Meter
Meter: wont-limit
Here is a response from an origin server that does not want hit
counting, but does want "uses" limited to 3, and "reuses" limited to
6:
HTTP/1.1 200 OK
Cache-control: max-age=3600
Connection: meter
Etag: "abcde"
Expires: Sun, 06 Nov 1994 08:49:37 GMT
Meter: max-uses=3, max-reuses=6, dont-report
Here is the same example with abbreviated Meter directive names:
HTTP/1.1 200 OK
Cache-control: max-age=3600
Connection: meter
Etag: "abcde"
Expires: Sun, 06 Nov 1994 08:49:37 GMT
Meter:u=3,r=6,e
8 Interactions with varying resources
Separate counts should be kept for each combination of the headers
named in the Vary header for the Request-URI (what [1] calls "the
selecting request-headers"), even if they map to the same entity-tag.
This rule has the effect of counting hits on each variant, if there
are multiple variants of a page available.
---------
Note: This interaction between Vary and the hit-counting
directives allows the origin server a lot of flexibility in
specifying how hits should be counted. In essence, the origin
server uses the Vary mechanism to divide the requests for a
resource into arbitrary categories, based on the request-
headers. (We will call these categories "request-patterns".)
Since a proxy keeps its hit-counts for each request-pattern,
rather than for each resource, the origin server can obtain
separate statistics for many aspects of an HTTP request.
---------
For example, if a page varied based on the value of the User-Agent
header in the requests, then hit counts would be kept for each
different flavor of browser. But it is in fact more general than
that; because multiple header combinations can map to the same
variant, it also enables the origin server to count the number of
times (e.g.) the Swahili version of a page was requested, even though
it is only available in English.
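One way a proxy might keep such per-request-pattern counts is sketched
below. This is an assumption-laden illustration rather than a required
design: the names request_pattern and PatternCounts are hypothetical,
header values are compared as exact strings, and a missing header is
treated as an empty value.

```python
from collections import Counter


def request_pattern(vary, request_headers):
    """Build a hashable key from the selecting request-headers named by Vary."""
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # Sort so that the key does not depend on the order of names in Vary.
    return tuple((n, lowered.get(n, "")) for n in sorted(names))


class PatternCounts:
    """Hit counts kept per (URI, request-pattern), not per entity-tag."""

    def __init__(self):
        self.hits = Counter()

    def record(self, uri, vary, request_headers):
        self.hits[(uri, request_pattern(vary, request_headers))] += 1


counts = PatternCounts()
uri = "http://foo.com/bar.html"
# Both requests may be answered with the same English variant, yet they
# are counted separately because their selecting headers differ.
counts.record(uri, "Accept-Language", {"Accept-Language": "sw"})
counts.record(uri, "Accept-Language", {"Accept-Language": "en"})
```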
If a proxy does not support the Vary mechanism, then [1] says that it
MUST NOT cache any response that carries a Vary header, and hence
need not implement any aspect of this hit-counting or usage-limiting
design for varying resources.
---------
Note: this also implies that if a proxy supports the Vary
mechanism but is not willing to maintain independent hit-counts
for each variant response in its cache, then it must follow at
least one of these rules:
1. It must not use the Meter header in a request to offer
to hit-meter or usage-limit responses.
2. If it does offer to hit-meter or usage-limit responses,
and then receives a response that includes both a Vary
header and a Meter header with a directive that it
cannot satisfy, then the proxy must not cache the
response.
In other words, a proxy is allowed to partially implement the
Vary mechanism with respect to hit-metering, as long as this
has no externally visible effect on its ability to comply with
the Meter specification.
---------
This approach works for counting almost any aspect of the request
stream, without embedding any specific list of countable aspects in
the specification or proxy implementation.
9 A Note on Capturing Referrals
It is alleged that some advertisers want to pay content providers,
not by the "hit", but by the "nibble" -- the number of people who
actually click on the ad to get more information.
Now, HTTP already has a mechanism for doing this: the "Referer"
header. However, perhaps it ought to be disabled for privacy reasons
-- according to the HTTP/1.1 spec:
"Because the source of the link may be private information
or may reveal an otherwise private information source, it is
strongly recommended that the user be able to select whether
or not the Referer field is sent."
However, in the case of ads, the source of the link actually wants to
let the referred-to page know where the reference came from.
This does not require the addition of any extra mechanism, but rather
can use schemes that embed the referrer in the URI in a manner
similar to this:
http://www.blah.com/ad-reference?from=site1
Such a URI should point to a resource (perhaps a CGI script) which
returns a 302 redirect to the real page
http://www.blah.com/ad-reference.html
Proxies which do not cache 302s will cause one hit on the redirection
page per use, but the real page will get cached. Proxies which do
cache 302s and report hits on the cached 302s will behave optimally.
This approach has the advantage that it works whether or not the
end-client has disabled the use of Referer.
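The redirection step could look roughly like the following. This is a
sketch under assumptions, not a prescribed implementation: redirect_for
is a hypothetical name, the query parameter "from" and the ".html"
suffix are taken from the example URIs above, and logging of the
referrer is left to the caller.

```python
from urllib.parse import parse_qs, urlsplit


def redirect_for(uri):
    """Map an ad-reference URI to (status, Location, referrer).

    The referrer is whatever the linking site embedded in the "from"
    query parameter, or None if it is absent.
    """
    parts = urlsplit(uri)
    referrer = parse_qs(parts.query).get("from", [None])[0]
    # The real page is assumed to live at the same path plus ".html".
    location = f"{parts.scheme}://{parts.netloc}{parts.path}.html"
    return 302, location, referrer


status, location, referrer = redirect_for(
    "http://www.blah.com/ad-reference?from=site1"
)
```

The origin server (or a proxy reporting hits on the cached 302) can then
attribute each hit on the redirection resource to the recovered
referrer.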
10 Security Considerations
Which outbound clients should a server (proxy or origin) trust to
report hit counts? A malicious proxy could easily report a large
number of hits on some page, and thus perhaps cause a large payment
to a content provider from an advertiser. To help avoid this
possibility, a proxy may choose to only relay usage counts received
from its outbound proxies to its inbound servers when the proxies
have authenticated themselves using Proxy-Authorization and/or they
are on a list of approved proxies.
We do not see a way to enforce usage limits if a proxy is willing to
cheat.
Regarding privacy: we believe that the design in this document does
not reveal any more information about individual users than would
already be revealed by implementation of the existing HTTP/1.1
support for "Cache-control: max-age=0, proxy-revalidate". It may, in
fact, help to conceal certain aspects of the organizational structure
on the outbound side of a proxy.
11 Revision history
Minor clarifications, and grammar and spelling corrections, are not
listed here.
11.1 draft-mogul-http-hit-metering-01.txt
Clarified goals, non-goals, and limitations (section 1.1).
Removed the term ``sticky'' from the specification of
meter-request-directive; added an implementation note (section 3.3).
Clarifications and corrections concerning the use of the Connection
header (section 3.1).
Added support for non-caching proxies (section 5.5).
Modified discussion of the Referer header (section 9).
Added the "wont-ask" directive (sections 3.3 and 5.1).
Replaced the use of "proxy-revalidate" with the (placeholder)
directive-name "proxy-mustcheck", and added a discussion of the
alternatives for making this real (section 6).
11.2 draft-mogul-http-hit-metering-00.txt
Initial revision.
12 Acknowledgements
We gratefully acknowledge the constructive comments received from
Anselm Baird-Smith, Koen Holtman (who suggested the technique
described in section 9), Dave Kristol, Ari Luotonen, Patrick
R. McManus, and Ingrid Melve.
13 References
1. Roy T. Fielding, Jim Gettys, Jeffrey C. Mogul, Henrik Frystyk
Nielsen, and Tim Berners-Lee. Hypertext Transfer Protocol --
HTTP/1.1. RFC 2068, HTTP Working Group, January, 1997.
2. J. Mogul. Forcing HTTP/1.1 proxies to revalidate responses.
Internet Draft draft-mogul-http-revalidate-00.txt, HTTP Working
Group, January, 1997. This is a work in progress.
14 Authors' addresses
Jeffrey C. Mogul
Western Research Laboratory
Digital Equipment Corporation
250 University Avenue
Palo Alto, California, 94305, U.S.A.
Email: mogul@wrl.dec.com
Phone: 1 415 617 3304 (email preferred)
Paul J. Leach
Microsoft
1 Microsoft Way
Redmond, Washington, 98052, U.S.A.
Email: paulle@microsoft.com