<?xml version="1.0" encoding="UTF-8"?>
  <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
  <!-- generated by https://github.com/cabo/kramdown-rfc2629 version 1.0.37 -->

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
]>


<rfc ipr="trust200902" docName="draft-lee-sdch-spec-00" category="info">

  <front>
    <title abbrev="sdch-spec">A Proposal for Shared Dictionary Compression over HTTP</title>

    <author initials="J." surname="Butler" fullname="Jon Butler">
      <organization></organization>
      <address>
        <email>jkbutler@google.com</email>
      </address>
    </author>
    <author initials="W.H." surname="Lee" fullname="Wei-Hsin Lee">
      <organization></organization>
      <address>
        <email>weihsinl@google.com</email>
      </address>
    </author>
    <author initials="B." surname="McQuade" fullname="Bryan McQuade">
      <organization></organization>
      <address>
        <email>mcquade@google.com</email>
      </address>
    </author>
    <author initials="K." surname="Mixter" fullname="Kenneth Mixter">
      <organization></organization>
      <address>
        <email>kmixter@google.com</email>
      </address>
    </author>

    <date year="2016" month="October" day="28"/>

    <area>Art</area>
    
    <keyword>Internet-Draft</keyword>

    <abstract>


<t>This paper proposes an HTTP/1.1-compatible extension that supports
inter-response data compression by means of a reference dictionary
shared between user agent and server.</t>



    </abstract>


  </front>

  <middle>


<section anchor="introduction" title="Introduction">

<t>In order to reduce payload size, HTTP/1.1 supports response
compression via the Accept-Encoding and Content-Encoding headers. The
most commonly used HTTP response compression encoding is gzip, which
compresses data that is repeated within a given response. However,
HTTP/1.1 does not provide a mechanism for compressing data that is
repeated between responses. A different class of encoding technique,
known as delta encoding, has proven effective at compressing
inter-response data.</t>

<t>Previous efforts to extend HTTP/1.1 to support delta compression have
focused on encoding an HTTP response as a delta of a previous version
of that response. One such approach is discussed in RFC 3229 “Delta
encoding in HTTP” <xref target="RFC3229"/>. While RFC 3229 is effective at reducing
payload size for many types of resources, it may not be suitable for
certain classes of responses.</t>

<t>Specifically, under RFC 3229, deltas can only be applied to responses
originating from the same URL, and the means of identifying the
instance to delta “from” is by a Last-Modified timestamp or
entity-tag. This makes RFC 3229 unsuitable for compressing dynamically
generated responses to a given URL with varying query parameters
(e.g. a search results page), since these types of responses are
difficult to identify uniquely using entity tags or last modified
timestamps. Content hashes can be used, but false positives are
possible. Also, storing all previous responses on the server may not
be practical.</t>

</section>
<section anchor="proposal-shared-dictionary-compression-over-http" title="Proposal: Shared Dictionary Compression over HTTP">

<t>Existing techniques compress each response in isolation, and so cannot
take advantage of cross-payload redundancy. For example, retrieving a
set of HTML pages with the same header, footer, inlined JavaScript and
CSS requires the retransmission of the same data multiple times. This
paper proposes a compression technique that leverages this
cross-payload redundancy.</t>

<t>In this proposal, a dictionary is a file downloaded by the user agent
from the server that contains strings which are likely to appear in
subsequent HTTP responses. In the case described above, if the header,
footer, JavaScript and CSS are stored in a dictionary possessed by
both user agent and server, the server can substitute these elements
with references to the dictionary, and the user agent can reconstruct
the original page from these references. By substituting dictionary
references for repeated elements in HTTP responses, the payload size
can be reduced.</t>

<t>If either the user agent or the server does not support the extension,
then ordinary HTTP responses are served.</t>

<t>If both the user agent and the server support the extension but the
user agent does not have an applicable dictionary (as described in
detail below), the server responds with an ordinary HTTP response that
includes a header advertising the location of a relevant
dictionary. This dictionary can be retrieved out-of-band by the user
agent.</t>

<t>If both the user agent and the server support the extension and the
user agent has an applicable dictionary, then each HTTP response
includes references to strings in the dictionary, rather than
repeating those strings in the response. The references require fewer
bytes to encode than the strings themselves, reducing the payload
size.</t>

<t>The HTTP header-based protocol for negotiating the presence of
dictionaries on user agent and server is referred to in this proposal
as the SDCH protocol. The compression scheme based on a particular
dictionary shared between user agent and server is referred to as the
SDCH encoding, and is built upon the VCDIFF compression data format
<xref target="RFC3284"/>.</t>

</section>
<section anchor="syntax" title="Syntax">

<t>The grammar descriptions in the sections that follow depend on the
following syntax: DIGIT (decimal digit), BASE64URLDIGIT (alphanumeric
character, “-”, or “_”), PAYLOADBYTE (a byte), token (informally, a
sequence of non-special, non-white space characters), and rest-of-line
(informally, a sequence of characters not including carriage return or
line-feed). In the grammar below, HTTP_url, abs_path, and query are
defined in RFC 7230 <xref target="RFC7230"/>.</t>

<t>header = attr ":" value "\n" <vspace />
attr = token <vspace />
value = rest-of-line <vspace />
dictionary-client-id = 1*BASE64URLDIGIT <vspace />
dictionary-server-id = 1*BASE64URLDIGIT <vspace />
payload = 1*PAYLOADBYTE <vspace />
vcdiff-payload = 1*PAYLOADBYTE <vspace />
partial-url = HTTP_url | abs_path [ "?" query ]</t>

<t>The attribute names (attr) are case-insensitive. White space is
permitted between tokens.</t>

</section>
<section anchor="dictionary-description" title="Dictionary Description">

<section anchor="general" title="General">

<t>In the proposed protocol, a dictionary can only be used with a limited
set of URLs and for a limited duration of time, referred to as its
scope and lifetime, respectively. A dictionary is composed of the data
used by the compression algorithm, known as the payload, as well as
metadata describing its scope and lifetime. The scope is specified by
a domain attribute and path attribute that are patterned after the
same named attributes from the HTTP State Management Specification
<xref target="RFC2965"/>.</t>

</section>
<section anchor="syntax-of-dictionary-metadata" title="Syntax of Dictionary Metadata">

<t>The syntax of dictionary metadata is as follows:</t>

<t>dictionary-metadata = 1*dictionary-header "\n" <vspace />
dictionary-header = "domain" ":" value "\n" <vspace />
                        | "path" ":" value "\n" <vspace />
                        | "path-equals" ":" value "\n" <vspace />
                        | "format-version" ":" value "\n" <vspace />
                        | "max-age" ":" value "\n" <vspace />
                        | "port" ":" &lt;"&gt; portlist &lt;"&gt; "\n" <vspace />
portlist                = 1#portnum <vspace />
portnum                 = 1*DIGIT</t>

<t>A complete dictionary definition then has this format: <vspace />
dictionary-definition   = dictionary-metadata payload</t>

<t>Informally, the metadata for a dictionary is a series of headers,
similar in form to HTTP headers, terminated by an empty line. The
dictionary payload begins immediately after this blank line.</t>
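
<t>The informal description above can be sketched in Python as follows.
This is a minimal illustration, not a reference implementation: the
function name parse_dictionary is hypothetical, the metadata is assumed
to end at the first empty line, and quoting of the port list is not
interpreted.</t>

<figure><artwork><![CDATA[
```python
def parse_dictionary(definition: bytes):
    """Split a dictionary definition into (metadata, payload).

    Hypothetical helper: the metadata is taken to end at the first
    empty line, and header names are lower-cased because they are
    case-insensitive.
    """
    head, blank, payload = definition.partition(b"\n\n")
    if not blank:
        raise ValueError("no empty line after the dictionary metadata")
    metadata = {}
    for line in head.decode("ascii").split("\n"):
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError("malformed metadata line: " + repr(line))
        metadata[name.strip().lower()] = value.strip()
    return metadata, payload


definition = (b"Domain: .google.com\nPath: /search\n"
              b"Format-version: 1.0\n\n...dictionary contents...")
metadata, payload = parse_dictionary(definition)
```
]]></artwork></figure>

<t>Only the first empty line is treated as the separator, so a payload
that itself contains empty lines is returned intact.</t>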

<t>The valid dictionary header identifiers are described below:</t>

<t><list style="symbols">
  <t>Domain: domain.</t>
</list></t>

<t>Required. Indicates the domain to which the dictionary applies. The
domain specification must explicitly start with a dot. For example, a
dictionary with the domain specification “.google.com” may be used to
compress a response served from the host name www.google.com, but not
used to compress a response served from the host name
www.gmail.com. Only printable ASCII characters are permitted in the
domain value. International Domain Names must be specified using IDNA.</t>

<t><list style="symbols">
  <t>Path: path.</t>
</list></t>

<t>Optional. Indicates the set of URL paths for which this
dictionary is valid. If unspecified, the dictionary applies to all
paths within the given domain.</t>

<t><list style="symbols">
  <t>Path-equals: path.</t>
</list></t>

<t>Optional. Indicates the exact URL path for which this dictionary is
valid. If both “path” and “path-equals” are specified, the dictionary
applies only to those URLs which satisfy both criteria.</t>

<t><list style="symbols">
  <t>Format-version: version.</t>
</list></t>

<t>Optional. Indicates the version of the dictionary payload. If
unspecified, the format version defaults to “1.0”. Currently, the only
acceptable value is “1.0”.</t>

<t><list style="symbols">
  <t>Max-age: delta-seconds.</t>
</list></t>

<t>Optional. Indicates the amount of time that a dictionary can be
advertised to the server by the user agent, relative to the time it
was downloaded. If unspecified, the default is 30 days from the time
the dictionary was downloaded by the user agent.</t>

<t><list style="symbols">
  <t>Port: port list.</t>
</list></t>

<t>Optional. Indicates the comma-separated list of ports to which
this dictionary applies. If unspecified, the dictionary applies to all
ports.</t>

<t>Like HTTP headers, dictionary header identifiers are
case-insensitive. Unknown headers will be ignored by the user agent,
allowing other headers to be added in the future.</t>

</section>
<section anchor="dictionary-scope" title="Dictionary Scope">

<t>The specific rules of when a dictionary can be applied to a URL,
i.e. that define its scope, are modeled after the rules for cookie
scoping. The term “domain-match” is defined in RFC 2965. We define
path-matching as follows. For two strings that represent paths, P1 and
P2, P1 path-matches P2 if either:</t>

<t><list style="numbers">
  <t>P2 is equal to P1</t>
  <t>P2 is a prefix of P1 and either the final character in P2 
     is “/” or the character following P2 in P1 is “/”.</t>
</list></t>

<t>For example, “/tec/waldo” path-matches “/tec”, “/tec/”, and
“/tec/waldo”, but does not path-match “/tec/wal”.</t>
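
<t>The two path-matching rules can be expressed as a short Python
function. This is a sketch only; the name path_matches is hypothetical
and paths are compared as plain strings without any normalization.</t>

<figure><artwork><![CDATA[
```python
def path_matches(p1: str, p2: str) -> bool:
    """Return True if path p1 path-matches path p2."""
    if p1 == p2:
        return True  # rule 1: P2 is equal to P1
    if p1.startswith(p2):
        # rule 2: P2 is a prefix of P1 that ends on a "/" boundary
        return p2.endswith("/") or p1[len(p2)] == "/"
    return False
```
]]></artwork></figure>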

<t>Given these definitions of domain-match and path-match, a request URL
falls within a dictionary’s scope exactly when all of the following
are true:</t>

<t><list style="numbers">
  <t>The request URL’s host name domain-matches the Domain attribute of
the dictionary.</t>
  <t>If the dictionary has a Port attribute, the
request port is one of the ports listed in the Port attribute.</t>
  <t>The request URL path-matches the path attribute of the dictionary.</t>
  <t>The
request URL’s scheme matches the scheme of the dictionary.</t>
</list></t>

<t>If a URL falls within a dictionary’s scope, the dictionary is said to
“apply” to the URL.</t>
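
<t>As an illustration, the four scope conditions might be checked as
shown below. The sketch rests on several assumptions that are not part
of this proposal: a dictionary is represented as a Python dict with the
hypothetical keys domain, path, ports, and scheme; domain-match is
reduced to a simple suffix comparison rather than the full RFC 2965
definition; and the Path-equals attribute is not handled.</t>

<figure><artwork><![CDATA[
```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def domain_matches(host: str, domain: str) -> bool:
    # Simplified domain-match: equal names, or the host ends with a
    # Domain attribute that starts with a dot.
    host, domain = host.lower(), domain.lower()
    return host == domain or (domain.startswith(".")
                              and host.endswith(domain))


def path_matches(p1: str, p2: str) -> bool:
    return p1 == p2 or (p1.startswith(p2)
                        and (p2.endswith("/") or p1[len(p2)] == "/"))


def dictionary_applies(url: str, dictionary: dict) -> bool:
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    ports = dictionary.get("ports")  # None means "all ports"
    return (domain_matches(parts.hostname or "", dictionary["domain"])
            and (ports is None or port in ports)
            and path_matches(parts.path or "/",
                             dictionary.get("path", "/"))
            and parts.scheme == dictionary["scheme"])
```
]]></artwork></figure>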

</section>
<section anchor="dictionary-identifier" title="Dictionary Identifier">

<t>In communications between user agent and server, a dictionary is
identified by the first 96 bits of the SHA-256 digest <xref target="RFC6234"/> of a
dictionary’s metadata and payload (see dictionary-definition above)
exactly as it is received by the user agent from the server. Both user
agent and server compute this identifier independently, based on the
metadata and the payload of the dictionary. This digest should be
unique within a dictionary’s scope (domain and path) in order to
prevent dictionary identifier collisions.</t>

<t>The digest serves not only as an identifier but also as a safeguard
against attempts to maliciously intercept or otherwise modify
dictionary contents, since a compromised dictionary will hash to a
different identifier and the server will not recognize it. The user
agent identifier for a dictionary is defined as the URL-safe base64
encoding (as described in RFC 3548, section 4 <xref target="RFC3548"/>) of the first
48 bits (bits 0..47) of the dictionary’s SHA-256 digest. The server
identifier for a dictionary is the URL-safe base64 encoding of the
second 48 bits (bits 48..95). When identifying a dictionary to the
server, the user agent uses the user agent identifier, and similarly,
when identifying a dictionary to the user agent, the server uses the
server identifier. Note that both user agent and server have the
entire dictionary and can thus compute both identifiers for the
dictionary.</t>
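
<t>The derivation of the two identifiers can be sketched with the
Python standard library. The function name dictionary_ids is
hypothetical; the input is assumed to be the complete
dictionary-definition (metadata and payload) exactly as received.</t>

<figure><artwork><![CDATA[
```python
import base64
import hashlib


def dictionary_ids(definition: bytes):
    """Return (user agent id, server id) for a dictionary definition."""
    digest = hashlib.sha256(definition).digest()
    # 48 bits = 6 bytes, which base64-encode to 8 characters
    # without padding.
    client_id = base64.urlsafe_b64encode(digest[0:6]).decode("ascii")
    server_id = base64.urlsafe_b64encode(digest[6:12]).decode("ascii")
    return client_id, server_id
```
]]></artwork></figure>

<t>Because each half of the 96-bit prefix is exactly six bytes, both
identifiers are always eight characters long, which matches the
identifiers used in the examples later in this document.</t>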

<t>As a consequence of this scheme, dictionaries do not need to be
explicitly named by site maintainers, as the protocol avoids
identifying them in any way other than the above digest-generated
identifiers.</t>

</section>
<section anchor="differences-between-dictionaries-and-cookies" title="Differences between Dictionaries and Cookies">

<t>Dictionaries are similar to cookies in that they allow sharing of
state over HTTP. Thus, we have modeled dictionaries after cookies, as
described in RFC 2965. However, because dictionaries are typically
larger than cookies, embedding a dictionary in the response would
increase latency of the response. Thus a dictionary is always sent as
a separate HTTP response (unlike a cookie which is included in a
Set-Cookie header of any HTTP response). The Get-Dictionary HTTP
response header is used to tell the user agent that it should fetch a
dictionary separately for use in future requests.</t>

<t>Likewise, rather than including the dictionary contents in the HTTP
request headers (like a cookie in the Cookie header), dictionary
identifiers (described above) are used to advertise available
dictionaries in HTTP requests from the user agent to the server.</t>

</section>
</section>
<section anchor="user-agent-server-interaction-description" title="User Agent / Server Interaction Description">

<section anchor="user-agent-role-in-http-request-generation" title="User Agent Role in HTTP Request Generation">

<t>The user agent:</t>

<t><list style="numbers">
  <t>Advertises support for the proposed protocol by adding the “sdch”
token to the Accept-Encoding header of HTTP requests.</t>
  <t>Advertises any dictionaries it possesses that apply to the URL
being requested (per the scoping rules above) in the Avail-Dictionary
request header.</t>
</list></t>

<t>The Avail-Dictionary header syntax is as follows: <vspace />
avail-dictionary-header = "Avail-Dictionary" ":" 1#dictionary-client-id <vspace />
where dictionary-client-id is the user agent identifier part for the
dictionary based on the SHA-256 digest as described above. The value
of this header is informally a comma-separated list of user agent
dictionary identifiers.</t>

<t>The user agent must advertise every dictionary it has cached that
applies to the requested URL. It is only the presence of the
dictionary identifier in this header that indicates to the server that
the user agent possesses and therefore does not need to download the
dictionary. Since the user agent must advertise every dictionary it
has, it is the site maintainer’s responsibility to avoid making too
many dictionaries available at a given time. Advertising many
dictionaries in this header can counteract the benefits of
compression.</t>

<t>Note that for each individual request the user agent has discretion
over whether or not to add the “sdch” Accept-Encoding token and the
Avail-Dictionary header. Since some responses, such as image data, are
unlikely to benefit from dictionary compression, the user agent can
reduce the size of its requests by not sending this token and
header. The user agent may decide whether or not to add these headers
based on file extensions in URLs or the context of the request. For
instance, the user agent may choose to not advertise SDCH for URLs
referenced in IMG elements.</t>
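
<t>A user agent's header generation might look like the following
sketch. The function name request_headers and its parameters are
hypothetical, and the inclusion of gzip alongside sdch is an assumption
taken from the examples in this document rather than a requirement.</t>

<figure><artwork><![CDATA[
```python
def request_headers(applicable_client_ids, use_sdch=True):
    """Build the SDCH-related request headers for one request.

    applicable_client_ids: user agent identifiers of every cached
    dictionary that applies to the requested URL.
    use_sdch: False when the user agent opts out for this request,
    for instance for an image.
    """
    headers = {"Accept-Encoding": "sdch, gzip" if use_sdch else "gzip"}
    if use_sdch and applicable_client_ids:
        headers["Avail-Dictionary"] = ",".join(applicable_client_ids)
    return headers
```
]]></artwork></figure>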

</section>
<section anchor="server-role-in-http-response-generation" title="Server Role in HTTP Response Generation">

<t>When a server that supports the extension receives a request that
indicates that the user agent supports the protocol (e.g. the “sdch”
token is present in the Accept-Encoding request header), two
independent decisions must be made. The server must decide:</t>

<t><list style="numbers">
  <t>whether to send an encoded response, and</t>
  <t>whether to inform the user agent about additional dictionaries it can download and use in the future.</t>
</list></t>

<t>The server may return an encoded response only if all of the following are true:</t>

<t><list style="numbers">
  <t>The Accept-Encoding request header contains the “sdch” token.</t>
  <t>The server can send a response compressed with a dictionary whose dictionary-client-id is in the Avail-Dictionary request header.</t>
</list></t>

<t>A server may return a response that is not encoded even if it
recognizes a dictionary advertised by the user agent.  If the server
decides not to use SDCH encoding when an Avail-Dictionary header is
present, it must include a specific HTTP header X-SDCH-Encoding with
value “0” in the response.  The syntax of the X-SDCH-Encoding header
is:</t>

<t>sdch-not-used-header = "X-SDCH-Encoding" ":" "0"</t>

<t>The server indicates that an HTTP response is encoded by inserting the
token “sdch” into the Content-Encoding header of the HTTP response.</t>

<t>A compatible server may instruct a compatible user agent to download
one or more new dictionaries by including the Get-Dictionary header in
the HTTP response. The server may advertise a Get-Dictionary header
even if the response is not encoded. The syntax of the Get-Dictionary
header is: <vspace />
get-dictionary-header = "Get-Dictionary" ":" 1#partial-url <vspace />
where partial-url is either a complete URL or just the absolute URL
path (in which case the scheme, host, and port of the originating
server are used when requesting the dictionary). If a complete
URL is provided, it must have the same scheme, host, and port as the
originating server. The Content-Type header of dictionary responses
must be application/x-sdch-dictionary. The value of the Get-Dictionary
header is a comma-separated list of partial-url elements.</t>

<t>The server must not advertise a dictionary with a dictionary-client-id
that the user agent has listed in the Avail-Dictionary header.</t>

<t>The server may use SDCH compression with a dictionary that the user
agent has advertised and also include a Get-Dictionary header for a
different dictionary that the user agent has not advertised.</t>

<t>The server must prevent SDCH-encoded responses from being cached by
intermediate proxies. See the section below on proxy caching for
additional details.</t>

<t>The server should limit the number of active dictionaries at any one
time, by using well-scoped dictionaries. A server that has many active
dictionaries with overlapping scope will cause user agents to generate
a very long Avail-Dictionary header, the overhead of which can
counteract the benefits of SDCH compression.</t>

<t>The server may decide to precompute and cache SDCH-encoded responses
if a given SDCH-encoded response will be served multiple times
(e.g. for static content).</t>

<t>The server may apply multiple Content-Encodings to the response
(e.g. sdch and gzip), in which case subsequent encoding tokens are
appended to the Content-Encoding header, per the HTTP/1.1 RFC section
14.11.</t>
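
<t>The server-side header rules in this section can be summarized by
the sketch below. It is illustrative only: the function names and the
encodable and applicable parameters are hypothetical, header names are
assumed to be already normalized, quality values in Accept-Encoding
are ignored, and the actual VCDIFF encoding and cache-control handling
are left out.</t>

<figure><artwork><![CDATA[
```python
def tokens(value: str):
    """Split a comma-separated header value into stripped tokens."""
    return [t.strip() for t in value.split(",") if t.strip()]


def sdch_response_headers(request: dict, encodable: set,
                          applicable: dict):
    """Return (response headers, client id of the chosen dictionary).

    encodable:  client ids the server is willing to encode with.
    applicable: client id -> download path of each dictionary that
                applies to the requested URL.
    """
    accept = [t.lower() for t in
              tokens(request.get("Accept-Encoding", ""))]
    if "sdch" not in accept:
        return {}, None  # user agent does not support the protocol
    advertised = tokens(request.get("Avail-Dictionary", ""))
    chosen = next((cid for cid in advertised if cid in encodable), None)
    headers = {}
    if chosen is not None:
        headers["Content-Encoding"] = "sdch"
    elif advertised:
        headers["X-SDCH-Encoding"] = "0"  # advertised but not used
    # Never offer a dictionary the user agent has already listed.
    offers = [path for cid, path in applicable.items()
              if cid not in advertised]
    if offers:
        headers["Get-Dictionary"] = ", ".join(offers)
    return headers, chosen
```
]]></artwork></figure>

<t>The two decisions are kept independent: the Get-Dictionary header
is computed the same way whether or not the response is encoded.</t>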

</section>
<section anchor="user-agent-role-in-http-response-handling" title="User Agent Role in HTTP Response Handling">

<t>An SDCH-compatible user agent must inspect the Content-Encoding HTTP
response header to determine if the response is SDCH-encoded. If the
Content-Encoding includes the “sdch” token, the user agent must
perform SDCH decompression on the response.</t>

<t>If the HTTP response includes a Get-Dictionary header, the user agent
must verify that the partial-url specified refers to the same server
that generated the response. If so, the user agent may download the
dictionary at the given URL.</t>

<t>There are two different URLs to consider when downloading and storing
a dictionary. The referrer URL is the URL of the request that resulted
in the server responding with a Get-Dictionary header.</t>

<t>The dictionary URL is defined as follows:</t>

<t><list style="numbers">
  <t>If the partial-url is a complete URL, the dictionary URL is the
partial-url.</t>
  <t>If the partial-url is just a path, the dictionary URL is generated
from the scheme and host name of the referrer URL and the path in the
partial-url.</t>
</list></t>

<t>The user agent may retrieve a dictionary if the origin of the
dictionary matches the origin of the referrer. HTTP redirects may only
be followed if the origin matches as well.</t>

<t>Upon retrieving the dictionary, the user agent must validate the
dictionary. Here again, the validation rules are modeled after the
rules for when a user agent can accept an HTTP cookie. A dictionary is
invalid and must not be stored if any of the following are true:</t>

<t><list style="numbers">
  <t>The dictionary has no Domain attribute.</t>
  <t>The effective host name that derives from the referrer URL host name does not domain-match the Domain attribute.</t>
  <t>The Domain attribute is a top level domain.</t>
  <t>The referrer URL host is a host domain name (not IP address) and has the form HD, where D is the value of the Domain attribute, and H is a string that contains one or more dots.</t>
  <t>The dictionary has a Port attribute and the referrer URL’s port is not in the list.</t>
</list></t>

<t>If the dictionary is valid and the user agent decides to store the
dictionary, the scheme of the dictionary URL should also be stored
along with the dictionary.</t>

</section>
<section anchor="sdch-encoded-response-body" title="SDCH-Encoded Response Body">

<t>An SDCH-encoded response starts with the dictionary-server-id used to
compress the response. The syntax of the SDCH-encoded response is: <vspace />
dictionary-compression-response = dictionary-server-id "\0" vcdiff-payload</t>
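
<t>A user agent could separate the two parts of the body as sketched
below. The function name split_sdch_body is hypothetical, and the
check for an eight-character identifier is an assumption derived from
the 48-bit server identifier described earlier; VCDIFF decoding itself
is not shown.</t>

<figure><artwork><![CDATA[
```python
def split_sdch_body(body: bytes):
    """Split an SDCH-encoded body into (server id, VCDIFF payload)."""
    # Only the first NUL byte is the separator; the payload may
    # contain further NUL bytes.
    server_id, nul, vcdiff = body.partition(b"\x00")
    if not nul or len(server_id) != 8:
        raise ValueError("missing or malformed dictionary-server-id")
    return server_id.decode("ascii"), vcdiff
```
]]></artwork></figure>

<t>The returned server identifier can then be compared with the value
the user agent computed for its cached dictionary before decoding the
payload.</t>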

</section>
</section>
<section anchor="examples" title="Examples">

<t>For the purpose of these examples, assume the following
dictionaries exist on the server and can be downloaded from the
following URLs:</t>

<t>“Search results” dictionary</t>

<t><list style="symbols">
  <t>domain: .google.com</t>
  <t>path: /search</t>
  <t>user agent ID: TWFuIGlz</t>
  <t>server ID: JOWk0d2N</t>
  <t>download location: /dictionaries/search_dict</t>
</list></t>

<t>“Help pages” dictionary</t>

<t><list style="symbols">
  <t>domain: .google.com</t>
  <t>path: /</t>
  <t>user agent ID: GVhc3V48</t>
  <t>server ID: O9d2_m3-</t>
  <t>download location: /dictionaries/help_dict</t>
</list></t>

<t>Note that the dictionary identifier consists of two parts: user agent
ID and the server ID. Most of the detail of the request and response
headers has been omitted.</t>

<section anchor="example-1-initial-interaction-user-agent-has-no-dictionaries" title="Example 1:  Initial Interaction, User Agent has No Dictionaries">
<t><list style="numbers">
  <t>user agent’s request</t>
</list></t>

<t>GET /search?q=sprouts HTTP/1.1 <vspace />
Host: www.google.com <vspace />
Accept-Encoding: sdch, gzip</t>

<t><list style="numbers">
  <t>server’s response</t>
</list></t>

<t>HTTP/1.1 200 OK <vspace />
Content-type: text/html <vspace />
Content-Encoding: gzip <vspace />
Get-Dictionary: /dictionaries/search_dict, /dictionaries/help_dict <vspace />
Cache-Control: private</t>

<t>Note that the response returned by the server does NOT use SDCH
encoding, since the user agent does not have a dictionary. The server
simply provides the locations of the dictionaries for future use. The
user agent may choose to retrieve one or both dictionaries separately.</t>

</section>
<section anchor="example-2-user-agent-requests-the-dictionary" title="Example 2:  User Agent Requests the Dictionary">
<t><list style="numbers">
  <t>user agent’s request</t>
</list></t>

<t>GET /dictionaries/search_dict HTTP/1.1 <vspace />
Host: www.google.com <vspace />
Accept-Encoding: sdch, gzip</t>

<t><list style="numbers">
  <t>server’s response</t>
</list></t>

<t>HTTP/1.1 200 OK <vspace />
Content-type: application/x-sdch-dictionary <vspace />
Content-Encoding: gzip</t>

<t>Domain: .google.com <vspace />
Path: /search<vspace />
Format-version: 1.0</t>

<t>…dictionary contents…</t>

<t>Upon receiving this response, the user agent computes the digest of
the dictionary and determines the user agent ID is TWFuIGlz and the
server ID is JOWk0d2N.</t>

</section>
<section anchor="example-3-user-requests-page-and-user-agent-has-already-downloaded" title="Example 3: User Requests Page AND User Agent Has Already Downloaded the Dictionary">

<t><list style="numbers">
  <t>user agent’s request</t>
</list></t>

<t>GET /search?q=brussel+sprouts HTTP/1.1 <vspace />
Host: www.google.com<vspace />
Accept-Encoding: sdch, gzip<vspace />
Avail-Dictionary: TWFuIGlz</t>

<t><list style="numbers">
  <t>server’s response</t>
</list></t>

<t>HTTP/1.1 200 OK <vspace />
Content-type: text/html <vspace />
Content-Encoding: sdch, gzip <vspace />
Get-Dictionary: /dictionaries/help_dict <vspace />
Cache-Control: private</t>

<t>JOWk0d2N&lt;NUL&gt;…VCDIFFed response… <vspace />
(note that the response shown is the result of gzip decompression)</t>

<t>The server has properly identified the dictionary using its server ID,
and the user agent can confirm that the second 48 bits of the SHA-256
digest of the dictionary match its computation. It can then decompress
the VCDIFF response using this dictionary. Even though the “search
results” dictionary was used to encode the response, the server has
chosen to indicate that another dictionary can be requested by the
user agent from http://www.google.com/dictionaries/help_dict. This
dictionary must be different from the “search results” dictionary, as
the server must never ask the user agent to download a dictionary it
knows the user agent already has. Let’s assume the user agent decides
to download this dictionary.</t>

</section>
<section anchor="example-4-user-requests-with-multiple-dictionaries" title="Example 4: User Requests with Multiple Dictionaries">
<t><list style="numbers">
  <t>user agent’s request</t>
</list></t>

<t>GET /search?q=brussels HTTP/1.1 <vspace />
Host: www.google.com <vspace />
Accept-Encoding: sdch, gzip <vspace />
Avail-Dictionary: GVhc3V48,TWFuIGlz</t>

<t><list style="numbers">
  <t>server’s response</t>
</list></t>

<t>HTTP/1.1 200 OK <vspace />
Content-type: text/html <vspace />
Content-Encoding: sdch, gzip <vspace />
Cache-Control: private</t>

<t>JOWk0d2N&lt;NUL&gt;…VCDIFFed response… <vspace />
(note that the response shown is the result of gzip decompression)</t>

<t>The user agent advertises that it has already downloaded two
dictionaries that apply.  The server may compress the response with
either dictionary.  As the server has no other dictionaries that apply
to the request, it does not advertise any dictionaries in its
response.</t>

</section>
</section>
<section anchor="implementation-considerations" title="Implementation Considerations">

<section anchor="implementation-limits" title="Implementation Limits">

<t>There are practical limitations to the number and size of the
dictionaries a user agent can store. It is suggested that general-use,
non-mobile user agents should have the following minimum capabilities:</t>

<t><list style="symbols">
  <t>At least 300 dictionaries stored</t>
  <t>At least 100KB of payload per dictionary</t>
  <t>At least 10MB of total dictionary contents</t>
  <t>At least 20 dictionaries stored per domain</t>
</list></t>

</section>
<section anchor="dictionary-downloading" title="Dictionary Downloading">

<t>The user agent always has the choice of whether or not to download a
dictionary. It is recommended that the user agent be implemented with
sufficient state to avoid downloading too many dictionaries from the
same server. A malfunctioning server may also repeatedly request that the
user agent download the same dictionary. One simple method to avoid
both of these possibilities is for the user agent to rate-limit
downloading dictionaries from the same domain.</t>

<t>When the user agent receives a response with a Get-Dictionary header
with dictionary download URLs that it may fetch, it should perform the
dictionary downloads in the background. This is possible as the
dictionary to be downloaded is guaranteed to not be needed to
decompress the response with the Get-Dictionary header. The user agent
should be careful to abort background dictionary downloads that do not
complete in a reasonable amount of time.</t>

</section>
<section anchor="data-integrity" title="Data Integrity">

<t>If the dictionaries are tied to individual users or specific user
actions, HTTP may leak this information to a passive attacker by
allowing the Get-Dictionary information to be seen. When using HTTPS,
this risk is prevented by the design, since Get-Dictionary
URLs are required to be same-origin with the response.</t>

<t>However, downloading dictionaries over HTTPS or advertising
dictionaries over HTTPS might introduce new security risks.</t>

<t>TODO: add some examples. For example, SDCH-over-HTTPS is subject to
compression oracle attacks similar to CRIME/BREACH, with the difference
that the compression context is not supplied by the attacker. If an
attacker had the contents of a dictionary, there is a theoretical
possibility where a server sends a static response XOR’ed with
user-provided data. If the attacker can provide data that reduces the
size of the response when XOR’ed with the static response, the
attacker may then be able to determine the contents of the static
response.</t>

<t>The protocol needs to ensure that the content as decompressed by the
user agent with a given dictionary is identical to the server’s
originally intended content. The three areas that can cause a data
integrity problem are discussed below.</t>

<section anchor="data-tampered-by-proxy" title="Data tampered by Proxy">

<t>We have found incorrectly implemented proxies which tamper with an
SDCH response and make the response unable to be decompressed to the
server’s originally intended content.  The tampering may not be
detected in the SDCH encoding itself if the proxy makes SDCH content
look like non-SDCH content, for instance, by stripping the “sdch”
token from the Content-Encoding header of the response or by adding
additional encodings (like gzip) on top of the SDCH and gzipped
response without making the Content-Encoding header match.  In order
to detect when this occurs, the HTTP header X-SDCH-Encoding must be
added to the response by the server to inform the client that the
response was originally not SDCH encoded by the server.  Should the
user agent advertise SDCH capability in the request but receive a
non-SDCH encoded response without the X-SDCH-Encoding header, it
suggests that the response was tampered by a proxy.  The user agent
may then take action to avoid using SDCH in the future.</t>
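
<t>The detection rule can be sketched as a small predicate. The names
below are hypothetical and header names are assumed to be normalized.
As an additional assumption, the sketch only flags a response when the
request carried an Avail-Dictionary header, since that is the case in
which the server is required to send X-SDCH-Encoding.</t>

<figure><artwork><![CDATA[
```python
def tokens(value: str):
    """Split a comma-separated header value into lower-case tokens."""
    return [t.strip().lower() for t in value.split(",") if t.strip()]


def response_looks_tampered(request: dict, response: dict) -> bool:
    """Guess whether a proxy altered an SDCH exchange."""
    offered = ("sdch" in tokens(request.get("Accept-Encoding", ""))
               and "Avail-Dictionary" in request)
    is_sdch = "sdch" in tokens(response.get("Content-Encoding", ""))
    declared_plain = response.get("X-SDCH-Encoding", "").strip() == "0"
    # Suspicious: dictionaries were advertised, the response is not
    # SDCH-encoded, and the server did not say so explicitly.
    return offered and not is_sdch and not declared_plain
```
]]></artwork></figure>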

</section>
<section anchor="dictionary-mismatch" title="Dictionary mismatch">

<t>When dictionary information is exchanged between user agent and
server, it is necessary to ensure that the dictionary identifiers are
completely unambiguous, or the decompressed result may differ from the
original content. To address this issue, SDCH uses the first 96 bits
of the SHA-256 digest of a dictionary’s metadata and payload to create
the dictionary identifiers used by the user agent and server to avoid
ambiguity. (Please refer to the section “Dictionary Description”
above for details.)</t>

</section>
<section anchor="data-corruption-malicious-attacks" title="Data corruption / malicious attacks">

<t>While this issue is not specific to SDCH, it can be exacerbated by
the stateful nature of the compression. For example, if the dictionary
is corrupted or maliciously modified in a persistent on-disk cache,
every subsequent response decoded with that dictionary will be
corrupt. For this reason, the user agent and server should revalidate
the integrity of dictionaries when they are loaded from non-volatile
storage.</t>

<t>Other issues, like data corruption during transmission of the encoded
payload, could have a much bigger adverse effect than they would on
plain text. TCP provides a checksum, but it cannot detect some errors,
such as swapped bytes. To address this issue, SDCH includes an Adler32
checksum <xref target="RFC1950"/> in the encoded data shards. (Please refer to
appendix “VCDIFF Encoding Format and SDCH” for details.)</t>
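
<t>The difference can be illustrated with a small Python sketch. The
helper internet_checksum below is a simplified stand-in for the 16-bit
ones’ complement sum that TCP uses (it ignores the pseudo-header), and
the sample byte strings are arbitrary. Reordering 16-bit words leaves
that sum unchanged, whereas Adler32 depends on byte order.</t>

<figure><artwork><![CDATA[
```python
import zlib


def internet_checksum(data):
    """Simplified 16-bit ones' complement checksum over data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold the carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"ABCD"
swapped = b"CDAB"  # the two 16-bit words have changed places

same_tcp_sum = (internet_checksum(original)
                == internet_checksum(swapped))
same_adler32 = zlib.adler32(original) == zlib.adler32(swapped)
```
]]></artwork></figure>

<t>Here same_tcp_sum is True while same_adler32 is False, which is the
kind of error the additional checksum is meant to catch.</t>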

</section>
</section>
</section>
<section anchor="response-caching" title="Response Caching">

<section anchor="user-agent-cache" title="User Agent Cache">

<t>The user agent should honor HTTP caching directives (Cache-Control,
Expires, etc.) whether or not the responses are SDCH-encoded. An
SDCH-encoded response should be decoded before being written to the
cache. If this is not possible, the user agent may cache the
SDCH-encoded response, unless the HTTP response headers indicate that
the response is not cacheable. In this case, an SDCH-encoded cache
entry should be invalidated when (1) the dictionary used to encode
that response is deleted from the dictionary store, (2) the SDCH
decompression user agent is uninstalled (if it is implemented as a
browser add-on), or (3) the SDCH-capable user agent is disabled.</t>

</section>
<section anchor="response-caching-intermediate-caches" title="Intermediate Caches">

<t>The server should use HTTP cache headers that prevent non-SDCH-aware
intermediate cache servers from storing the encoded contents. The
cache directive “Cache-Control: private” can be used for this purpose.</t>

<t>If the compressed response can be cached by proxy caches, the server
must include the HTTP header “Vary: Accept-Encoding, Avail-Dictionary”
so that proxies send the cached content only to user agents that can
decode it. Note that some proxies may not respect the Vary header, in
which case non-SDCH-capable user agents would end up downloading
SDCH-encoded responses. Thus, we recommend that SDCH-encoded responses
not be cacheable by intermediate proxies unless there is a very
compelling reason. Further, a response cached with “Vary:
Accept-Encoding, Avail-Dictionary” will not match later requests
unless the values of these headers match exactly.</t>

<t>A proxy cache may provide one of three levels of support for caching
SDCH-encoded objects.</t>

<t><list style="numbers">
  <t>No support – Never cache any response if the header Vary is present.</t>
  <t>Basic support – The proxy cache only serves cached SDCH-encoded
content if all cache serving conditions are satisfied and the values
of the HTTP headers specified in the Vary header of the cached content
exactly match the corresponding headers in the HTTP request.</t>
  <t>Full support – The proxy should understand the SDCH protocol,
should know what dictionary is used to encode/decode the response, and
should be able to download advertised dictionaries. The cache needs to
have both SDCH user agent and server logic in it. The server should
store the SDCH decoded responses in its cache.</t>
</list></t>
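
<t>The exact-match rule of the “Basic support” level might be sketched
in Python as follows. The function name and the use of plain mappings
for request headers are assumptions of this example, and the other
conditions a cache must check before serving a response are omitted.</t>

<figure><artwork><![CDATA[
```python
def can_serve_from_cache(vary, cached_request, new_request):
    """Return True if every header named in Vary matches exactly."""
    def normalized(headers):
        return {name.lower(): value for name, value in headers.items()}

    stored = normalized(cached_request)
    incoming = normalized(new_request)
    for name in vary.split(","):
        name = name.strip().lower()
        if not name:
            continue
        if name == "*":
            # "Vary: *" never matches a later request.
            return False
        if stored.get(name) != incoming.get(name):
            return False
    return True
```
]]></artwork></figure>

<t>With “Vary: Accept-Encoding, Avail-Dictionary”, a request that
advertises a different dictionary, or none at all, does not match the
cached entry and is forwarded to the server instead.</t>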

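</section>
</section>
<section anchor="dictionary-caching" title="Dictionary Caching">

<section anchor="dictionary-caching-user-agent-cache" title="User Agent Cache">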

<t>As dictionary payloads may be large compared to the size of individual
HTTP responses, it is recommended that the user agent persistently
store dictionaries in a dictionary cache (e.g. on disk), in order to
maximize latency improvements and minimize the bandwidth overhead of
downloading dictionaries. It is suggested that the user agent
implement a limit on the number of dictionaries stored per domain, so
that one domain cannot force dictionaries for other domains out of the
user agent’s dictionary cache. To implement a cache with a fixed
maximum size, it is recommended that the cache manager first evict the
dictionaries that were least recently used for decoding.</t>
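
<t>One possible shape for such a cache is sketched below in Python. The
class name, method names, and limits are hypothetical, the cache is
kept in memory rather than on disk, and a newly stored dictionary is
treated as the most recently used entry, which is a simplification.</t>

<figure><artwork><![CDATA[
```python
from collections import OrderedDict


class DictionaryCache:
    """LRU dictionary cache with a per-domain cap (illustrative)."""

    def __init__(self, max_total, max_per_domain):
        self.max_total = max_total
        self.max_per_domain = max_per_domain
        # Keys are (domain, dictionary_id); least recently used first.
        self._entries = OrderedDict()

    def _evict_lru(self, domain=None):
        # Drop the least recently used entry, optionally restricted
        # to a single domain.
        victim = next(key for key in self._entries
                      if domain is None or key[0] == domain)
        del self._entries[victim]

    def store(self, domain, dictionary_id, payload):
        key = (domain, dictionary_id)
        self._entries.pop(key, None)
        in_domain = sum(1 for d, _ in self._entries if d == domain)
        if in_domain >= self.max_per_domain:
            # A full domain only ever displaces its own dictionaries.
            self._evict_lru(domain)
        if len(self._entries) >= self.max_total:
            self._evict_lru()
        self._entries[key] = payload

    def use_for_decoding(self, domain, dictionary_id):
        # Return the payload (or None) and mark it as recently used.
        key = (domain, dictionary_id)
        payload = self._entries.get(key)
        if payload is not None:
            self._entries.move_to_end(key)
        return payload
```
]]></artwork></figure>

<t>Because the per-domain limit is checked first, a domain that keeps
adding dictionaries evicts only its own entries until the overall
limit is reached.</t>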

<t>Ideally dictionaries will be stored in the same cache as HTTP
responses and may be inspected and cleared by the user using existing
user interfaces. However, new support may be created to fulfill the
need for the user agent to be able to quickly determine which
dictionaries should be advertised for a given request.</t>

<t>The user agent should be careful to validate that a dictionary matches
its original identifier before being used for decompression to prevent
malicious attacks on the dictionary cache. The user agent may
implicitly handle this by always recomputing the hash before
advertising the dictionary. However, to improve efficiency, the user
agent may cache the original digest of the dictionary, advertise the
dictionary with that digest, and then only for the dictionary selected
by the server to encode the response, verify that the cached
dictionary digest still matches the digest computed from the cached
dictionary.</t>
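
<t>The deferred check described above might look like the following
Python sketch. The names CachedDictionary and load_for_decoding are
hypothetical, and the full SHA-256 digest is stored here for
simplicity rather than the truncated identifier.</t>

<figure><artwork><![CDATA[
```python
import hashlib
import hmac


class CachedDictionary:
    """A dictionary payload plus the digest recorded at download."""

    def __init__(self, payload):
        self.payload = payload
        # Computed once; reused when advertising the dictionary.
        self.digest = hashlib.sha256(payload).digest()


def load_for_decoding(entry):
    """Re-verify the one dictionary the server selected."""
    actual = hashlib.sha256(entry.payload).digest()
    if not hmac.compare_digest(actual, entry.digest):
        raise ValueError("cached dictionary does not match its digest")
    return entry.payload
```
]]></artwork></figure>

<t>Only the dictionary actually chosen for decoding is rehashed, so the
cost of advertising many dictionaries stays low.</t>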

<t>The user agent must not evict from its dictionary store any
dictionary that has been advertised in the Avail-Dictionary header of
an HTTP request for which a response has not yet been returned.</t>

<t>If a user agent downloads a dictionary that has the same identifier
as a previously downloaded dictionary applicable to the same hosts,
the user agent must be careful to either ignore the new dictionary or
evict the old one. If the two dictionaries have exactly the same
contents, the choice is not important. However, this situation
indicates a server error, as a server must never instruct the user
agent to download a dictionary that the user agent has already
advertised. The user agent may want to avoid downloading dictionaries
from this server in the future, as they may not be new and downloading
unnecessary dictionaries can increase latency.</t>

</section>
<section anchor="dictionary-caching-intermediate-caches" title="Intermediate Caches">

<t>The dictionary should be treated as a regular HTTP response by
intermediate proxies. Thus, the normal HTTP caching considerations for
intermediate proxies apply to the dictionary as well.</t>

</section>
</section>
<section anchor="future-directions" title="Future Directions">

<t>As currently proposed, SDCH is not applicable to another case where
differential compression would be beneficial: large files that change
infrequently and in small ways, such as JavaScript and CSS files
referenced by other HTML documents.</t>

<t>TODO: Re-evaluate the dictionary scoping rules. The current approach,
which is patterned after the same-named attributes in the HTTP State
Management Specification <xref target="RFC2965"/>, may not be the best choice.</t>

</section>
<section anchor="current-status-and-updates" title="Current Status and Updates">

<t>For current information about the status of this proposal:
  https://groups.google.com/group/SDCH</t>

</section>
<section anchor="iana-considerations" title="IANA Considerations">

<t>This document makes no requests of IANA.</t>

</section>
<section anchor="security-considerations" title="Security Considerations">

<t>Some security considerations are discussed in the data integrity
section above, but the authors anticipate that further work is needed
to describe them fully.</t>

</section>
<section anchor="acknowledgements" title="Acknowledgements">

<t>The authors would like to acknowledge the support of Google, Inc. for
the development of this work. Technical editor: Harriett
Hardman. Feedback and comments: Greg Badros, Chandra Chereddi, Darren
Fisher, Ted Hardie, Ashu Jain, Ian Hickson, Othman Laraki, Jim
Roskind, Ryan Sleevi, Lincoln Smith, Randy Smith, and Linus Upson.</t>

</section>
<section anchor="appendix-vcdiff-encoding-format-and-sdch" title="Appendix:  VCDIFF Encoding Format and SDCH">

<t>Although the SDCH protocol is proposed so that it could be adapted for
use with any differential-encoding format, it currently uses the
VCDIFF encoding format. This format was chosen because its definition
is publicly available as the RFC 3284 draft standard. The VCDIFF
format is independent of the method used for finding the longest
possible matches between the dictionary (source) data and the payload
(target) data.</t>

<t>An encoder and decoder for the VCDIFF format, intended for use with
SDCH, has been released as open-source under the Apache license. This
package is called “open-vcdiff”. It uses the Bentley/McIlroy technique
for finding matches between the dictionary and target data. It
conforms to the VCDIFF draft standard, with the following exceptions:</t>

<t>Interleaved format</t>

<t>The VCDIFF draft standard format divides each
encoded delta window into three sections (data, instructions, and
addresses), with the aim of improving compressibility of the encoded
file using a secondary compressor such as gzip. The drawback to this
approach is that none of the target data can be reconstructed unless
the entire delta window is available. The delta window is received in
packets over the network and it is desirable to be able to process its
contents as they arrive. In order to facilitate decoding a stream of
packets from the network, we have modified the VCDIFF format so that
it interleaves the data, instructions, and addresses instead of
placing them in three separate sections. Each instruction is followed
by its size and then by an address or literal data.</t>

<t>Adler32 checksum</t>

<t>The format can be modified to include an Adler32 checksum <xref target="RFC1950"/>
of the target window data. If the checksum format is used, then bit 2
(0x04, defined as VCD_CHECKSUM) of the Win_Indicator byte will be set,
and the checksum will appear just after the “Length of addresses for
COPYs” field and before the “Data section for ADDs and RUNs” section
in the encoding.</t>

<t>Version header byte (Header4)</t>

<t>If either of the two enhancements described above is used, then the
resulting format will not conform to the VCDIFF draft standard as
described in RFC 3284. In order to indicate this deviation from the
standard, the fourth byte in the encoding (Header4, reserved for the
VCDIFF version code) will be set to 0x53 (a capital “S” character in
ASCII.)  If neither enhancement is used, the fourth byte may be 0x00
(a null character), the default value described in the standard.</t>

<t>VCD_TARGET flag and target COPY instructions not allowed for SDCH</t>

<t>The SDCH protocol is intended to produce a delta between static
dictionary data and target data. Secondary compression with gzip will
be used to eliminate redundancy within the target data. For this
reason, when using VCDIFF for SDCH, the Win_Indicator flag should
always include the VCD_SOURCE flag, never the VCD_TARGET flag. COPY
instructions should only reference addresses within the source data,
never within the previously decoded target.</t>
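
<t>A decoder might check the header and window flags described above
along the lines of the following Python sketch. The function names are
hypothetical. The magic bytes and the VCD_SOURCE and VCD_TARGET bit
values are taken from RFC 3284 <xref target="RFC3284"/>; the 0x53
version byte and VCD_CHECKSUM are the values given in this appendix.</t>

<figure><artwork><![CDATA[
```python
VCDIFF_MAGIC = b"\xd6\xc3\xc4"  # 'V', 'C', 'D' with the high bit set
VERSION_STANDARD = 0x00         # plain RFC 3284 stream
VERSION_SDCH = 0x53             # ASCII 'S': SDCH enhancements in use

VCD_SOURCE = 0x01
VCD_TARGET = 0x02
VCD_CHECKSUM = 0x04


def uses_sdch_extensions(header):
    """Inspect the 4-byte file header; True if Header4 is 'S'."""
    if len(header) < 4 or header[:3] != VCDIFF_MAGIC:
        raise ValueError("not a VCDIFF stream")
    if header[3] == VERSION_SDCH:
        return True
    if header[3] == VERSION_STANDARD:
        return False
    raise ValueError("unexpected VCDIFF version byte")


def window_has_checksum(win_indicator):
    """Validate Win_Indicator for SDCH; True if Adler32 follows."""
    if win_indicator & VCD_TARGET:
        raise ValueError("VCD_TARGET is not allowed for SDCH")
    if not win_indicator & VCD_SOURCE:
        raise ValueError("SDCH windows must set VCD_SOURCE")
    return bool(win_indicator & VCD_CHECKSUM)
```
]]></artwork></figure>

<t>For example, a Win_Indicator of 0x05 (VCD_SOURCE and VCD_CHECKSUM)
is accepted and signals that a checksum is present, while 0x02 is
rejected.</t>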

<t>The Xdelta package (http://xdelta.org) produces a format based on
VCDIFF, though not 100% compatible with the RFC draft standard. That
package has been released under the GNU General Public License.</t>

</section>


  </middle>

  <back>

    <references title='Normative References'>





<reference  anchor='RFC2119' target='http://www.rfc-editor.org/info/rfc2119'>
<front>
<title>Key words for use in RFCs to Indicate Requirement Levels</title>
<author initials='S.' surname='Bradner' fullname='S. Bradner'><organization /></author>
<date year='1997' month='March' />
<abstract><t>In many standards track documents several words are used to signify the requirements in the specification.  These words are often capitalized. This document defines these words as they should be interpreted in IETF documents.  This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t></abstract>
</front>
<seriesInfo name='BCP' value='14'/>
<seriesInfo name='RFC' value='2119'/>
<seriesInfo name='DOI' value='10.17487/RFC2119'/>
</reference>



<reference  anchor='RFC7230' target='http://www.rfc-editor.org/info/rfc7230'>
<front>
<title>Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing</title>
<author initials='R.' surname='Fielding' fullname='R. Fielding' role='editor'><organization /></author>
<author initials='J.' surname='Reschke' fullname='J. Reschke' role='editor'><organization /></author>
<date year='2014' month='June' />
<abstract><t>The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems.  This document provides an overview of HTTP architecture and its associated terminology, defines the "http" and "https" Uniform Resource Identifier (URI) schemes, defines the HTTP/1.1 message syntax and parsing requirements, and describes related security concerns for implementations.</t></abstract>
</front>
<seriesInfo name='RFC' value='7230'/>
<seriesInfo name='DOI' value='10.17487/RFC7230'/>
</reference>




    </references>

    <references title='Informative References'>





<reference  anchor='RFC3284' target='http://www.rfc-editor.org/info/rfc3284'>
<front>
<title>The VCDIFF Generic Differencing and Compression Data Format</title>
<author initials='D.' surname='Korn' fullname='D. Korn'><organization /></author>
<author initials='J.' surname='MacDonald' fullname='J. MacDonald'><organization /></author>
<author initials='J.' surname='Mogul' fullname='J. Mogul'><organization /></author>
<author initials='K.' surname='Vo' fullname='K. Vo'><organization /></author>
<date year='2002' month='June' />
<abstract><t>This memo describes VCDIFF, a general, efficient and portable data format suitable for encoding compressed and/or differencing data so that they can be easily transported among computers.  [STANDARDS-TRACK]</t></abstract>
</front>
<seriesInfo name='RFC' value='3284'/>
<seriesInfo name='DOI' value='10.17487/RFC3284'/>
</reference>



<reference  anchor='RFC3229' target='http://www.rfc-editor.org/info/rfc3229'>
<front>
<title>Delta encoding in HTTP</title>
<author initials='J.' surname='Mogul' fullname='J. Mogul'><organization /></author>
<author initials='B.' surname='Krishnamurthy' fullname='B. Krishnamurthy'><organization /></author>
<author initials='F.' surname='Douglis' fullname='F. Douglis'><organization /></author>
<author initials='A.' surname='Feldmann' fullname='A. Feldmann'><organization /></author>
<author initials='Y.' surname='Goland' fullname='Y. Goland'><organization /></author>
<author initials='A.' surname='van Hoff' fullname='A. van Hoff'><organization /></author>
<author initials='D.' surname='Hellerstein' fullname='D. Hellerstein'><organization /></author>
<date year='2002' month='January' />
<abstract><t>This document describes how delta encoding can be supported as a compatible extension to HTTP/1.1.  [STANDARDS-TRACK]</t></abstract>
</front>
<seriesInfo name='RFC' value='3229'/>
<seriesInfo name='DOI' value='10.17487/RFC3229'/>
</reference>



<reference  anchor='RFC3929' target='http://www.rfc-editor.org/info/rfc3929'>
<front>
<title>Alternative Decision Making Processes for Consensus-Blocked Decisions in the IETF</title>
<author initials='T.' surname='Hardie' fullname='T. Hardie'><organization /></author>
<date year='2004' month='October' />
<abstract><t>This document proposes an experimental set of alternative decision-making processes for use in IETF working groups.  There are a small number of cases in IETF working groups in which the group has come to consensus that a particular decision must be made but cannot agree on the decision itself.  This document describes alternative mechanisms for reaching a decision in those cases.  This is not meant to provide an exhaustive list, but to provide a known set of tools that can be used when needed.  This memo defines an Experimental Protocol for the Internet community.</t></abstract>
</front>
<seriesInfo name='RFC' value='3929'/>
<seriesInfo name='DOI' value='10.17487/RFC3929'/>
</reference>



<reference  anchor='RFC3548' target='http://www.rfc-editor.org/info/rfc3548'>
<front>
<title>The Base16, Base32, and Base64 Data Encodings</title>
<author initials='S.' surname='Josefsson' fullname='S. Josefsson' role='editor'><organization /></author>
<date year='2003' month='July' />
<abstract><t>This document describes the commonly used base 64, base 32, and base 16 encoding schemes.  It also discusses the use of line-feeds in encoded data, use of padding in encoded data, use of non-alphabet characters in encoded data, and use of different encoding alphabets.  This memo provides information for the Internet community.</t></abstract>
</front>
<seriesInfo name='RFC' value='3548'/>
<seriesInfo name='DOI' value='10.17487/RFC3548'/>
</reference>



<reference  anchor='RFC2965' target='http://www.rfc-editor.org/info/rfc2965'>
<front>
<title>HTTP State Management Mechanism</title>
<author initials='D.' surname='Kristol' fullname='D. Kristol'><organization /></author>
<author initials='L.' surname='Montulli' fullname='L. Montulli'><organization /></author>
<date year='2000' month='October' />
<abstract><t>This document specifies a way to create a stateful session with Hypertext Transfer Protocol (HTTP) requests and responses.  [STANDARDS-TRACK]</t></abstract>
</front>
<seriesInfo name='RFC' value='2965'/>
<seriesInfo name='DOI' value='10.17487/RFC2965'/>
</reference>



<reference  anchor='RFC1950' target='http://www.rfc-editor.org/info/rfc1950'>
<front>
<title>ZLIB Compressed Data Format Specification version 3.3</title>
<author initials='P.' surname='Deutsch' fullname='P. Deutsch'><organization /></author>
<author initials='J-L.' surname='Gailly' fullname='J-L. Gailly'><organization /></author>
<date year='1996' month='May' />
<abstract><t>This specification defines a lossless compressed data format.  This memo provides information for the Internet community.  This memo does not specify an Internet standard of any kind.</t></abstract>
</front>
<seriesInfo name='RFC' value='1950'/>
<seriesInfo name='DOI' value='10.17487/RFC1950'/>
</reference>



<reference  anchor='RFC6234' target='http://www.rfc-editor.org/info/rfc6234'>
<front>
<title>US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF)</title>
<author initials='D.' surname='Eastlake 3rd' fullname='D. Eastlake 3rd'><organization /></author>
<author initials='T.' surname='Hansen' fullname='T. Hansen'><organization /></author>
<date year='2011' month='May' />
<abstract><t>Federal Information Processing Standard, FIPS</t></abstract>
</front>
<seriesInfo name='RFC' value='6234'/>
<seriesInfo name='DOI' value='10.17487/RFC6234'/>
</reference>




    </references>



  </back>
</rfc>

