<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
  <!ENTITY RFC2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
  <!ENTITY RFC2616 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2616.xml'>
  ]>

  <?xml-stylesheet type="text/xsl" href="rfc2629.xslt"?>

  <?rfc toc="yes" ?>
  <?rfc symrefs="yes" ?>
  <?rfc sortrefs="yes" ?>
  <?rfc compact="yes" ?>
  <?rfc subcompact="no" ?>
  <?rfc strict="no" ?>

  <rfc category="info" ipr="trust200902" docName="draft-rpeon-httpbis-header-compression-00">
    <front>
      <title abbrev="HTTP/2 Header Compression">Header Delta-Compression for HTTP/2.0</title>
      <author initials="R." surname="Peon" fullname="Roberto Peon">
        <organization>Google, Inc</organization>
        <address>
          <email>fenix@google.com</email>
        </address>
      </author>
      <date day="12" month="Oct" year="2012" />
      <area>Applications</area>
      <keyword>HTTP</keyword>
      <abstract>
        <t>This document describes a mechanism for compressing streams of groups of key-value pairs, often known as Headers in an HTTP session. See <xref target="RFC2616">RFC 2616</xref> or successors for more information about headers.</t>
      </abstract>
    </front>

    <middle>
      <section anchor="intro" title="Overview">
        <t>There have been several problems pointed out with the use of the gzip compressor in <xref target="SPDY">SPDY</xref>. The biggest of these problems is that it is possible for a smart attacker to inject content into the compressor, and then to test hypotheses about the prior contents of the compressor by examining the output size after each such content injection. The other issue is that gzip often consumes more CPU than many would like, especially in situations where one is doing forward or reverse proxying. The compressor proposed here intends to solve the first issue and significantly mitigate the second, while providing compression that is not too much worse than gzip.</t>

      </section>
      <section title="Definitions">
        <t>
          <list style="hanging" hangIndent="6">
            <t>user-agent: The program or device which a human interacts with directly and which typically initiates the transport layer connection or session</t>
            <t>client: Synonym for user-agent</t>
            <t>server: The computer or device which typically accepts a connection, stores, and serves data</t>
            <t>proxy: An entity acting as a server for the client, and a client for the server</t>
            <t>header: A complete set of key-value pairs, either request headers or response headers, as defined in <xref target="RFC2616">RFC 2616</xref>, Section 5.3 or 6.2, respectively</t>
          </list>
        </t>
      </section>

      <section title="Example-data-model">
			  <t> The author's current experimental implementation uses something very close to the following data-model for doing compression.</t>
        <figure>
          <artwork><![CDATA[
            struct ValEntry {
              KeyMapIterator krt;
              ValMapIterator vrt;
              LRUIterator lrt;
              Index val_idx;
            };

            typedef map<string, ValEntry*> ValMap;

            struct KeyEntry {
              int refcnt;
              Index key_idx;
              ValMap valmap;
            };
            typedef map<string, KeyEntry> KeyMap;
            typedef set<Index> HeaderGroup;
            typedef map<Index, HeaderGroup> HeaderGroups;

            struct CompressorState {
              HeaderGroups header_groups;
              KeyMap key_map;
              list<ValEntry*> lru;
            };

            typedef map<KeyIdx, string> KeyIdxMap;
            typedef map<ValIdx, pair<KeyIdxMap::iterator, string> > ValIdxMap;

            struct DecompressorState {
              KeyIdxMap key_idx_map;
              ValIdxMap val_idx_map;
              list<KeyIdxMap::iterator> lru;
            };
            ]]>
          </artwork>
        </figure>
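        <t> As a rough illustration of how the compressor-side maps relate, the following is a minimal Python sketch and not the author's implementation. The class and method names (KeyEntry, CompressorState, insert, lookup) are hypothetical, the LRU and header groups are omitted, and indices are assumed to start at 1 to mirror the ++current_val_idx convention used later in this document.</t>
        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class KeyEntry:
    key_idx: int
    refcnt: int = 0
    # value string -> val_idx (stands in for ValMap in the C++ sketch)
    valmap: Dict[str, int] = field(default_factory=dict)


@dataclass
class CompressorState:
    key_map: Dict[str, KeyEntry] = field(default_factory=dict)
    current_key_idx: int = 0   # assumption: key indices start at 1
    current_val_idx: int = 0   # assumption: value indices start at 1

    def insert(self, key: str, val: str) -> int:
        """Store (key, val), creating the key entry if needed.

        Returns the val_idx of the pair; an existing pair keeps its index.
        """
        entry = self.key_map.get(key)
        if entry is None:
            self.current_key_idx += 1
            entry = KeyEntry(key_idx=self.current_key_idx)
            self.key_map[key] = entry
        if val not in entry.valmap:
            self.current_val_idx += 1
            entry.valmap[val] = self.current_val_idx
        return entry.valmap[val]

    def lookup(self, key: str, val: str) -> Optional[int]:
        """Return the val_idx of (key, val), or None if it is not stored."""
        entry = self.key_map.get(key)
        return None if entry is None else entry.valmap.get(val)


state = CompressorState()
a = state.insert("accept", "text/html")
b = state.insert("accept", "image/png")   # same key, second value
c = state.insert("host", "example.com")
```
            ]]>
          </artwork>
        </figure>
        <t> In this sketch the two "accept" values share one KeyEntry and receive val_idx 1 and 2, while "host" receives a new key entry and val_idx 3.</t>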
        </section>

      <section title="Compression-overview">
        <t> Note that this only describes conceptually how the algorithm works and is very much pseudo-code.</t>
        <t> User-agents and servers SHOULD NOT use EREF operations.</t>
        <t> User-agents and servers SHOULD use Huffman coding when doing so reduces the number of bytes on the wire.</t>
        <t> Proxies MAY use EREF operations when throughput is more important than client-experienced latency.</t>
        <t> Proxies MAY choose to encode all strings as literal strings and skip Huffman coding.</t>
        <figure>
          <artwork><![CDATA[
Compress(headers):
  first_used_val_idx = CurrentValIdx()
  old_keymap = keymap

  for header in headers:
    if header.key in keymap:
      keymap[header.key].refcnt += 1

  for header in headers:
    key_map_iterator = lookup(header.key)
    if (!key_map_iterator):
      key_map_iterator = insert_new_key(header.key)
      insert_new_value_from_key(key_map_iterator, header.val)
      instructions.append(KVSto(header.key, header.val))
    elif (val_iterator = lookup_val_in_keymap_entry(key_map_iterator,
                                                     header.val)):
      // We've found that the key,value pair already exists in the
      // compressor. We now must check to see if it is already visible,
      // in which case there is nothing to do. If it isn't visible,
      // however, we'll toggle the visibility on.
      if (!header_group.includes(val_iterator->val_idx)):
        instructions.append(Toggle(val_iterator->val_idx))
    else:
      // We know we found a key, but not the entire key,value pair.
      // Thus, we emit a Clone() instruction.
      instructions.append(Clone(key_map_iterator->key_idx, header.val))

    if header.key in old_keymap:
      keymap[header.key].refcnt += 1

  // Toggle off any entry that is visible but absent from this header set.
  for header_group_entry in header_group:
    if header_group_entry not in headers:
      instructions.append(Toggle(header_group_entry->idx()))
  for instruction in instructions:
    ExecuteInstruction(instruction)
  SerializeInstructions(output_buffer, instructions, first_used_val_idx)


ExecuteInstruction(instruction):
  while (instruction.size_delta() + state_size > max_state_size):
    RemoveOldestLRUItem()
  if instruction.opcode == "clone":
    InsertKV(instruction.lookup_key(), instruction.val(), GetNextLRUIdx())
  elif instruction.opcode == "kvsto":
    InsertKV(instruction.key(), instruction.val(), GetNextLRUIdx())
  elif instruction.opcode == "togglerange":
    for (i = instruction.start; i <= instruction.last; ++i):
      bool prev_state = ToggleIdx(i)
      if prev_state == NOT_VISIBLE:
        MoveToFrontOfLRU(i)
  elif instruction.opcode == "toggle":
    bool prev_state = ToggleIdx(instruction.index)
    if prev_state == NOT_VISIBLE:
      MoveToFrontOfLRU(instruction.index)


SerializeInstructions(output_buffer, instructions, first_val_idx):
  // Note that Huffman encoding is optional: a bit (somewhere) would
  // indicate whether it was used for any particular string.
  output_buffer.WriteValIdx(first_val_idx)
  for toggle in filter(TOGGLE, instructions):
    output_buffer.WriteBits(TOGGLE)
    output_buffer.WriteValIdx(toggle.val_idx)
  for toggle in filter(TOGGLE_RANGE, instructions):
    output_buffer.WriteBits(TOGGLE_RANGE)
    output_buffer.WriteValIdx(toggle.start)
    output_buffer.WriteValIdx(toggle.last)
  for clone in filter(CLONE, instructions):
    output_buffer.WriteBits(CLONE)
    output_buffer.WriteKeyIdx(clone.key_idx)
    output_buffer.WriteString(huffman.encode(clone.val))
  for kvsto in filter(KVSTO, instructions):
    output_buffer.WriteBits(KVSTO)
    output_buffer.WriteString(huffman.encode(kvsto.key))
    output_buffer.WriteString(huffman.encode(kvsto.val))
  for eref in filter(EREF, instructions):
    output_buffer.WriteBits(EREF)
    output_buffer.WriteString(huffman.encode(eref.key))
    output_buffer.WriteString(huffman.encode(eref.val))

           ]]>
          </artwork>
        </figure>
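        <t> The instruction-selection step can be sketched in runnable form as follows. This is a simplified illustration under stated assumptions, not the author's implementation: the Compressor class and its tuple-based instruction format are hypothetical, LRU eviction, refcounts and Huffman coding are left out, and entries created by KVSto or Clone are assumed to become visible in the header group immediately.</t>
        <figure>
          <artwork><![CDATA[
```python
class Compressor:
    """Toy instruction selector for a single header group."""

    def __init__(self):
        self.key_map = {}          # key -> (key_idx, {value: val_idx})
        self.header_group = set()  # val_idx values currently visible
        self.key_idx = 0
        self.val_idx = 0

    def compress(self, headers):
        """Return the instructions that turn the old set into `headers`."""
        toggles, stores = [], []
        wanted = set()             # val_idx values this header set needs
        for key, val in headers:
            entry = self.key_map.get(key)
            if entry is None:
                # Unknown key: store both key and value.
                self.key_idx += 1
                self.val_idx += 1
                self.key_map[key] = (self.key_idx, {val: self.val_idx})
                stores.append(("kvsto", key, val))
                idx = self.val_idx
            else:
                key_idx, values = entry
                if val in values:
                    # Known pair: toggle it on only if it is not visible.
                    idx = values[val]
                    if idx not in self.header_group and idx not in wanted:
                        toggles.append(("toggle", idx))
                else:
                    # Known key, new value: reference the key by index.
                    self.val_idx += 1
                    values[val] = self.val_idx
                    stores.append(("clone", key_idx, val))
                    idx = self.val_idx
            wanted.add(idx)
        # Toggle off entries that were visible but are absent this time.
        for idx in sorted(self.header_group - wanted):
            toggles.append(("toggle", idx))
        self.header_group = wanted
        # Toggles are emitted before stores, matching the wire order.
        return toggles + stores


c = Compressor()
first = c.compress([("method", "GET"), ("path", "/a")])
second = c.compress([("method", "GET"), ("path", "/b")])
third = c.compress([("method", "GET"), ("path", "/a")])
```
            ]]>
          </artwork>
        </figure>
        <t> Here the first call emits two KVSto instructions, the second emits a Toggle for the old path plus a Clone for the new one, and the third needs only two Toggles because both path values are already stored.</t>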
      </section>

      <section title="Decompression-overview">
        <t> Note that this only describes conceptually how the algorithm works and is very much pseudo-code.</t>
        <figure>
          <artwork><![CDATA[
  Decode():
    header_group = headers_frame.header_group // as received from sender
    val_idx = headers_frame.first_val_idx // as received from sender
    header_groups[header_group].ephemeral_headers.clear()
    for instruction in instructions:
      ExecuteInstruction(instruction, header_group)

  ExecuteInstruction(instruction, header_group):
    while (instruction.size_delta() + state_size > max_state_size):
      RemoveOldestLRUItem()
    if instruction.opcode == TOGGLE:
      if instruction.idx in header_groups[header_group]:
        header_groups[header_group].remove(instruction.idx)
      else:
        header_groups[header_group].insert(instruction.idx)
        MoveToHeadOfLRU(instruction.idx)
    elif instruction.opcode == TOGGLE_RANGE:
      for (i = instruction.start; i <= instruction.last; ++i):
        ExecuteInstruction(Toggle(i), header_group)
    elif instruction.opcode == CLONE:
      KeyIdxMap::iterator kimi = key_idx_map.lookup(instruction.key_idx)
      val_idx_map.insert(++val_idx, pair(kimi, instruction.val))
    elif instruction.opcode == KVSTO:
      key_idx = GetNextKeyIdx()
      KeyIdxMap::iterator kimi = key_idx_map.insert(key_idx,
                                                    instruction.key)
      val_idx_map.insert(++val_idx, pair(kimi, instruction.val))
    elif instruction.opcode == EREF:
      header_groups[header_group].ephemeral_headers.append(
          pair(instruction.key, instruction.val))


  // This would return the complete set of headers, e.g. for an HTTP
  // request or response.

  InflateHeaders(header_group):
    Headers headers
    for idx in header_groups[header_group]:
      // .first is the KeyIdxMap iterator; its ->second is the key string.
      headers.append(val_idx_map[idx].first->second,
                     val_idx_map[idx].second)
    headers.append(header_groups[header_group].ephemeral_headers)
    return headers
            ]]>
          </artwork>
        </figure>
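        <t> A runnable counterpart to the decoding steps might look like the sketch below. It is illustrative only: the Decompressor class and the tuple-based instruction format are hypothetical, a single header group is modelled, LRU eviction is omitted, and entries created by Clone or KVSto are assumed to start out visible, which the pseudo-code above leaves implicit.</t>
        <figure>
          <artwork><![CDATA[
```python
class Decompressor:
    """Toy decoder for a single header group."""

    def __init__(self):
        self.key_idx_map = {}      # key_idx -> key string
        self.val_idx_map = {}      # val_idx -> (key_idx, value string)
        self.header_group = set()  # val_idx values currently visible
        self.key_idx = 0
        self.val_idx = 0

    def _store(self, key_idx, val):
        self.val_idx += 1
        self.val_idx_map[self.val_idx] = (key_idx, val)
        # Assumption: a newly stored entry is visible right away.
        self.header_group.add(self.val_idx)

    def decode(self, instructions):
        """Apply the instructions and return the resulting header list."""
        ephemeral = []             # ERef pairs never touch persistent state
        for ins in instructions:
            op = ins[0]
            if op == "toggle":
                self.header_group ^= {ins[1]}
            elif op == "togglerange":
                # Inclusive range of val_idx values.
                self.header_group ^= set(range(ins[1], ins[2] + 1))
            elif op == "clone":
                self._store(ins[1], ins[2])
            elif op == "kvsto":
                self.key_idx += 1
                self.key_idx_map[self.key_idx] = ins[1]
                self._store(self.key_idx, ins[2])
            elif op == "eref":
                ephemeral.append((ins[1], ins[2]))
            else:
                raise ValueError(f"unknown opcode: {op!r}")
        headers = []
        for idx in sorted(self.header_group):
            key_idx, val = self.val_idx_map[idx]
            headers.append((self.key_idx_map[key_idx], val))
        return headers + ephemeral


d = Decompressor()
h1 = d.decode([("kvsto", "method", "GET"), ("kvsto", "path", "/a")])
h2 = d.decode([("toggle", 2), ("clone", 2, "/b"), ("eref", "date", "now")])
h3 = d.decode([("toggle", 2), ("toggle", 3)])
```
            ]]>
          </artwork>
        </figure>
        <t> In this sketch the "date" header delivered by ERef appears in the second result only, since it is not retained in the decoder state.</t>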
      </section>

      <section title="On-the-wire-overview">
        <t> This would replace the current HEADERS frame described in the SPDY/2 draft specification.</t>
        <t> The header-group-id MUST be emitted immediately after the HEADERS frame boilerplate.</t>
        <t> A field indicating the start of the ID-sequence space MUST immediately follow the header-group-id.</t>
        <t> A sequence of 'Toggle' or 'ToggleRange' operations MUST follow, if
          any of these operations are to be emitted at all.</t>
        <t> A sequence of 'Clone' or 'KVSto' operations MUST then follow, if
          any of these operations are to be emitted at all.</t>
        <t> A sequence of 'ERef' operations MUST then follow, if any of these are to be emitted.</t>
        <t> If the aggregate set of instructions is larger than the
          frame-length, the 'finished' bit of the HEADERS frame boilerplate
          MUST be set to 0. This indicates that the next HEADERS
          frame on that stream is a continuation of the previous HEADERS
          frame.</t>
        <t> If all of the 'Toggle', 'ToggleRange', 'Clone', and 'KVSto'
          operations have been emitted, the 'finished' bit of the HEADERS frame
          which encapsulates the last instruction MUST be set to 1.</t>
					<t> The rough ABNF for this would be:</t>
        <figure>
          <artwork><![CDATA[
HEADERS_FRAME: HEADER_FRAME_BOILERPLATE GROUP-ID FIRST-VAL-IDX INSTRUCTIONS
HEADER_FRAME_BOILERPLATE: LENGTH STREAM_ID HEADERS_OPCODE HEADER_DONE_FLAG
HEADER_DONE_FLAG: (TRUE | FALSE)
INSTRUCTIONS : *(TOGGLE | TOGGLERANGE) *( CLONE | KVSTO) *(EREF)
TOGGLE: TOGGLE_OPCODE VAL_IDX
TOGGLERANGE: TOGGLERANGE_OPCODE VAL_IDX_START VAL_IDX_END
CLONE:  CLONE_OPCODE KEY_IDX VAL_STRING
KVSTO:  KVSTO_OPCODE KEY_STRING VAL_STRING
EREF:   EREF_OPCODE KEY_STRING VAL_STRING

KEY_STRING: MAYBE_COMPRESSED_STRING
VAL_STRING: MAYBE_COMPRESSED_STRING
MAYBE_COMPRESSED_STRING:  TRUE HUFFMAN_COMPRESSED_STRING |
                         FALSE STRING_LITERAL
VAL_IDX_START: int
VAL_IDX_END: int
VAL_IDX: int
KEY_IDX: int
            ]]>
          </artwork>
        </figure>
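        <t> The ordering constraint in the INSTRUCTIONS rule can be checked mechanically. The following Python sketch is one way to do so and is not part of the proposal: the opcode names, the one-letter mapping and the is_valid_order function are all hypothetical.</t>
        <figure>
          <artwork><![CDATA[
```python
import re

# One letter per opcode, so a frame's opcode sequence becomes a string.
OPCODE_LETTER = {
    "toggle": "T",
    "togglerange": "R",
    "clone": "C",
    "kvsto": "K",
    "eref": "E",
}

# Mirrors: *(TOGGLE | TOGGLERANGE) *(CLONE | KVSTO) *(EREF)
ORDER = re.compile(r"[TR]*[CK]*E*")


def is_valid_order(opcodes):
    """Return True if the opcode names appear in the allowed order."""
    letters = "".join(OPCODE_LETTER[name] for name in opcodes)
    return ORDER.fullmatch(letters) is not None


ok = is_valid_order(["toggle", "togglerange", "clone", "kvsto", "eref"])
bad = is_valid_order(["clone", "toggle"])
```
            ]]>
          </artwork>
        </figure>
        <t> In this sketch an empty instruction list is accepted, since every group in the rule is optional, while a Toggle that follows a Clone is rejected.</t>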
        <t> A complete set of headers has only been expressed when the HEADERS frame has the HEADER_DONE_FLAG set to true. If the headers being serialized do not fit into one HEADERS frame, the HEADER_DONE_FLAG for all but the last HEADERS frame MUST be set to false.</t>
        <t> There are alternate serializations which have various interesting properties. Some of the more interesting variations are:
          <list style="hanging" hangIndent="6">
            <t> Serialize all constant-length information before serializing any string data. This allows for a clear separation of byte-aligned and potentially non-byte-aligned data.</t>
            <t> Have explicit sections for each opcode type, with a sentinel denoting the end of each opcode's section. Since the number of opcode types is small, this could reduce the overhead consumed in repeating the opcode.</t>
            <t> As above, but use a count of the number of instructions of each opcode type instead of a sentinel.</t>
            <t> Use Huffman coding on the entire payload of the HEADERS frame, not just the strings. Toggle operations are far more common than other operations, and using fewer bits for these could be beneficial.</t>
          </list>
        </t>
      </section>

      <section title="Operations">
        <t> Toggle: This encodes a single ValIdx. If that ValIdx is already present in the HeaderGroup associated with the current HEADERS frame, it will be removed; otherwise it will be added. If it is added, the LRU entry which the index references MUST be moved to the head of the LRU.</t>
        <t> ToggleRange: This encodes two ValIdx values denoting an inclusive range of ValIdx values, each of which shall be toggled as per the Toggle operation above.</t>
        <t> Clone: This encodes a KeyIdx which references a pre-existing key, together with a new value string. The Clone operation inserts a new ValEntry into the ValMap associated with the KeyMap entry identified by the KeyIdx. The ValIdx for this new ValEntry is ++current_val_idx. The new ValEntry MUST be inserted at the head of the LRU.</t>
        <t> KVSto: This encodes a new key string and a new value string. The key will be inserted into the KeyMap and assigned a new KeyIdx. The value string will be inserted into the ValMap associated with the KeyMap entry that was just added. As with Clone, a new unique ValIdx will be associated with it in a monotonically increasing sequence by using the value of ++current_val_idx. The new ValEntry MUST be inserted at the head of the LRU.</t>
        <t> ERef: This encodes a key literal and a value literal which, while interpreted as part of the header frame, do not cause any persistent change in the compressor state. ERef stands for "ephemeral reference".</t>
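        <t> The LRU behaviour that these operations rely on can be sketched in Python as follows. This is an illustration under assumptions, not a specified algorithm: the LruState class is hypothetical, and the cost of an entry is assumed to be the combined length of its key and value, since this document does not define how state size is measured.</t>
        <figure>
          <artwork><![CDATA[
```python
from collections import OrderedDict


class LruState:
    """Size-bounded store; the last item of `entries` is the LRU head."""

    def __init__(self, max_state_size):
        self.max_state_size = max_state_size
        self.state_size = 0
        self.entries = OrderedDict()   # val_idx -> (key, value)

    def insert(self, val_idx, key, val):
        """Insert at the head, evicting the oldest entries if needed."""
        size = len(key) + len(val)     # assumed cost model
        while self.entries and self.state_size + size > self.max_state_size:
            _, (old_key, old_val) = self.entries.popitem(last=False)
            self.state_size -= len(old_key) + len(old_val)
        self.entries[val_idx] = (key, val)
        self.state_size += size

    def touch(self, val_idx):
        """Move an entry to the head, as a Toggle that turns it on would."""
        self.entries.move_to_end(val_idx)


lru = LruState(max_state_size=24)
lru.insert(1, "host", "a.example")   # cost 13
lru.insert(2, "path", "/")           # cost 5, total 18
lru.touch(1)                         # entry 2 is now the oldest
lru.insert(3, "accept", "*/*")       # cost 9, forces one eviction
```
            ]]>
          </artwork>
        </figure>
        <t> Because entry 1 was moved to the head before the third insertion, entry 2 is the one evicted in this sketch, leaving entries 1 and 3 with a total size of 22.</t>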
      </section>

        <section title="Security Considerations">
          <t> The compressor algorithm described here is expected to be immune to the current attacks against encrypted stream-based compressors such as TLS+gzip, but more scrutiny is warranted. The algorithm is believed to be immune because any backreference to a header key or value always requires a whole-text match. A probe of the compression context therefore confirms no hypothesis unless the attacker has guessed the entire plaintext.</t>
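          <t> The whole-text-match argument can be illustrated with a toy cost model. The Python sketch below is not a security analysis: the probe_cost function and both cost constants are hypothetical, and Huffman coding is ignored. With Huffman coding the literal length would depend only on the guess itself, not on the stored value.</t>
          <figure>
            <artwork><![CDATA[
```python
TOGGLE_COST = 2   # assumed: one opcode byte plus one index byte
OPCODE_COST = 1   # assumed: one opcode byte before a literal


def probe_cost(stored_value, guess):
    """Bytes an injected header adds to the output in this toy model."""
    if guess == stored_value:
        # Whole-text match: the value is referenced by index.
        return TOGGLE_COST
    # Any mismatch, however small, sends the whole guess as a literal.
    return OPCODE_COST + len(guess)


secret = "session=7f3a9c"
wrong_close = probe_cost(secret, "session=7f3a9d")  # one character off
wrong_far = probe_cost(secret, "x" * 14)            # nothing in common
exact = probe_cost(secret, "session=7f3a9c")
```
              ]]>
            </artwork>
          </figure>
          <t> In this model a nearly correct guess costs exactly as much as a completely wrong one of the same length, so the output size gives no partial-match signal to refine a guess character by character.</t>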
        </section>

        <section title="Requirements Notation">
          <t>
            The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
            "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
            document are to be interpreted as described in <xref target="RFC2119"/>.
          </t>
        </section>

        <section title="Acknowledgements">
        </section>
      </middle>

      <back>
        <references title="Normative References">
          &RFC2119;
          &RFC2616;

          <reference anchor="SPDY" target="http://tools.ietf.org/html/draft-mbelshe-httpbis-spdy">
            <front>
              <title>SPDY PROTOCOL</title>
              <author initials="M" surname="Belshe" fullname="Mike Belshe">
                <organization>Twist</organization>
              </author>
              <author initials="R" surname="Peon" fullname="Roberto Peon">
                <organization>Google</organization>
              </author>
            </front>
          </reference>
        </references>
      </back>
    </rfc>

