<?xml version="1.0" encoding="UTF-8"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc tocindent="no"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc5234 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5234.xml">
<!ENTITY rfc7159 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7159.xml">
]>
<rfc docName="draft-ietf-json-text-sequence-01" ipr="trust200902" category="std">
  <front>
    <title abbrev="JSON Text Sequences">JavaScript Object Notation (JSON) Text Sequences</title>
    <author initials="N." surname="Williams" fullname="Nicolas Williams">
      <organization abbrev="Cryptonector">Cryptonector, LLC</organization>
      <address>
        <email>nico@cryptonector.com</email>
      </address>
    </author>
    <date month="May" year="2014"/>
    <area>
Apps Area
</area>
    <workgroup>
json
</workgroup>
    <keyword>Internet-Draft</keyword>
    <abstract>
      <t>
This document describes the JSON text sequence format and associated media type.</t>
    </abstract>
  </front>
  <middle>
    <section title="Introduction and Motivation" anchor="d1e193">
      <t>
The JavaScript Object Notation (JSON) <xref target="RFC7159"/> is a very handy serialization format. However, when serializing a large sequence of values as an array, or a possibly indeterminate-length or never-ending sequence of values, JSON becomes difficult to work with.</t>
      <t>
Consider a sequence of one million values, each possibly 1 kilobyte when encoded, which would be roughly one gigabyte. If processing such a dataset requires first parsing it entirely, then the result is very inefficient and the processing will be limited by virtual memory. “Online” (a.k.a., “streaming”) parsers help, but they are neither widely available nor widely used, nor are they easy to use.</t>
      <t>
Ideally such datasets could be parsed and processed one element at a time. Even if each element must be parsed in a not-online manner due to local choice of parser, the result will usually be sufficiently online: limited by the size of the biggest element in the sequence rather than by the size of the sequence.</t>
      <t>
This document describes the concept and format of “JSON text sequences”, which are specifically not JSON texts themselves but are composed of JSON texts.</t>
      <section title="Conventions used in this document" anchor="d1e217">
        <t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in <xref target="RFC2119"/>.</t>
      </section>
    </section>
    <section title="JSON Text Sequence Format" anchor="sec_JSON_Text_Sequence">
      <t>
The ABNF <xref target="RFC5234"/> for the JSON text sequence format is as follows:</t>
      <t>
        <figure anchor="magicparlabel-108" title="JSON text sequence ABNF">
          <artwork>  JSON-sequence = *whitespace *(JSON-text 1*whitespace)
  whitespace = %x20 / %x09 / %x0A / %x0D
  JSON-text = &lt;given by RFC7159&gt;</artwork>
        </figure>
      </t>
      <t>
A JSON text sequence is a sequence of zero or more JSON texts, each followed by a JSON whitespace separator.</t>
      <t>
Requirements:</t>
      <t>
        <list style="symbols">
          <t>
JSON text sequence encoders MUST emit one or more JSON whitespace separator characters immediately after any JSON text.</t>
          <t>
JSON text sequence parsers MUST NOT interpret any sequence of two or more contiguous whitespace characters as a sequence of empty JSON texts. Two contiguous separators do not denote an empty JSON text between them, as there is no such thing as an empty JSON text.</t>
        </list>
      </t>
      <t>
An input of 'truefalse' is not a valid sequence of two JSON values, true and false! Neither is 'true0' a valid sequence of true and zero. Some existing JSON parsers that might be used to construct sequence parsers might in fact accept such sequences, resulting in erroneous parsing of sequences of two or more numbers. E.g., a sequence of two numbers, 4 and 2, encoded without the required whitespace between them would parse incorrectly as the number 42. This ambiguity is resolved by requiring that encoders never omit the separator.</t>
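      <t>
As a non-normative illustration, the format above can be sketched in Python. The function names encode_sequence and decode_sequence are hypothetical and chosen for this example only. The sketch assumes the whole sequence is held in a string, and it relies on the standard json module, whose default acceptance of NaN and Infinity is disabled here because RFC 7159 does not allow them.</t>
      <t>
        <figure>
          <artwork><![CDATA[
```python
import json

WHITESPACE = " \t\n\r"  # the four characters of the ABNF whitespace rule


def _reject_constant(name):
    # Python's json module accepts NaN and Infinity by default; refuse them.
    raise ValueError("not a JSON value: " + name)


_DECODER = json.JSONDecoder(parse_constant=_reject_constant)


def encode_sequence(values):
    """Serialize each value as a JSON text followed by one newline."""
    return "".join(json.dumps(v, allow_nan=False) + "\n" for v in values)


def _skip_whitespace(text, pos):
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos


def decode_sequence(text):
    """Yield the values of *whitespace *(JSON-text 1*whitespace)."""
    pos = _skip_whitespace(text, 0)
    while pos < len(text):
        # raw_decode parses one JSON text and reports where it ended.
        value, end = _DECODER.raw_decode(text, pos)
        # Every JSON text must be followed by at least one whitespace.
        if end >= len(text) or text[end] not in WHITESPACE:
            raise ValueError(
                "JSON text at offset %d is not followed by whitespace" % pos
            )
        yield value
        # Runs of whitespace are separators, never empty JSON texts.
        pos = _skip_whitespace(text, end)
```
]]></artwork>
        </figure>
      </t>
      <t>
With this sketch, the input "4 2" followed by a newline decodes as the two numbers 4 and 2, and "42" followed by a newline decodes as the single number 42. The input 'truefalse' is rejected, because the first JSON text is not followed by whitespace.</t>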
    </section>
    <section title="Use for Logfiles, or How to Resynchronize Following Truncated Entries" anchor="d1e288">
      <t>
The JSON text sequence format is well suited to logfiles, as those are generally (and atomically) appended to on an ongoing basis. That is, logfiles are of indeterminate length, at least until they are closed.</t>
      <t>
One problem arises with this use case: it is difficult to guarantee that append writes will complete. It is therefore possible (if unlikely) to end up with truncated log entries, which may fail to parse as JSON texts, followed by other entries. The mechanics of such failures are not explained here (power failures are one example).</t>
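      <t>
One way a writer might keep appends as close to atomic as the platform allows is to open the logfile in append mode and emit each entry, including its trailing newline, with a single write call. The following Python sketch is illustrative only: append_entry is a hypothetical name, and whether a single write is actually atomic depends on the operating system and filesystem, which this document does not specify.</t>
      <t>
        <figure>
          <artwork><![CDATA[
```python
import json
import os


def append_entry(path, value):
    """Append one JSON text plus a newline using a single write call.

    Returns True if the whole entry was written, False on a short write.
    """
    data = (json.dumps(value, allow_nan=False) + "\n").encode("utf-8")
    # O_APPEND asks the OS to position every write at the end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, data)
    finally:
        os.close(fd)
    # A short write leaves a truncated entry behind; tell the caller.
    return written == len(data)
```
]]></artwork>
        </figure>
      </t>
      <t>
Even with this approach a crash or power failure can interrupt a write, so readers still need to be prepared for truncated entries.</t>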
      <t>
Fortunately, as long as all texts in the logfile sequence are followed by a newline, it is possible to detect a subsequent entry written after an entry that fails to parse. <xref target="magicparlabel-133"/> shows an ABNF rule for detecting the boundary between a non-truncated (or, in some cases, truncated) JSON text and the next JSON text in a sequence.</t>
      <t>
        <figure anchor="magicparlabel-133" title="ABNF for resynchronization">
          <artwork> boundary = endchar *whitespace NL *whitespace startchar
 endchar = ( "}" / "]" / %x22 / "e" / "l" / DIGIT )
 startchar =  ( "{" / "[" / %x22 / "t" / "f" / "n" / "-" / DIGIT )</artwork>
        </figure>
      </t>
      <t>
To resynchronize after failing to parse a JSON text, search for a boundary as described in <xref target="magicparlabel-133"/>. A boundary found this way might be the boundary between the truncated entry and the subsequent entry, or it might be a later boundary.</t>
      <t>
Applications SHOULD scan backwards (up to the start of the incomplete text) from such a boundary, looking for a newline followed by a valid JSON text; otherwise, valid entries following truncated entries can be missed by this rule.</t>
      <t>
Note that, in order to enable resynchronization, all JSON texts appended to a logfile MUST be followed by a newline.</t>
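      <t>
The resynchronization procedure described above might be sketched in Python as follows. This is a non-normative illustration under several assumptions: the names read_log and resync are hypothetical, the whole logfile is held in a string, NL in the ABNF is taken to be a line feed (%x0A), and an entry only counts as complete when nothing but spaces, tabs, or carriage returns separates it from the next newline.</t>
      <t>
        <figure>
          <artwork><![CDATA[
```python
import json
import re

WS = " \t\n\r"
# boundary = endchar *whitespace NL *whitespace startchar
# NL is assumed to be "\n"; the startchar is matched as a lookahead so
# that the end of the match is the offset of the next entry.
BOUNDARY = re.compile(r'[}\]"el0-9][ \t\r\n]*\n[ \t\r\n]*(?=[{\["tfn0-9-])')
_DECODER = json.JSONDecoder()


def _skip_ws(log, pos):
    while pos < len(log) and log[pos] in WS:
        pos += 1
    return pos


def _parse_at(log, pos):
    """Return (value, end) for a complete entry at pos, else None."""
    try:
        value, end = _DECODER.raw_decode(log, pos)
    except ValueError:
        return None
    rest = end
    while rest < len(log) and log[rest] in " \t\r":
        rest += 1
    # Entries MUST be followed by a newline; otherwise treat as truncated.
    if rest >= len(log) or log[rest] != "\n":
        return None
    return value, end


def resync(log, fail_start):
    """Return the offset at which to resume after a failure at fail_start."""
    match = BOUNDARY.search(log, fail_start)
    if match is None:
        resume, scan_end = len(log), len(log)
    else:
        resume, scan_end = match.end(), match.start()
    # Scan backwards from the boundary towards the incomplete text for a
    # newline followed by a valid JSON text that the boundary rule skipped.
    newline = log.rfind("\n", fail_start, scan_end)
    while newline != -1:
        start = _skip_ws(log, newline + 1)
        if _parse_at(log, start) is not None:
            resume = start
        newline = log.rfind("\n", fail_start, newline)
    return resume


def read_log(log):
    """Return every entry that parses, skipping truncated ones."""
    entries = []
    pos = _skip_ws(log, 0)
    while pos < len(log):
        parsed = _parse_at(log, pos)
        if parsed is None:
            pos = resync(log, pos)  # always moves forward
            continue
        value, end = parsed
        entries.append(value)
        pos = _skip_ws(log, end)
    return entries
```
]]></artwork>
        </figure>
      </t>
      <t>
For example, given four lines containing {"a": 1}, the truncated text {"b": 2, "c, then {"d": 4} and {"e": 5}, the first boundary that matches lies between the third and fourth lines, because the truncated line does not end with an endchar. The backward scan then recovers the third line, so this sketch returns the first, third, and fourth entries.</t>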
    </section>
    <section title="Security Considerations" anchor="sec_Security_Considerations">
      <t>
All the security considerations of JSON <xref target="RFC7159"/> apply.</t>
      <t>
There is no end of sequence indicator. This means that “end of file”, “end of transmission”, and so on, can be indistinguishable from a logical end of sequence. Applications where this matters should denote end of sequence by convention (e.g., Content-Length in HTTP).</t>
      <t>
JSON text sequence parsers based on non-incremental, non-online JSON text parsers will not be able to efficiently parse JSON texts in which newlines appear; attempting to parse such sequences with non-incremental, non-online JSON text parsers creates a compute resource exhaustion vulnerability.</t>
      <t>
The first requirement given in <xref target="sec_JSON_Text_Sequence"/> (otherwise-ambiguous JSON texts must be separated by whitespace) is critical and must be adhered to. It is best to always emit a whitespace separator after every JSON text emitted.</t>
      <t>
Purposefully appending a truncated (or invalid) JSON text to a JSON text sequence logfile can cause the subsequent entry to be ignored by tooling that does not scan backwards from resynchronization boundaries looking for otherwise missed complete JSON texts.</t>
    </section>
    <section title="IANA Considerations" anchor="sec_IANA_Considerations">
      <t>
The MIME media type for JSON text sequences is application/json-seq.</t>
      <t>
Type name: application</t>
      <t>
Subtype name: json-seq</t>
      <t>
Required parameters: n/a</t>
      <t>
Optional parameters: n/a</t>
      <t>
Encoding considerations: binary</t>
      <t>
Security considerations: See &lt;this document, once published&gt;, <xref target="sec_Security_Considerations"/>.</t>
      <t>
Interoperability considerations: Described herein.</t>
      <t>
Published specification: &lt;this document, once published&gt;.</t>
      <t>
Applications that use this media type: JSON text sequences have been used in applications written with the jq programming language.</t>
    </section>
    <section title="Acknowledgements" anchor="d1e448">
      <t>
Phillip Hallam-Baker proposed the use of JSON text sequences for logfiles and pointed out the need for resynchronization. James Manger contributed the ABNF for resynchronization.</t>
    </section>
  </middle>
  <back>
    <references title="Normative References">&rfc2119;
&rfc5234;
&rfc7159;
</references>
  </back>
</rfc>
