<?xml version="1.0" encoding="UTF-8"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc tocindent="no"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC "" "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc0020 PUBLIC "" "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.0020.xml">
<!ENTITY rfc3629 PUBLIC "" "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3629.xml">
<!ENTITY rfc5234 PUBLIC "" "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5234.xml">
<!ENTITY rfc7159 PUBLIC "" "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7159.xml">
]>
<rfc docName="draft-ietf-json-text-sequence-12" ipr="trust200902" category="std">
  <front>
    <title abbrev="JSON Text Sequences">JavaScript Object Notation (JSON) Text Sequences</title>
    <author initials="N." surname="Williams" fullname="Nicolas Williams">
      <organization abbrev="Cryptonector">Cryptonector, LLC</organization>
      <address>
        <email>nico@cryptonector.com</email>
      </address>
    </author>
    <date month="December" year="2014"/>
    <area>
Apps Area
</area>
    <workgroup>
json
</workgroup>
    <keyword>Internet-Draft</keyword>
    <abstract>
      <t>
This document describes the JSON text sequence format and associated media type, “application/json-seq”. A JSON text sequence consists of any number of JSON texts, all encoded in UTF-8, each prefixed by an ASCII Record Separator (0x1E), and each ending with an ASCII Line Feed character (0x0A).</t>
    </abstract>
  </front>
  <middle>
    <section title="Introduction and Motivation" anchor="d1e215">
      <t>
The JavaScript Object Notation (JSON) <xref target="RFC7159"/> is a very handy serialization format. However, when serializing a large sequence of values as an array, or a possibly indeterminate-length or never-ending sequence of values, JSON becomes difficult to work with.</t>
      <t>
Consider a sequence of one million values, each possibly 1 kilobyte when encoded -- roughly one gigabyte. It is often desirable to process such a dataset incrementally, that is, without having to first read all of it before beginning to produce results. Traditionally, the way to do this with JSON is to use a “streaming” parser, but these are not widely available, widely used, or easy to use.</t>
      <t>
This document describes the concept and format of “JSON text sequences”, which are specifically not JSON texts themselves but are composed of (possible) JSON texts. JSON text sequences can be parsed (and produced) incrementally without a streaming parser (or a streaming encoder).</t>
      <section title="Conventions used in this document" anchor="d1e236">
        <t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in <xref target="RFC2119"/>.</t>
      </section>
    </section>
    <section title="JSON Text Sequence Format" anchor="sec_JSON_Text_Sequence">
      <t>
Two different sets of ABNF rules are provided for the definition of JSON text sequences: one for parsers, and one for encoders. Having two different sets of rules permits recovery by parsers from sequences where some of the elements are truncated for whatever reason. The syntax for parsers is specified in terms of octet strings, which are then interpreted as JSON texts if possible. The syntax for encoders, on the other hand, assumes that sequence elements are not truncated.</t>
      <t>
JSON text sequences MUST use UTF-8 encoding; other encodings of JSON (i.e., UTF-16 and UTF-32) MUST NOT be used.</t>
      <section title="JSON text sequence parsing" anchor="sub_ParsingRules">
        <t>
The ABNF <xref target="RFC5234"/> for the JSON text sequence parser is as given in  <xref target="fig_ABNF_parser"/>.</t>
        <t>
          <figure anchor="fig_ABNF_parser" title="JSON text sequence ABNF">
            <artwork>  JSON-sequence = *(1*RS possible-JSON)
  RS = %x1E; "record separator" (RS), see RFC20
           ; Also known as: Unicode Character 'INFORMATION SEPARATOR
           ;                TWO' (U+001E)
  possible-JSON = 1*(not-RS); attempt to parse as UTF-8-encoded
                            ; JSON text (see RFC7159)
  not-RS = %x00-1d / %x1f-ff; any octets other than RS</artwork>
          </figure>
        </t>
        <t>
In prose: a series of octet strings, each containing any octet other than a record separator (RS) (0x1E) <xref target="RFC0020"/>, all octet strings separated from each other by RS octets. Each octet string in the sequence is to be parsed as a JSON text in the UTF-8 encoding <xref target="RFC3629"/>.</t>
        <t>
If parsing of such an octet string as a UTF-8-encoded JSON text fails, the parser SHOULD nonetheless continue parsing the remainder of the sequence. The parser can report such failures to applications (which might then choose to terminate parsing of a sequence). Multiple consecutive RS octets do not denote empty sequence elements between them, and can be ignored.</t>
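        <t>
The parsing rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function name parse_json_seq and the (ok, payload) pairs it yields are hypothetical, and discarding any octets that precede the first RS is a choice made for this sketch. The sketch also omits the truncation check for top-level numbers, 'true', 'false', and 'null' described in <xref target="sub_Top_level"/>.</t>
        <t>
          <figure>
            <artwork><![CDATA[
```python
import json

RS = b"\x1e"  # ASCII record separator


def parse_json_seq(data):
    """Yield (ok, payload) for each element of a JSON text sequence.

    Illustrative helper: payload is the parsed value when ok is True,
    and the exception describing the failure when ok is False.
    """
    # Assumption: octets before the first RS are not an element, since
    # the parser ABNF starts every element with at least one RS.
    for octets in data.split(RS)[1:]:
        if not octets:
            continue  # consecutive RS octets: no empty element here
        try:
            yield True, json.loads(octets.decode("utf-8"))
        except ValueError as exc:
            # Invalid UTF-8 or malformed JSON: report it and continue.
            yield False, exc
```
]]></artwork>
          </figure>
        </t>
        <t>
An application that prefers to stop at the first failure can simply stop iterating when it receives a pair whose first member is False.</t>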
        <t>
This document does not define a mechanism for reliably identifying a text sequence element by position (for example, when sending individual elements of an array as unique text sequences). For applications where truncation is a possibility, intended sequence elements can be truncated or can even be missing entirely; therefore, a reference to an nth element would be unreliable.</t>
        <t>
There is no end of sequence indicator.</t>
      </section>
      <section title="JSON text sequence encoding" anchor="sub_EncodingRules">
        <t>
The ABNF for the JSON text sequence encoder is given in  <xref target="fig_ABNF_encoder"/>.</t>
        <t>
          <figure anchor="fig_ABNF_encoder" title="JSON text sequence ABNF">
            <artwork>  JSON-sequence = *(RS JSON-text LF)
  RS = %x1E; see RFC20
           ; Also known as: Unicode Character 'INFORMATION SEPARATOR
           ;                TWO' (U+001E)
  LF = %x0A; "line feed" (LF), see RFC20
  JSON-text = &lt;given by RFC7159, using UTF-8 encoding&gt;</artwork>
          </figure>
        </t>
        <t>
In prose: any number of JSON texts, each encoded in UTF-8 <xref target="RFC3629"/>, each preceded by one ASCII RS character, and each followed by a line feed (LF). Since RS is an ASCII control character, it may only appear in JSON strings in escaped form (see <xref target="RFC7159"/>), and since RS may not appear in JSON texts in any other form, RS unambiguously delimits the start of any element in the sequence. RS is sufficient to unambiguously delimit all top-level JSON value types other than numbers. Following each JSON text in the sequence with an LF allows detection of truncated JSON texts consisting of a number at the top level; see <xref target="sub_Top_level"/>.</t>
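        <t>
As an illustration of the encoding rules, the following Python sketch writes each value as RS, then the UTF-8-encoded JSON text, then LF. The function name encode_json_seq is hypothetical. The sketch relies on Python's json.dumps escaping control characters inside strings, so a raw RS octet never appears within an element.</t>
        <t>
          <figure>
            <artwork><![CDATA[
```python
import json

RS = b"\x1e"  # ASCII record separator
LF = b"\x0a"  # ASCII line feed


def encode_json_seq(values):
    """Encode an iterable of values as a JSON text sequence (bytes)."""
    out = bytearray()
    for value in values:
        # json.dumps escapes control characters inside strings (U+001E
        # becomes \u001e). allow_nan=False rejects NaN and Infinity,
        # which are not valid JSON.
        text = json.dumps(value, ensure_ascii=False, allow_nan=False)
        out += RS + text.encode("utf-8") + LF
    return bytes(out)
```
]]></artwork>
          </figure>
        </t>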
        <t>
JSON text sequence encoders are expected to ensure that the sequence elements are properly formed. When the JSON text sequence encoder does the JSON text encoding, the sequence elements will naturally be properly formed. When the JSON text sequence encoder accepts already-encoded JSON texts, the JSON text sequence encoder ought to parse them before adding them to a sequence.</t>
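        <t>
For an encoder that accepts already-encoded JSON texts, the validation step might look like the Python sketch below. The names append_encoded_text and _reject_constant are hypothetical. Python's json.loads accepts the non-JSON constants NaN and Infinity by default, so the sketch rejects them explicitly.</t>
        <t>
          <figure>
            <artwork><![CDATA[
```python
import json

RS = b"\x1e"  # ASCII record separator
LF = b"\x0a"  # ASCII line feed


def _reject_constant(name):
    # Called by json.loads for NaN, Infinity, and -Infinity.
    raise ValueError("not valid JSON: " + name)


def append_encoded_text(sequence, text):
    """Append an already-encoded JSON text to a bytearray sequence.

    Raises ValueError, leaving the sequence unchanged, if text is not
    a well-formed UTF-8-encoded JSON text.
    """
    # Parse only to validate; the parsed value is discarded.
    json.loads(text.decode("utf-8"), parse_constant=_reject_constant)
    sequence += RS + text + LF
    return sequence
```
]]></artwork>
          </figure>
        </t>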
        <t>
Note that on some systems it's possible to input RS by typing 'ctrl-^'; on some systems or applications the correct sequence may be 'ctrl-v ctrl-^'. This is helpful when constructing a sequence manually with a text editor.</t>
      </section>
      <section title="Incomplete/invalid JSON texts need not be fatal" anchor="d1e385">
        <t>
Per <xref target="sub_ParsingRules"/>, JSON text sequence parsers should not abort when an octet string contains a malformed JSON text; instead, the JSON text sequence parser should skip to the next RS. Such a situation may arise in contexts where, for example, append-writes to log files are truncated by the filesystem (e.g., due to a crash or administrative process termination).</t>
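        <t>
The log file scenario can be sketched in Python as a reader that consumes a binary stream in blocks and resynchronizes at each RS. The function name read_json_seq and the chunk_size parameter are hypothetical, and silently skipping failed elements is a simplification; a real reader would likely report them. The truncation check for top-level numbers, 'true', 'false', and 'null' is not shown here.</t>
        <t>
          <figure>
            <artwork><![CDATA[
```python
import io
import json

RS = b"\x1e"  # ASCII record separator


def read_json_seq(stream, chunk_size=4096):
    """Yield parsed values from a binary stream, skipping bad elements."""
    buf = b""
    eof = False
    while not eof:
        block = stream.read(chunk_size)
        eof = not block
        buf += block
        # Everything before the last RS is complete; the final piece
        # may still grow, so hold it back until more input arrives.
        *complete, buf = buf.split(RS)
        if eof:
            complete.append(buf)
        for octets in complete:
            if not octets:
                continue  # nothing between two RS octets
            try:
                yield json.loads(octets.decode("utf-8"))
            except ValueError:
                continue  # truncated or malformed: skip to next RS
```
]]></artwork>
          </figure>
        </t>
        <t>
For instance, reading a stream that contains a complete element, an element cut off in the middle of a string, and another complete element yields only the two complete elements.</t>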
        <t>
Incremental JSON text parsers may be used, though of course failure to parse a given text may result after first producing some incremental parse results.</t>
        <t>
Sequence parsers should have an option to warn about truncated JSON texts.</t>
      </section>
      <section title="Top-level numeric, 'true', 'false', and 'null' values" anchor="sub_Top_level">
        <t>
While objects, arrays, and strings are self-delimited in JSON texts, numbers, and the values 'true', 'false', and 'null' are not. Only whitespace can delimit the latter four kinds of values.</t>
        <t>
JSON text sequences use 0x0A as a "canary" octet to detect truncation.</t>
        <t>
Parsers MUST check that any JSON texts that are a top-level number, or which might be 'true', 'false', or 'null' include JSON whitespace (at least one byte matching the “ws” ABNF rule from <xref target="RFC7159"/>) after that value, otherwise the JSON-text may have been truncated. Note that the LF following each JSON text matches the “ws” ABNF rule.</t>
        <t>
Parsers MUST drop JSON-text sequence elements consisting of non-self-delimited top-level values that may have been truncated (that are not delimited by whitespace). Parsers can report such texts as warnings (including, optionally, the parsed text and/or the original octet string).</t>
        <t>
For example, '&lt;RS&gt;123&lt;RS&gt;' might have been intended to carry the top-level number 1234, but must have been truncated. Similarly, '&lt;RS&gt;true&lt;RS&gt;' might have been intended to carry the invalid text 'trueish'. '&lt;RS&gt;truefalse&lt;RS&gt;' is not two top-level values, 'true', and 'false'; it is simply not a valid JSON text.</t>
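        <t>
The check described in this section can be sketched in Python as follows. The function name parse_element and the status labels "ok", "truncated", and "invalid" are hypothetical. The sketch treats a parsed object, array, or string as self-delimited and requires every other top-level value to end with an octet matching the "ws" rule.</t>
        <t>
          <figure>
            <artwork><![CDATA[
```python
import json

JSON_WS = b" \t\n\r"  # octets matching the "ws" rule of RFC 7159


def parse_element(octets):
    """Parse the octets found between two RS; return (status, value)."""
    try:
        value = json.loads(octets.decode("utf-8"))
    except ValueError:
        return "invalid", None
    self_delimited = isinstance(value, (dict, list, str))
    if not self_delimited and octets[-1] not in JSON_WS:
        # A number, true, false, or null with no trailing whitespace
        # may have been cut short, so it is dropped.
        return "truncated", None
    return "ok", value
```
]]></artwork>
          </figure>
        </t>
        <t>
With this sketch, the octets '123' and 'true' are reported as truncated, 'truefalse' is reported as invalid, and '123' followed by LF parses as the number 123.</t>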
        <t>
Implementations may produce a value when parsing '&lt;RS&gt;"foo"&lt;RS&gt;' because their JSON text parser might be able to consume bytes incrementally, and since the JSON text in this case is a self-delimiting top-level value, the parser can produce the result without consuming an additional byte. Such implementations ought to skip to the next RS byte, possibly reporting any intervening non-whitespace bytes.</t>
      </section>
    </section>
    <section title="Security Considerations" anchor="sec_Security_Considerations">
      <t>
All the security considerations of JSON <xref target="RFC7159"/> apply. This format provides no cryptographic integrity protection of any kind.</t>
      <t>
As usual, parsers must operate on as-good-as untrusted input. This means that parsers must fail gracefully in the face of malicious inputs.</t>
      <t>
Note that incremental JSON text parsers can produce partial results and later indicate failure to parse the remainder of a text. A sequence parser that uses an incremental JSON text parser might treat a sequence like '&lt;RS&gt;"foo"&lt;LF&gt;456&lt;LF&gt;&lt;RS&gt;' as a sequence of one element ("foo"), while a sequence parser that uses a non-incremental JSON text parser might treat the same sequence as being empty. This effect, and texts that fail to parse and are ignored, can be used to smuggle data past sequence parsers that don't warn about JSON text failures.</t>
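      <t>
The difference between the two kinds of parser can be illustrated with the Python sketch below. As an assumption of this sketch, json.JSONDecoder.raw_decode stands in for an incremental parser: it is not truly incremental, but it returns the first complete value along with the offset where that value ended, which is enough to show the effect.</t>
      <t>
        <figure>
          <artwork><![CDATA[
```python
import json

# The octets between the two RS octets, decoded from UTF-8.
element = '"foo"\n456\n'

# Incremental-style parse: stop once one complete value has been read.
value, end = json.JSONDecoder().raw_decode(element)
incremental_result = [value]
leftover = element[end:].strip()  # non-whitespace that was never parsed

# Non-incremental parse: the whole element must be one JSON text.
try:
    strict_result = [json.loads(element)]
except ValueError:
    strict_result = []
```
]]></artwork>
        </figure>
      </t>
      <t>
Here the incremental-style parse produces one element and leaves '456' unexamined, while the non-incremental parse produces none. A sequence parser that reports such leftover octets as a warning makes the discrepancy visible to the application.</t>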
      <t>
Repeated parsing and re-encoding of a JSON text sequence can result in the addition (or stripping) of trailing LF bytes to (from) individual sequence element JSON texts. This can break signature validation. JSON has no canonical form for JSON texts; therefore, neither does the JSON text sequence format.</t>
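      <t>
The effect of re-encoding on signatures can be illustrated with the Python sketch below. As assumptions of this sketch, a SHA-256 digest over the raw octets stands in for a signature, and the function name reencode is hypothetical. The original element ends with two LF octets; after parsing and re-encoding it ends with one, so the values are equal but the octets are not.</t>
      <t>
        <figure>
          <artwork><![CDATA[
```python
import hashlib
import json

RS = b"\x1e"  # ASCII record separator
LF = b"\x0a"  # ASCII line feed


def reencode(sequence):
    """Parse every element and serialize it again in compact form."""
    out = b""
    for octets in sequence.split(RS):
        if octets:
            value = json.loads(octets)
            text = json.dumps(value, separators=(",", ":"))
            out += RS + text.encode("utf-8") + LF
    return out


original = b'\x1e{"a":1}\n\n'  # element text has two trailing LFs
reencoded = reencode(original)

same_values = json.loads(original[1:]) == json.loads(reencoded[1:])
same_digest = (hashlib.sha256(original).digest()
               == hashlib.sha256(reencoded).digest())
```
]]></artwork>
        </figure>
      </t>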
    </section>
    <section title="IANA Considerations" anchor="sec_IANA_Considerations">
      <t>
The MIME media type for JSON text sequences is application/json-seq.</t>
      <t>
Type name: application</t>
      <t>
Subtype name: json-seq</t>
      <t>
Required parameters: N/A</t>
      <t>
Optional parameters: N/A</t>
      <t>
Encoding considerations: binary</t>
      <t>
Security considerations: See &lt;this document, once published&gt;, <xref target="sec_Security_Considerations"/>.</t>
      <t>
Interoperability considerations: Described herein.</t>
      <t>
Published specification: &lt;this document, once published&gt;.</t>
      <t>
Applications that use this media type: &lt;by publication time <eref target="https://stedolan.github.io/jq"/> is likely to support this format&gt;.</t>
      <t>
Fragment identifier considerations: N/A.</t>
      <t>
Additional information:</t>
      <t>
        <list style="symbols">
          <t>
Deprecated alias names for this type: N/A.</t>
          <t>
Magic number(s): N/A</t>
          <t>
File extension(s): N/A.</t>
          <t>
Macintosh file type code(s): N/A.</t>
          <t>
Person &amp; email address to contact for further information:

<list style="symbols"><t>
json@ietf.org</t></list>
</t>
          <t>
Intended usage: COMMON</t>
          <t>
Author: See the "Authors' Addresses" section of this document.</t>
          <t>
Change controller: IETF</t>
        </list>
      </t>
    </section>
    <section title="Acknowledgements" anchor="d1e602">
      <t>
Phillip Hallam-Baker proposed the use of JSON text sequences for logfiles and pointed out the need for resynchronization. Stephen Dolan created <eref target="https://github.com/stedolan/jq"/>, which uses something like JSON text sequences (with LF as the separator between texts on output, and requiring only such whitespace as needed to disambiguate on input). Carsten Bormann suggested the use of ASCII RS, and Joe Hildebrand suggested the use of LF in addition to RS for disambiguating top-level number values. Paul Hoffman shepherded the Internet-Draft. Many others contributed reviews and comments on the JSON Working Group mailing list.</t>
    </section>
  </middle>
  <back>
    <references title="Normative References">&rfc2119;
&rfc0020;
&rfc3629;
&rfc5234;
&rfc7159;
</references>
  </back>
</rfc>
