One document matched: draft-ietf-appsawg-mime-default-charset-04.xml


<?xml version="1.0" encoding="UTF-8"?>
<!--
    This XML document is the output of clean-for-DTD.xslt; a tool that strips
    extensions to RFC2629(bis) from documents for processing with xml2rfc.
--><!-- ?xml-stylesheet type='text/xsl' href='http://xml.resource.org/authoring/rfc2629.xslt' ? -->
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc tocindent="no" ?>
<?rfc autobreaks="no" ?>
<?rfc comments="yes" ?>
<?rfc inline="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="no" ?>
<?rfc compact="yes"?>
<!DOCTYPE rfc
  PUBLIC "" "rfc2629.dtd">
<rfc category="std" ipr="trust200902" updates="2046" docName="draft-ietf-appsawg-mime-default-charset-04">
     
    <front>
        <title abbrev="MIME Charset Default Update">Update to MIME regarding Charset Parameter Handling in Textual Media Types</title>
        <author initials="A." surname="Melnikov" fullname="Alexey Melnikov">
            <organization>Isode Limited</organization>
            <address>
              <postal>
                <street>5 Castle Business Village</street>
                <street>36 Station Road</street>
                <city>Hampton</city>
                <region>Middlesex</region>
                <code>TW12 2BX</code>
                <country>UK</country>
              </postal>
              <email>Alexey.Melnikov@isode.com</email>
            </address>
        </author>
        <author initials="J. F." surname="Reschke" fullname="Julian F. Reschke">
            <organization abbrev="greenbytes">greenbytes GmbH</organization>
            <address>
              <postal>
                <street>Hafenweg 16</street>
                <city>Muenster</city><region>NW</region><code>48155</code>
                <country>Germany</country>
              </postal>
              <email>julian.reschke@greenbytes.de</email>	
              <uri>http://greenbytes.de/tech/webdav/</uri>	
            </address>
        </author>
        <date year="2012" month="May" day="9"/>
        <area>Applications</area>
        <workgroup>Applications Area Working Group</workgroup>
        
        <keyword>MIME</keyword>
        <keyword>charset</keyword>
        <keyword>text</keyword>
        <abstract>
          <t>
            This document changes RFC 2046 rules regarding default charset parameter
	    values for text/* media types to better align with common usage by existing
	    clients and servers.
          </t>
        </abstract>
        <note title="Editorial Note (To be removed by RFC Editor)">
          <t>
            Discussion of this draft should take place on the Apps Area Working Group
            mailing list (apps-discuss@ietf.org), which is archived at
            <eref target="http://www.ietf.org/mail-archive/web/apps-discuss"/>.
          </t>
        </note>
    </front>

    <middle>
        <section title="Introduction and Overview">
            <t>

	    RFC 2046 specified that the default charset parameter
	    (i.e. the value used when the parameter is not specified) is "US-ASCII" (Section 4.1.2 of <xref target="RFC2046"/>).
	    RFC 2616 changed the default for use by HTTP (Hypertext Transfer Protocol) to be "ISO-8859-1" (Section 3.7.1 of <xref target="RFC2616"/>).
	    This encoding is not very common for new text/* media types
	    and a special rule in the HTTP specification adds confusion
	    about which specification (<xref target="RFC2046"/> or <xref target="RFC2616"/>)
	    is authoritative in regards to the default charset for text/* media types.
	    
	    <!-- jre recommends to raise an HTTPbis issue one feels strongly about this
      
      <cref>At the time of writing of this document the IETF HTTPBIS WG is working
	    on an update to RFC 2616 which removes the default charset of "ISO-8859-1"
	    for "text/*" media types. It is expected that the set of HTTPBIs documents
	    will reference this document in order to use the updated rules
	    of default charset in "text/*" media types.</cref> -->
            </t>
    
            <t>
	    Many complex text subtypes such as text/html <xref target="RFC2854"/>  and text/xml <xref target="RFC3023"/>  have internal
	    (to their format) means of describing the charset.
	    Many existing User Agents ignore the default of "US-ASCII" rule for at least
	    text/html and text/xml.
            </t>

	    <t>This document changes RFC 2046 rules regarding default charset parameter
	    values for text/* media types to better align with common usage by existing
	    clients and servers. It does not change the defaults for any currently
      registered media type.<!-- FIXME if we actually do change the default for text/plain-->
	    </t>
<!-- JR: we may also want to state that we do not define handling of broken messages-->
        </section>

	<section title="Conventions Used in This Document">
	    
	    <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
	    this document are to be interpreted as described in
	    <xref target="RFC2119"/>.</t>
          
	</section>
	
	<section title="New rules for default charset parameter values for text/* media types">

	    <t>Section 4.1.2 of <xref target="RFC2046"/> says:</t>

      <t><list>
        <t>The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.</t>
      </list></t>

<!--///RFC 2046, Section 4.1.2 also says:
   Note that the character set used, if anything other than US- ASCII,
   must always be explicitly specified in the Content-Type field.
-->
	    
	    <t>As explained in the Introduction section this rule is considered
	    to be outdated, so this document replaces it with the following set
	    of rules:</t>

<!--///Ned wrote:
    In the absence of a specification of a default, I'm tempted
    to say the "default default" should be UTF-8.
-->

<!--///John Klensin wrote:
    Require text/* to be accompanied by a charset parameter
    always.  Of course, if one is omitted, which will happen
    in practice, people will assume the old rules.  But new
    and updated specs say "parameter MUST be supplied".
-->   
	    
	    <t>Each subtype of the "text" media type which uses the "charset"
	    parameter can define its own default value for the "charset" parameter,
	    including the absence of any default.
	    </t>
	    
	    <t>
<!-- jre I'm not sure I understand that "for interoperability..." part-->
<!--
Henri S.> For backwards compatibility, pretty much every existing text/* type
Henri S.> will have to violate this "SHOULD NOT".

Ned F.> Yep. That's the main reason why it needs to be a SHOULD.

JR: those who will update text/xml and text/html know how to read the SHOULD.
-->
	    In order to improve interoperability with deployed agents,
	    "text/*" media type registrations SHOULD either
      </t>
      <t>
      <list style="letters">
        <t>
          specify that the "charset" parameter is not used for the defined subtype,
    	    because the charset information is transported inside the payload (such as in "text/xml"), or
        </t>
        <t>
          require explicit unconditional inclusion of the "charset" parameter
    	    eliminating the need for a default value.
        </t>
      </list>
      </t>
      <t>
<!--////Alexey: Hmmm, does this mean that the second choice above doesn't specify a default either?-->

	    In accordance with option (a), above, registrations for "text/*" media types that can
	    transport charset information inside the corresponding payloads (such
	    as "text/html" and "text/xml") SHOULD NOT specify
	    the use of a "charset" parameter, nor any default value, in order to
	    avoid conflicting interpretations should the charset parameter value
	    and the value specified in the payload disagree.</t>
	    
	    <t>
	    New subtypes of the "text" media type, thus, SHOULD NOT define a
	    default "charset" value.  If there is a strong reason to do so
	    despite this advice, they SHOULD use the "UTF-8" <xref target="RFC3629"/> charset
	    as the default.
	    </t>
      
      <t>
      Regardless of what approach is chosen, all new text/* registrations MUST
      clearly specify how the charset is determined; relying on the default
      defined in Section 4.1.2 of <xref target="RFC2046"/> is no longer
      permitted. However, existing text/* registrations that fail to specify
      how the charset is determined still default to US-ASCII. 
      </t>
    
	    <t>
	    Specifications covering the "charset" parameter, and what
	    default value, if any, is used, are subtype-specific, NOT
      protocol-specific.  Protocols that use MIME, therefore, MUST NOT
      override default charset values for "text/*" media types to be different
      for their specific protocol.  The protocol definitions MUST leave that
	    to the subtype definitions.
	    </t>
	    
        </section>

	<section title="Default charset parameter value for text/plain media type">

	    <t>The default charset parameter value for text/plain is unchanged
	    from <xref target="RFC2046"/> and remains as "US-ASCII".</t>
	    
        </section>
		<section anchor="security" title="Security Considerations">
	    
          <t>
            Guessing of the charset parameter can lead to security issues
            such as content buffer overflows, denial of services or bypass
            of filtering mechanisms. However, this document does not
            promote guessing, but encourages use of charset information
            that is specified by the sender.
          </t>
          <t>
            Conflicting information in-band vs out-of-band can also lead to
            similar security problems, and this document recommends the use
            of charset information which is more likely to be correct (for
            example, in-band over out-of-band). 
          </t>

	</section>

        <section anchor="iana" title="IANA Considerations">

            <t>
              This document asks IANA to update the "text" subregistry of the Media
              Types registry (<eref target="http://www.iana.org/assignments/media-types/text/"/>), to add the following preamble:
              "See [this RFC] for information about 'charset' parameter handling for text media types."
            </t>
            <t>
              IANA is also asked to add this RFC to the list of references at the beginning of the Application for Media Type (<eref target="http://www.iana.org/cgi-bin/mediatypes.pl"/>).
            </t>
        </section>
    </middle>

    <back>
        <references title="Normative References">

	    

<reference anchor="RFC2119">

<front>
<title abbrev="RFC Key Words">Key words for use in RFCs to Indicate Requirement Levels</title>
<author initials="S." surname="Bradner" fullname="Scott Bradner">
<organization>Harvard University</organization>
<address>
<postal>
<street>1350 Mass. Ave.</street>
<street>Cambridge</street>
<street>MA 02138</street></postal>
<phone>- +1 617 495 3864</phone>
<email>sob@harvard.edu</email></address></author>
<date year="1997" month="March"/>
<area>General</area>
<keyword>keyword</keyword>
<abstract>
<t>
   In many standards track documents several words are used to signify
   the requirements in the specification.  These words are often
   capitalized.  This document defines these words as they should be
   interpreted in IETF documents.  Authors who follow these guidelines
   should incorporate this phrase near the beginning of their document:

<list>
<t>
      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
      NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in
      RFC 2119.
</t></list></t>
<t>
   Note that the force of these words is modified by the requirement
   level of the document in which they are used.
</t></abstract></front>

<seriesInfo name="BCP" value="14"/>
<seriesInfo name="RFC" value="2119"/>
<format type="TXT" octets="4723" target="http://www.rfc-editor.org/rfc/rfc2119.txt"/>
<format type="HTML" octets="17491" target="http://xml.resource.org/public/rfc/html/rfc2119.html"/>
<format type="XML" octets="5777" target="http://xml.resource.org/public/rfc/xml/rfc2119.xml"/>
</reference>

	    

<reference anchor="RFC2046">

<front>
<title abbrev="Media Types">Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types</title>
<author initials="N." surname="Freed" fullname="Ned Freed">
<organization>Innosoft International, Inc.</organization>
<address>
<postal>
<street>1050 East Garvey Avenue South</street>
<city>West Covina</city>
<region>CA</region>
<code>91790</code>
<country>US</country></postal>
<phone>+1 818 919 3600</phone>
<facsimile>+1 818 919 3614</facsimile>
<email>ned@innosoft.com</email></address></author>
<author initials="N." surname="Borenstein" fullname="Nathaniel S. Borenstein">
<organization>First Virtual Holdings</organization>
<address>
<postal>
<street>25 Washington Avenue</street>
<city>Morristown</city>
<region>NJ</region>
<code>07960</code>
<country>US</country></postal>
<phone>+1 201 540 8967</phone>
<facsimile>+1 201 993 3032</facsimile>
<email>nsb@nsb.fv.com</email></address></author>
<date year="1996" month="November"/>
<abstract>
<t>STD 11, RFC 822 defines a message representation protocol specifying considerable detail about US-ASCII message headers, but which leaves the message content, or message body, as flat US-ASCII text.  This set of documents, collectively called the Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages to allow for</t>
<t>(1)   textual message bodies in character sets other than US-ASCII,</t>
<t>(2)   an extensible set of different formats for non-textual message bodies,</t>
<t>(3)   multi-part message bodies, and</t>
<t>(4)   textual header information in character sets other than US-ASCII.</t>
<t>These documents are based on earlier work documented in RFC 934, STD 11 and RFC 1049, but extends and revises them.  Because RFC 822 said so little about message bodies, these documents are largely orthogonal to (rather than a revision of) RFC 822.</t>
<t>The initial document in this set, RFC 2045, specifies the various headers used to describe the structure of MIME messages. This second document defines the general structure of the MIME media typing sytem and defines an initial set of media types. The third document, RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text data in Internet mail header fields. The fourth document, RFC 2048, specifies various IANA registration procedures for MIME-related facilities.  The fifth and final document, RFC 2049, describes MIME
   conformance criteria as well as providing some illustrative examples of MIME message formats, acknowledgements, and the bibliography.</t>
<t>These documents are revisions of RFCs 1521 and 1522, which themselves were revisions of RFCs 1341 and 1342.  An appendix in RFC 2049 describes differences and changes from previous versions.</t></abstract></front>

<seriesInfo name="RFC" value="2046"/>
<format type="TXT" octets="105854" target="http://www.rfc-editor.org/rfc/rfc2046.txt"/>
</reference>

	    

<reference anchor="RFC3629">

<front>
<title>UTF-8, a transformation format of ISO 10646</title>
<author initials="F." surname="Yergeau" fullname="F. Yergeau">
<organization/></author>
<date year="2003" month="November"/>
<abstract>
<t>ISO/IEC 10646-1 defines a large character set called the Universal Character Set (UCS) which encompasses most of the world's writing systems.  The originally proposed encodings of the UCS, however, were not compatible with many current applications and protocols, and this has led to the development of UTF-8, the object of this memo.  UTF-8 has the characteristic of preserving the full US-ASCII range, providing compatibility with file systems, parsers and other software that rely on US-ASCII values but are transparent to other values.  This memo obsoletes and replaces RFC 2279.</t></abstract></front>

<seriesInfo name="STD" value="63"/>
<seriesInfo name="RFC" value="3629"/>
<format type="TXT" octets="33856" target="http://www.rfc-editor.org/rfc/rfc3629.txt"/>
</reference>


        </references>

        <references title="Informative References">

	    

<reference anchor="RFC2616">

<front>
<title abbrev="HTTP/1.1">Hypertext Transfer Protocol -- HTTP/1.1</title>
<author initials="R." surname="Fielding" fullname="Roy T. Fielding">
<organization abbrev="UC Irvine">Department of Information and Computer Science</organization>
<address>
<postal>
<street>University of California, Irvine</street>
<city>Irvine</city>
<region>CA</region>
<code>92697-3425</code></postal>
<facsimile>+1(949)824-1715</facsimile>
<email>fielding@ics.uci.edu</email></address></author>
<author initials="J." surname="Gettys" fullname="James Gettys">
<organization abbrev="Compaq/W3C">World Wide Web Consortium</organization>
<address>
<postal>
<street>MIT Laboratory for Computer Science, NE43-356</street>
<street>545 Technology Square</street>
<city>Cambridge</city>
<region>MA</region>
<code>02139</code></postal>
<facsimile>+1(617)258-8682</facsimile>
<email>jg@w3.org</email></address></author>
<author initials="J." surname="Mogul" fullname="Jeffrey C. Mogul">
<organization abbrev="Compaq">Compaq Computer Corporation</organization>
<address>
<postal>
<street>Western Research Laboratory</street>
<street>250 University Avenue</street>
<city>Palo Alto</city>
<region>CA</region>
<code>94305</code></postal>
<email>mogul@wrl.dec.com</email></address></author>
<author initials="H." surname="Frystyk" fullname="Henrik Frystyk Nielsen">
<organization abbrev="W3C/MIT">World Wide Web Consortium</organization>
<address>
<postal>
<street>MIT Laboratory for Computer Science, NE43-356</street>
<street>545 Technology Square</street>
<city>Cambridge</city>
<region>MA</region>
<code>02139</code></postal>
<facsimile>+1(617)258-8682</facsimile>
<email>frystyk@w3.org</email></address></author>
<author initials="L." surname="Masinter" fullname="Larry Masinter">
<organization abbrev="Xerox">Xerox Corporation</organization>
<address>
<postal>
<street>MIT Laboratory for Computer Science, NE43-356</street>
<street>3333 Coyote Hill Road</street>
<city>Palo Alto</city>
<region>CA</region>
<code>94034</code></postal>
<email>masinter@parc.xerox.com</email></address></author>
<author initials="P." surname="Leach" fullname="Paul J. Leach">
<organization abbrev="Microsoft">Microsoft Corporation</organization>
<address>
<postal>
<street>1 Microsoft Way</street>
<city>Redmond</city>
<region>WA</region>
<code>98052</code></postal>
<email>paulle@microsoft.com</email></address></author>
<author initials="T." surname="Berners-Lee" fullname="Tim Berners-Lee">
<organization abbrev="W3C/MIT">World Wide Web Consortium</organization>
<address>
<postal>
<street>MIT Laboratory for Computer Science, NE43-356</street>
<street>545 Technology Square</street>
<city>Cambridge</city>
<region>MA</region>
<code>02139</code></postal>
<facsimile>+1(617)258-8682</facsimile>
<email>timbl@w3.org</email></address></author>
<date year="1999" month="June"/>
<abstract>
<t>
   The Hypertext Transfer Protocol (HTTP) is an application-level
   protocol for distributed, collaborative, hypermedia information
   systems. It is a generic, stateless, protocol which can be used for
   many tasks beyond its use for hypertext, such as name servers and
   distributed object management systems, through extension of its
   request methods, error codes and headers . A feature of HTTP is
   the typing and negotiation of data representation, allowing systems
   to be built independently of the data being transferred.
</t>
<t>
   HTTP has been in use by the World-Wide Web global information
   initiative since 1990. This specification defines the protocol
   referred to as "HTTP/1.1", and is an update to RFC 2068 .
</t></abstract></front>

<seriesInfo name="RFC" value="2616"/>
<format type="TXT" octets="422317" target="http://www.rfc-editor.org/rfc/rfc2616.txt"/>
<format type="PS" octets="5529857" target="http://www.rfc-editor.org/rfc/rfc2616.ps"/>
<format type="PDF" octets="550558" target="http://www.rfc-editor.org/rfc/rfc2616.pdf"/>
<format type="HTML" octets="636125" target="http://xml.resource.org/public/rfc/html/rfc2616.html"/>
<format type="XML" octets="493420" target="http://xml.resource.org/public/rfc/xml/rfc2616.xml"/>
</reference>

	    

<reference anchor="RFC2854">

<front>
<title>The 'text/html' Media Type</title>
<author initials="D." surname="Connolly" fullname="D. Connolly">
<organization/></author>
<author initials="L." surname="Masinter" fullname="L. Masinter">
<organization/></author>
<date year="2000" month="June"/>
<abstract>
<t>This document summarizes the history of HTML development, and defines the "text/html" MIME type by pointing to the relevant W3C recommendations.  This memo provides information for the Internet community.</t></abstract></front>

<seriesInfo name="RFC" value="2854"/>
<format type="TXT" octets="16135" target="http://www.rfc-editor.org/rfc/rfc2854.txt"/>
</reference>

	    

<reference anchor="RFC3023">

<front>
<title>XML Media Types</title>
<author initials="M." surname="Murata" fullname="M. Murata">
<organization/></author>
<author initials="S." surname="St. Laurent" fullname="S. St. Laurent">
<organization/></author>
<author initials="D." surname="Kohn" fullname="D. Kohn">
<organization/></author>
<date year="2001" month="January"/>
<abstract>
<t>This document standardizes five new media types -- text/xml, application/xml, text/xml-external-parsed-entity, application/xml- external-parsed-entity, and application/xml-dtd -- for use in exchanging network entities that are related to the Extensible Markup Language (XML).  This document also standardizes a convention (using the suffix '+xml') for naming media types outside of these five types when those media types represent XML MIME (Multipurpose Internet Mail Extensions) entities. [STANDARDS-TRACK]</t></abstract></front>

<seriesInfo name="RFC" value="3023"/>
<format type="TXT" octets="86011" target="http://www.rfc-editor.org/rfc/rfc3023.txt"/>
</reference>

	    
        </references>
    
    <section title="Acknowledgements">
	
      <t>
	Many thanks to Ned Freed and John Klensin for comments and ideas that motivated
	creation of this document, and to Carsten Bormann, Murray S. Kucherawy, Barry Leiba, and Henri Sivonen for feedback and text suggestions.
      </t>

    </section>
	
    </back>
</rfc>

PAFTECH AB 2003-20262026-04-24 04:54:55