One document matched: draft-ietf-appsawg-mime-default-charset-00.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
This XML document is the output of clean-for-DTD.xslt; a tool that strips
extensions to RFC2629(bis) from documents for processing with xml2rfc.
--><!-- ?xml-stylesheet type='text/xsl' href='http://xml.resource.org/authoring/rfc2629.xslt' ? -->
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc tocindent="no" ?>
<?rfc autobreaks="no" ?>
<?rfc comments="yes" ?>
<?rfc inline="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="no" ?>
<?rfc compact="yes"?>
<!DOCTYPE rfc
PUBLIC "" "rfc2629.dtd">
<rfc category="std" ipr="trust200902" updates="2046" docName="draft-ietf-appsawg-mime-default-charset-00">
<front>
<title abbrev="MIME Charset Default Update">Update to MIME regarding Charset Parameter Handling in Textual Media Types</title>
<author initials="A." surname="Melnikov" fullname="Alexey Melnikov">
<organization>Isode Limited</organization>
<address>
<postal>
<street>5 Castle Business Village</street>
<street>36 Station Road</street>
<city>Hampton</city>
<region>Middlesex</region>
<code>TW12 2BX</code>
<country>UK</country>
</postal>
<email>Alexey.Melnikov@isode.com</email>
</address>
</author>
<author initials="J. F." surname="Reschke" fullname="Julian F. Reschke">
<organization abbrev="greenbytes">greenbytes GmbH</organization>
<address>
<postal>
<street>Hafenweg 16</street>
<city>Muenster</city><region>NW</region><code>48155</code>
<country>Germany</country>
</postal>
<email>julian.reschke@greenbytes.de</email>
<uri>http://greenbytes.de/tech/webdav/</uri>
</address>
</author>
<date year="2012" month="February" day="1"/>
<area>Applications</area>
<workgroup>Applications Area Working Group</workgroup>
<keyword>MIME</keyword>
<keyword>charset</keyword>
<keyword>text</keyword>
<abstract>
<t>
This document changes RFC 2046 rules regarding default charset parameter
values for text/* media types to better align with common usage by existing
clients and servers.
</t>
</abstract>
</front>
<middle>
<section title="Introduction and Overview">
<t>
<!--////Alexey: this might need improvments-->
<xref target="RFC2046"/> specified that the default charset parameter
(i.e. the value used when it is not specified) is "US-ASCII".
<xref target="RFC2616"/> changed the default for use by HTTP to be "ISO-8859-1".
This encoding is not very common for new text/* media types
and a special rule in HTTP adds confusion
about which specification (<xref target="RFC2046"/> or <xref target="RFC2616"/>)
is authoritative in regards to the default charset for text/* media types.
<cref>At the time of writing of this document the IETF HTTPBIS WG is working
on an update to RFC 2616 which removes the default charset of "ISO-8859-1"
for "text/*" media types. It is expected that the set of HTTPBIs documents
will reference this document in order to use the updated rules
of default charset in "text/*" media types.</cref>
</t>
<t>
Many complex text subtypes such as text/html <xref target="RFC2854"/> and text/xml <xref target="RFC3023"/> have internal
(to their format) means of describing the charset.
Many existing User Agents ignore the default of "US-ASCII" rule for at least
text/html and text/xml.
</t>
<t>This document changes RFC 2046 rules regarding default charset parameter
values for text/* media types to better align with common usage by existing
clients and servers.
</t>
</section>
<section title="Conventions Used in This Document">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in
<xref target="RFC2119"/>.</t>
</section>
<section title="New rules for default charset parameter values for text/* media types">
<t>Section 4.1.2 of <xref target="RFC2046"/> says:</t>
<t><list>
<t>The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.</t>
</list></t>
<!--///RFC 2046, Section 4.1.2 also says:
Note that the character set used, if anything other than US- ASCII,
must always be explicitly specified in the Content-Type field.
-->
<t>As explained in the Introduction section this rule is considered
to be outdated, so this document replaces it with the following set
of rules:</t>
<!--///Ned wrote:
In the absence of a specification of a default, I'm tempted
to say the "default default" should be UTF-8.
-->
<!--///John Klensin wrote:
Require text/* to be accompanied by a charset parameter
always. Of course, if one is omitted, which will happen
in practice, people will assume the old rules. But new
and updated specs say "parameter MUST be supplied".
-->
<t>Each subtype of the "text" media type which uses the "charset"
parameter can define its own default value for the "charset" parameter,
including absence of any default.
</t>
<t>
<!-- jre I'm not sure I understand that "for interoperability..." part-->
In order to improve interoperability with deployed agents,
"text/*" media type definitions SHOULD either
a) specify that the "charset" parameter is not used for the defined subtype,
because the charset information is transported inside the payload (as in "text/xml") or
b) require explicit unconditional inclusion of the "charset" parameter
eliminating the need for a default value.
<!--////Alexey: Hmmm, does this mean that the second choice above doesn't specify a default either?-->
In accordance with option (a), above, "text/*" media types that can
transport charset information inside the corresponding payloads,
specifically including "text/html" and "text/xml", SHOULD NOT specify
the use of a "charset" parameter, nor any default value, in order to
avoid conflicting interpretations should the charset parameter value
and the value specified in the payload disagree.</t>
<!--////Alexey: Julian also suggested that only charsets that are supersets of US-ASCII
should be used as defaults. I.e. this would rule out UTF-16. I tend to agree.-->
<t>
New subtypes of the "text" media type, thus, SHOULD NOT define a
default "charset" value. If there is a strong reason to do so
despite this advice, they SHOULD use the "UTF-8" <xref target="RFC3629"/> charset
as the default.
</t>
<t>
Specifications of how to specify the "charset" parameter, and what
default value, if any, is used, are subtype-specific, NOT protocol-
specific. Protocols that use MIME, therefore, MUST NOT override
default charset values for "text/*" media types to be different for
their specific protocol. The protocol definitions MUST leave that
to the subtype definitions.
</t>
</section>
<section title="Default charset parameter value for text/plain media type">
<t>The default charset parameter value for text/plain is unchanged
from <xref target="RFC2046"/> and remains as "US-ASCII".</t>
</section>
<!--////Do we also need to update the document that registers text/xml?-->
<section anchor="security" title="Security Considerations">
<t>TBD. Guessing of default charset is a security problem.
Conflicting information in-band vs out-of-band is also a security problem.
</t>
</section>
<section anchor="iana" title="IANA Considerations">
<t>
This document asks IANA to update the "text" subregistry of
the Media Types registry to additionally point to this document.
</t>
</section>
</middle>
<back>
<references title="Normative References">
<reference anchor="RFC2119">
<front>
<title abbrev="RFC Key Words">Key words for use in RFCs to Indicate Requirement Levels</title>
<author initials="S." surname="Bradner" fullname="Scott Bradner">
<organization>Harvard University</organization>
<address>
<postal>
<street>1350 Mass. Ave.</street>
<street>Cambridge</street>
<street>MA 02138</street></postal>
<phone>- +1 617 495 3864</phone>
<email>sob@harvard.edu</email></address></author>
<date year="1997" month="March"/>
<area>General</area>
<keyword>keyword</keyword>
<abstract>
<t>
In many standards track documents several words are used to signify
the requirements in the specification. These words are often
capitalized. This document defines these words as they should be
interpreted in IETF documents. Authors who follow these guidelines
should incorporate this phrase near the beginning of their document:
<list>
<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
</t></list></t>
<t>
Note that the force of these words is modified by the requirement
level of the document in which they are used.
</t></abstract></front>
<seriesInfo name="BCP" value="14"/>
<seriesInfo name="RFC" value="2119"/>
<format type="TXT" octets="4723" target="http://www.rfc-editor.org/rfc/rfc2119.txt"/>
<format type="HTML" octets="17491" target="http://xml.resource.org/public/rfc/html/rfc2119.html"/>
<format type="XML" octets="5777" target="http://xml.resource.org/public/rfc/xml/rfc2119.xml"/>
</reference>
<reference anchor="RFC2046">
<front>
<title abbrev="Media Types">Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types</title>
<author initials="N." surname="Freed" fullname="Ned Freed">
<organization>Innosoft International, Inc.</organization>
<address>
<postal>
<street>1050 East Garvey Avenue South</street>
<city>West Covina</city>
<region>CA</region>
<code>91790</code>
<country>US</country></postal>
<phone>+1 818 919 3600</phone>
<facsimile>+1 818 919 3614</facsimile>
<email>ned@innosoft.com</email></address></author>
<author initials="N." surname="Borenstein" fullname="Nathaniel S. Borenstein">
<organization>First Virtual Holdings</organization>
<address>
<postal>
<street>25 Washington Avenue</street>
<city>Morristown</city>
<region>NJ</region>
<code>07960</code>
<country>US</country></postal>
<phone>+1 201 540 8967</phone>
<facsimile>+1 201 993 3032</facsimile>
<email>nsb@nsb.fv.com</email></address></author>
<date year="1996" month="November"/>
<abstract>
<t>STD 11, RFC 822 defines a message representation protocol specifying considerable detail about US-ASCII message headers, but which leaves the message content, or message body, as flat US-ASCII text. This set of documents, collectively called the Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages to allow for</t>
<t>(1) textual message bodies in character sets other than US-ASCII,</t>
<t>(2) an extensible set of different formats for non-textual message bodies,</t>
<t>(3) multi-part message bodies, and</t>
<t>(4) textual header information in character sets other than US-ASCII.</t>
<t>These documents are based on earlier work documented in RFC 934, STD 11 and RFC 1049, but extends and revises them. Because RFC 822 said so little about message bodies, these documents are largely orthogonal to (rather than a revision of) RFC 822.</t>
<t>The initial document in this set, RFC 2045, specifies the various headers used to describe the structure of MIME messages. This second document defines the general structure of the MIME media typing sytem and defines an initial set of media types. The third document, RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text data in Internet mail header fields. The fourth document, RFC 2048, specifies various IANA registration procedures for MIME-related facilities. The fifth and final document, RFC 2049, describes MIME
conformance criteria as well as providing some illustrative examples of MIME message formats, acknowledgements, and the bibliography.</t>
<t>These documents are revisions of RFCs 1521 and 1522, which themselves were revisions of RFCs 1341 and 1342. An appendix in RFC 2049 describes differences and changes from previous versions.</t></abstract></front>
<seriesInfo name="RFC" value="2046"/>
<format type="TXT" octets="105854" target="http://www.rfc-editor.org/rfc/rfc2046.txt"/>
</reference>
<reference anchor="RFC3629">
<front>
<title>UTF-8, a transformation format of ISO 10646</title>
<author initials="F." surname="Yergeau" fullname="F. Yergeau">
<organization/></author>
<date year="2003" month="November"/>
<abstract>
<t>ISO/IEC 10646-1 defines a large character set called the Universal Character Set (UCS) which encompasses most of the world's writing systems. The originally proposed encodings of the UCS, however, were not compatible with many current applications and protocols, and this has led to the development of UTF-8, the object of this memo. UTF-8 has the characteristic of preserving the full US-ASCII range, providing compatibility with file systems, parsers and other software that rely on US-ASCII values but are transparent to other values. This memo obsoletes and replaces RFC 2279.</t></abstract></front>
<seriesInfo name="STD" value="63"/>
<seriesInfo name="RFC" value="3629"/>
<format type="TXT" octets="33856" target="http://www.rfc-editor.org/rfc/rfc3629.txt"/>
</reference>
</references>
<references title="Informative References">
<reference anchor="RFC2616">
<front>
<title abbrev="HTTP/1.1">Hypertext Transfer Protocol -- HTTP/1.1</title>
<author initials="R." surname="Fielding" fullname="Roy T. Fielding">
<organization abbrev="UC Irvine">Department of Information and Computer Science</organization>
<address>
<postal>
<street>University of California, Irvine</street>
<city>Irvine</city>
<region>CA</region>
<code>92697-3425</code></postal>
<facsimile>+1(949)824-1715</facsimile>
<email>fielding@ics.uci.edu</email></address></author>
<author initials="J." surname="Gettys" fullname="James Gettys">
<organization abbrev="Compaq/W3C">World Wide Web Consortium</organization>
<address>
<postal>
<street>MIT Laboratory for Computer Science, NE43-356</street>
<street>545 Technology Square</street>
<city>Cambridge</city>
<region>MA</region>
<code>02139</code></postal>
<facsimile>+1(617)258-8682</facsimile>
<email>jg@w3.org</email></address></author>
<author initials="J." surname="Mogul" fullname="Jeffrey C. Mogul">
<organization abbrev="Compaq">Compaq Computer Corporation</organization>
<address>
<postal>
<street>Western Research Laboratory</street>
<street>250 University Avenue</street>
<city>Palo Alto</city>
<region>CA</region>
<code>94305</code></postal>
<email>mogul@wrl.dec.com</email></address></author>
<author initials="H." surname="Frystyk" fullname="Henrik Frystyk Nielsen">
<organization abbrev="W3C/MIT">World Wide Web Consortium</organization>
<address>
<postal>
<street>MIT Laboratory for Computer Science, NE43-356</street>
<street>545 Technology Square</street>
<city>Cambridge</city>
<region>MA</region>
<code>02139</code></postal>
<facsimile>+1(617)258-8682</facsimile>
<email>frystyk@w3.org</email></address></author>
<author initials="L." surname="Masinter" fullname="Larry Masinter">
<organization abbrev="Xerox">Xerox Corporation</organization>
<address>
<postal>
<street>MIT Laboratory for Computer Science, NE43-356</street>
<street>3333 Coyote Hill Road</street>
<city>Palo Alto</city>
<region>CA</region>
<code>94034</code></postal>
<email>masinter@parc.xerox.com</email></address></author>
<author initials="P." surname="Leach" fullname="Paul J. Leach">
<organization abbrev="Microsoft">Microsoft Corporation</organization>
<address>
<postal>
<street>1 Microsoft Way</street>
<city>Redmond</city>
<region>WA</region>
<code>98052</code></postal>
<email>paulle@microsoft.com</email></address></author>
<author initials="T." surname="Berners-Lee" fullname="Tim Berners-Lee">
<organization abbrev="W3C/MIT">World Wide Web Consortium</organization>
<address>
<postal>
<street>MIT Laboratory for Computer Science, NE43-356</street>
<street>545 Technology Square</street>
<city>Cambridge</city>
<region>MA</region>
<code>02139</code></postal>
<facsimile>+1(617)258-8682</facsimile>
<email>timbl@w3.org</email></address></author>
<date year="1999" month="June"/>
<abstract>
<t>
The Hypertext Transfer Protocol (HTTP) is an application-level
protocol for distributed, collaborative, hypermedia information
systems. It is a generic, stateless, protocol which can be used for
many tasks beyond its use for hypertext, such as name servers and
distributed object management systems, through extension of its
request methods, error codes and headers . A feature of HTTP is
the typing and negotiation of data representation, allowing systems
to be built independently of the data being transferred.
</t>
<t>
HTTP has been in use by the World-Wide Web global information
initiative since 1990. This specification defines the protocol
referred to as "HTTP/1.1", and is an update to RFC 2068 .
</t></abstract></front>
<seriesInfo name="RFC" value="2616"/>
<format type="TXT" octets="422317" target="http://www.rfc-editor.org/rfc/rfc2616.txt"/>
<format type="PS" octets="5529857" target="http://www.rfc-editor.org/rfc/rfc2616.ps"/>
<format type="PDF" octets="550558" target="http://www.rfc-editor.org/rfc/rfc2616.pdf"/>
<format type="HTML" octets="636125" target="http://xml.resource.org/public/rfc/html/rfc2616.html"/>
<format type="XML" octets="493420" target="http://xml.resource.org/public/rfc/xml/rfc2616.xml"/>
</reference>
<reference anchor="RFC2854">
<front>
<title>The 'text/html' Media Type</title>
<author initials="D." surname="Connolly" fullname="D. Connolly">
<organization/></author>
<author initials="L." surname="Masinter" fullname="L. Masinter">
<organization/></author>
<date year="2000" month="June"/>
<abstract>
<t>This document summarizes the history of HTML development, and defines the "text/html" MIME type by pointing to the relevant W3C recommendations. This memo provides information for the Internet community.</t></abstract></front>
<seriesInfo name="RFC" value="2854"/>
<format type="TXT" octets="16135" target="http://www.rfc-editor.org/rfc/rfc2854.txt"/>
</reference>
<reference anchor="RFC3023">
<front>
<title>XML Media Types</title>
<author initials="M." surname="Murata" fullname="M. Murata">
<organization/></author>
<author initials="S." surname="St. Laurent" fullname="S. St. Laurent">
<organization/></author>
<author initials="D." surname="Kohn" fullname="D. Kohn">
<organization/></author>
<date year="2001" month="January"/>
<abstract>
<t>This document standardizes five new media types -- text/xml, application/xml, text/xml-external-parsed-entity, application/xml- external-parsed-entity, and application/xml-dtd -- for use in exchanging network entities that are related to the Extensible Markup Language (XML). This document also standardizes a convention (using the suffix '+xml') for naming media types outside of these five types when those media types represent XML MIME (Multipurpose Internet Mail Extensions) entities. [STANDARDS-TRACK]</t></abstract></front>
<seriesInfo name="RFC" value="3023"/>
<format type="TXT" octets="86011" target="http://www.rfc-editor.org/rfc/rfc3023.txt"/>
</reference>
</references>
<section title="Acknowledgements">
<t>
Many thanks to Ned Freed and John Klensin for comments and ideas that motivated
creation of this document, and to Barry Leiba for suggested text.
</t>
</section>
</back>
</rfc>| PAFTECH AB 2003-2026 | 2026-04-24 03:19:23 |