One document matched: draft-ietf-eai-rfc5335bis-10.xml


<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [

<!ENTITY % rfc1652 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.1652.xml'>
<!ENTITY % I-D.ietf-eai-5378bis PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-eai
-5378bis-00.xml'>
<!ENTITY % I-D.ietf-eai-rfc5721bis PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-eai
-rfc5721bis-00.xml'>
<!ENTITY % rfc2119 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
<!ENTITY % rfc2045 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2045.xml'>
<!ENTITY % rfc2046 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2046.xml'>
<!ENTITY % rfc2047 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2047.xml'>
<!ENTITY % rfc2231 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2231.xml'>
<!ENTITY % rfc5321 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5321.xml'>
<!ENTITY % rfc5322 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5322.xml'>
 <!ENTITY % rfc5598 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5598.xml'>
<!ENTITY % rfc3629 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml'>
<!ENTITY % rfc4952 PUBLIC ''
   'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4952.xml'>

<!ENTITY % rfc5198 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5198.xml'>
<!ENTITY % rfc5234 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5234.xml'>

<!-- PKIX -->
<!ENTITY % rfc5280 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5280.xml'>
<!-- openPGP -->
<!ENTITY % rfc3156 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3156.xml'>
<!ENTITY % rfc5863 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5863.xml'>
<!ENTITY % rfc5335 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5335.xml'>
<!ENTITY % rfc5504 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5504.xml'>
<!ENTITY % I-D.ietf-eai-rfc5336bis PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-eai
-rfc5336bis-07.xml'>



<!ENTITY % I-D.ietf-eai-frmwrk-4952bis PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-eai
-frmwrk-4952bis-10.xml'>
]>

<?rfc rfcedstyle='yes' subcompact='no'  toc='yes' symrefs="yes"
sortrefs="yes" ?>

<rfc  ipr='trust200902' category="std" updates="2045,5321,5322"
	obsoletes="5335" docName="draft-ietf-eai-rfc5335bis-09">

<workgroup>Eail Address Internationalization (EAI)</workgroup>

<front>
<title abbrev="I18N Email Headers">
Internationalized Email Headers
</title>


<author initials="Y" surname="Abel" fullname="Abel Yang">

	<organization>TWNIC</organization>
	<address>
		<postal>

			<street>4F-2, No. 9, Sec 2, Roosevelt Rd.</street>
			<city>Taipei</city>
			<region/>
			<code>100</code>
			<country>Taiwan</country>
		</postal>
		<phone>+886 2 23411313 ext 505 </phone>

		<email>abelyang@twnic.net.tw</email>
	</address>
</author>

<author initials="S.S" surname="Steele" fullname="Shawn Steele">
  <organization>Microsoft</organization>
- <address>
  <email>Shawn.Steele@microsoft.com</email>
  </address>
</author>

<date year="2011" />

<!-- [rfced] Please insert any keywords (beyond those that appear in
the title) for use on http://www.rfc-editor.org/rfcsearch.html. -->
<area>Applications</area>
<workgroup>Email Address Internationalization (EAI)</workgroup>

<keyword>I18N Email Header</keyword>
<keyword>UTF-8 Email Header</keyword>
<keyword>internationalization Email Header</keyword>

<abstract>
<t>
	Internet mail was originally limited to 7-bit ASCII. Recent enhancements
	support Unicode's UTF-8 encoding in portions of a message. Full
	internationalization of electronic mail requires additional enhancement,
	including support for UTF-8 in user-oriented header fields, such as in
	the To, From, and Subject fields. This document specifies an enhancement
	to Internet mail that permits native UTF-8 support in the header and body
	of a message.
</t>
</abstract>
</front>

<middle>
<section title="Introduction" anchor="INT">

<t>
    Internet mail distinguishes a message from its transport and further
divides a
	message between a header and a body <xref target="RFC5598" />. Internet
mail
	header fields contain a variety of strings that are intended to be
user-visible.
	The range of supported characters for these strings was originally limited
to
	a subset of <xref target="ASCII" />; globalization of the Internet
requires
	support of the much larger set contained in UTF-8 <xref
target="RFC5198"/>.
	Complex encoding alternatives to UTF-8, as an overlay to the existing
ASCII
	base, would introduce inefficiencies as well as opportunities for
processing
	errors. Native support for UTF-8 encoding <xref target="RFC3629"/>. is
widely
	available among systems now used over the Internet. Hence supporting this
	encoding directly within email is desired. This document specifies an
enhancement
	to Internet mail that permits the use of UTF-8 encoding, rather than only
ASCII,
	as the 	base form for header fields.
</t>
<t>
    This specification is based on a model of native, end-to-end support for
UTF-8,
	which uses an "8-bit clean" environment . Support for carriage across
legacy,
	7-bit infrastructure and for processing by 7-bit receivers requires
additional
	mechanisms that are not provided by this specification.

</t>
<section title="Brief Overview of Changes" anchor="BOC">
<t>
	This document updates email header [RFC5322] and MIME header [RFC2045].
	Email header value extends beyond ASCII, allowing UTF-8 encoding. MIME
	header lifts the prohibition against using content-transfer-encoding on
	all subtypes under composite-type "message/". Appendix A documents the
	derived ABNF rules that inherit support UTF-8, due to the update of ABNF
	introduces from this document.
</t>
<t>
	[Editor notes will be removed before Last Call]
</t>
<t>
	[Editor Note 1: This revision (-08 and up) changed the ABNF approach
compared to previous revision (-07 and before). ]
</t>
<t>
	[Editor Note 2: pending -- ABF to support IDN]
</t>
<t>
	[Editor Note 3: Discuss with WG whether some fields, like Trace header,
have issues if UTF-8 encoding allowed in values]
</t>
</section>
</section>


<section title="Relation to Other Standards" anchor="relations">
<t>
	This document updates Section 6.4 of <xref target="RFC2045" />.
	It removes the blanket ban on applying a content-transfer-encoding
	to all subtypes of message/ and instead specifies that a composite
	subtype MAY specify whether or not a content-transfer-encoding can
	be used for that subtype, with "cannot be used" as the default.
</t>
<t>
This document also updates Section 3.4 of <xref target="RFC5322" />.
It Extended mailbox address syntax to permit UTF-8 character in <xref
target="syntax_change" />.
</t>
<t>
Allowing use of a content-transfer-encoding on subtypes of messages
is not limited to transmissions that are authorized by the SMTP
extension specified in <xref target="I-D.ietf-eai-rfc5336bis"/>.
message/global (see <xref target="utf8smtp" />) of this document
permits use of a content-transfer-encoding.
</t>


</section>

<section title="Background and History">
<t>Mailbox names often represent the names of human users. Many of these
users throughout the world have names that are not normally expressed
with just the ASCII repertoire of characters, and would like to use more
or less their real names in their mailbox names. These users
are also likely to use non-ASCII text in their display<!-- common --> names
and subjects
of email messages, both received and sent.
This protocol specifies UTF-8 as the encoding to represent email header
field bodies.</t>

<t>The traditional format of email messages

<xref target="RFC5322"/> allows only ASCII characters in the
header fields of messages. This prevents users from having email addresses
that contain non-ASCII characters. It further forces non-ASCII text in
display <!-- common--> names, comments, and in free text (such as in the
"Subject:" field)
to be encoded (as required by MIME format <xref target="RFC2047" />). This
specification describes a change to the email message format that is related
to the SMTP message transport change described in the  associated documents
<xref target="I-D.ietf-eai-frmwrk-4952bis"/> and <xref
target="I-D.ietf-eai-rfc5336bis"/>, and that
allows non-ASCII characters in most email header fields. These changes
affect
SMTP clients, SMTP servers, mail user agents (MUAs), list expanders,
gateways to other media, and all other processes that parse or handle
email messages.</t>

<t>As specified in <xref target="I-D.ietf-eai-rfc5336bis"/>,
an SMTP protocol extension "UTF8SMTP" is used to prevent the
transmission of messages with UTF-8 header fields to systems that
cannot handle such messages.
<!--
[[Note in Draft: Keyword related to UTF8SMTP  will be decided by WG before
publication.]]
-->

</t>

<t>
Use of this SMTP extension helps prevent the introduction of such
messages into message stores that might misinterpret, improperly display,
or mangle such messages. It should be noted that using an ESMTP extension
does not prevent transferring email messages with UTF-8 header fields to
other systems that use the email format for messages and that may not be
upgraded,  such as unextended POP and IMAP servers. Changes to these
protocols to handle UTF-8 header fields are addressed in
<xref target="I-D.ietf-eai-rfc5721bis" /> and
<xref target="I-D.ietf-eai-5378bis" />.


</t>

<t>The objective for this protocol is to allow UTF-8 in email header fields.
<!--
    Issues such as how to handle messages containing UTF-8 header fields
that
    have to be delivered to systems that have not been upgraded to support
    this capability are discussed in <xref target="RFC5504"/>.
	-->
</t>

</section>

<section title="Terminology">
<t>
A plain ASCII string is fully compatible with <xref target="RFC5321" /> and
<xref target="RFC5322" />.
In this document, non-ASCII strings are UTF-8 strings if they are in header
which contain at least one <UTF8-non-ascii>.

<!-- Original paragraph
A plain ASCII string is also a valid UTF-8 string; see <xref
target="RFC3629"/>.
In this document, ordinary ASCII characters are UTF-8 characters
if they are in headers which contain <UTF8-non-ascii>s.</t>
-->
</t>
<t>
Unless otherwise noted, all terms used here are defined in
<xref target="RFC5321"/>, <xref target="RFC5322"/>,
<xref target="I-D.ietf-eai-frmwrk-4952bis" />, or <xref
target="I-D.ietf-eai-rfc5336bis"/>.
</t>
<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref target="RFC2119" />.
</t>
</section>


<section title="Changes on Message Header Fields">
<t>SMTP clients can send header fields in UTF-8 format, if the
UTF8SMTP extension is advertised by the SMTP server or is permitted by other
transport mechanisms. </t>

<t>

This protocol does NOT change the <xref target="RFC5322"/> rules for
defining header field
names. The bodies of header fields are allowed to contain UTF-8 characters,
but the header field names themselves must contain only ASCII characters.
</t>
<t>
To permit UTF-8 characters in field values, the header definition in
<xref target="RFC5322"/> is extended to support the new format.
The following ABNF is defined to substitute those definitions in <xref
target="RFC5322"/>.
</t>
<t>
	The syntax rules not covered in this section remain as defined in <xref
target="RFC5322"/>.
</t>

<section title="UTF-8 Syntax and Normalization" anchor="Normalization">
<t>UTF-8 characters can be defined in terms of octets using the following
ABNF <xref target="RFC5234"/>, taken from <xref target="RFC3629"/>:
<figure>
  <artwork><![CDATA[

UTF8-non-ascii  =   UTF8-2 / UTF8-3 / UTF8-4

UTF8-2          =   <Defined in Section 4 of RFC3629>

UTF8-3          =   <Defined in Section 4 of RFC3629>

UTF8-4          =   <Defined in Section 4 of RFC3629>
  ]]></artwork>
</figure>

</t>
<t>
See <xref target="RFC5198"/> for a discussion of normalization; the use of
normalization form
<xref target="NFC" /> is RECOMMENDED. Actually, if one is going to do
internationalization properly, one of
the most often-cited goals is to permit people to spell their names
correctly. Since many
mailbox local parts reflect personal names, that principle applies as well.
And NFKC is
not recommended because it may lose information that is needed to correctly
spell some names <!-- except -->in unusual circumstances.
</t>
</section>

<section title="Changes on MIME Headers" anchor="change_on_mime_header">
<t>
   This specification updates Section 6.4 of <xref target="RFC2045"/>.
   <xref target="RFC2045"/> prohibits applying a content-transfer-encoding
   to any subtypes of "message/".  This specification relaxes the rule -- it
allows
   newly defined MIME types to permit content-transfer-encoding, and it
allows
   content-transfer-encoding for message/global (see <xref
target="utf8smtp"/>).
</t>



<t>
   Background: Normally, transfer of message/global will be done in
   8-bit-clean channels, and body parts will have "identity" encodings,
   that is, no decoding is necessary.

   In the case where a message containing a message/global is downgraded
   from 8-bit to 7-bit as described in <xref target="RFC1652"/>, an encoding
   may be applied to the message; if the message travels multiple times
   between a 7-bit environment and an environment implementing UTF8SMTP,
   multiple levels of encoding may occur. This is expected to be rarely
   seen in practice, and the potential complexity of other ways of dealing
   with the issue are thought to be larger than the complexity of allowing
   nested encodings where necessary.
</t>

</section>
<section title="Syntax Extensions to RFC 5322" anchor="syntax_change">
<t>The following rules are intended to extend the corresponding rules in
<xref target="RFC5322"/> in order to allow UTF-8 characters.
<figure>
  <artwork><![CDATA[
FWS     =  <Defined in Section 3.2.2 of RFC 5322>

CFWS    =  <Defined in Section 3.2.2 of RFC 5322>

ctext   =/  UTF8-non-ascii
            ; Extending ctext in RFC 5322, Section 3.2.2
comment =   "(" *([FWS] uCcontent) [FWS] ")"

word    =   uAtom / uQuoted-String

  ]]></artwork>
</figure>
This means that all the <xref target="RFC5322"/> constructs that build upon
these will
permit UTF-8 characters, including comments and quoted strings. We do not
change the syntax
of <atext> in order to allow UTF-8 characters in <addr-spec>.
This would
also allow UTF-8 characters in <message-id>, which is not allowed due
to the limitation
described in <xref target="TraceFieldLimit" />. Instead, <uAtext> is
added to meet this requirement.

<!--
uQuoted-Pair   = ("\" uText) / obs-qp
-->
<figure>
  <artwork><![CDATA[
uText          = %d1-9 /    ; all UTF-8 characters except
                 %d11-12 /  ; US-ASCII NUL, CR, and LF
                 %d14-127 /
                 UTF8-non-ascii

uQuoted-Pair   = ("\" (VCHAR / WSP / UTF8-non-ascii )) / obs-qp

VCHAR          = <Defined in appendix B.1 of RFC 5234>

WSP            = <Defined in appendix B.1 of RFC 5234>

obs-qp         = <Defined in Section 4.1 of RFC 5322>

uQcontent      = uQtext / uQuoted-Pair

DQUOTE         = <Defined in appendix B.1 of RFC 5234>

uCcontent      = ctext / uQuoted-Pair / comment

uQtext         = qtext / UTF8-non-ascii

uAtext         = ALPHA / DIGIT /
                 "!" / "#" /  ; Any character except
                 "$" / "%" /  ; controls, SP, and specials.
                 "&" / "'" /  ; Used for atoms.
                 "*" / "+" /
                 "-" / "/" /
                 "=" / "?" /
                 "^" / "_" /
                 "`" / "{" /
                 "|" / "}" /
                 "~" /
                 UTF8-non-ascii

uAtom          = [CFWS] 1*uAtext [CFWS]

uDot-Atom      = [CFWS] uDot-Atom-text [CFWS]

uDot-Atom-text = 1*uAtext *("." 1*uAtext)

  ]]></artwork>
</figure>

</t>
<t>
To allow the use of UTF-8 in a Content-Description header field
<xref target="RFC2045" />, the following syntax is used:
<!--
description    = "Content-Description:" unstructured CRLF
-->
<figure>
  <artwork><![CDATA[

description    = "Content-Description" ":" *uText
                ; Replace description in RFC 2045, Section 8

  ]]></artwork>
</figure>
<!--
The <uText>  syntax is extended above to allow UTF-8 in all
<unstructured> header fields.</t>
-->
The <uText>  syntax is extended above to allow UTF-8 in all
<description> header fields.</t>
<t>
Note, however, this does not remove any constraint on the character set of
protocol elements;
for instance, all the allowed values for timezone in the "Date:" header
fields are still expressed in ASCII.
And also, none of this revised syntax changes what is allowed in a
<msg-id>, which will still remain in pure ASCII.
</t>



</section>
<section title="Change on addr-spec Syntax" anchor="change_addr_spec">
<t>Internationalized email addresses are represented in UTF-8. Thus,
all header fields containing <mailbox>es are updated from
<xref target="RFC5321" /> Section 4.1.2 to permit UTF-8 addresses.
<!--
as well as an additional, optional all-ASCII alternate
address. Note that Message Submission Servers ("MSAs") and Message
Transfer Agents (MTAs) may downgrade internationalized messages as needed.
The procedure for doing so is described in <xref target="RFC5504"/>.
-->

<!--
The procedure for doing so is described in <xref target="RFC5504" />.
-->

<figure>
  <artwork><![CDATA[
mailbox	       = name-addr / addr-spec / uAddr-Spec
                 ; Replace mailbox in RFC 5322, Section 3.4

angle-addr     =/ [CFWS] "<" uAddr-Spec ">" [CFWS]
                 ; Extending angle-addr in RFC 5322, Section 3.4

uAddr-Spec     = uLocal-Part "@" uDomain

uLocal-Part    = uDot-Atom / uQuoted-String / obs-local-part
                 ; Replace Local-Part in RFC 5322, Section 3.4.1

uQuoted-String = [CFWS] DQUOTE *([FWS] uQcontent) [FWS] DQUOTE [CFWS]

obs-local-part = <Defined in Section 4.4 of RFC 5322>
uDomain        = uDot-Atom / domain-literal / obs-domain

domain-literal = <Defined in Section 3.4.1 of RFC 5322>

  ]]></artwork>
</figure>

Below are a few examples of possible <mailbox> representations.
<figure>
<artwork>
<![CDATA[

   "DISPLAY_NAME" <ASCII@ASCII>
   ; traditional mailbox format

   "DISPLAY_NAME" <non-ASCII@non-ASCII>
   ; message will be rejected if UTF8SMTP extension is not supported

   <non-ASCII@non-ASCII>
   ; without DISPLAY_NAME and quoted string
   ; message will be rejected if UTF8SMTP extension is not supported

  ]]></artwork>
</figure>
</t>
</section>

<section title="Trace Field Syntax" anchor="TraceFieldLimit">

<t>
    The 'uFor' clause in "Received:" fields has been allowed the use of
internationalized addresses in "For" fields.
	It is described in <xref target="I-D.ietf-eai-rfc5336bis" />, Section
3.6.3.
</t>
<t>
    The "Return-path" designates the address to which messages indicating
    non-delivery or other mail system failures are to be sent.
	Thus, the header is augmented to carry UTF-8 addresses
	(see the revised syntax of <angle-addr> in <xref
target="change_addr_spec"/> of this document).
	This will not break the rule of trace field integrity, because
	the header field is added at the last MTA and described in <xref
target="RFC5321"/>.
</t>

<t>

	The <received-token> on "Received:" field ( described in Section
3.6.7 of <xref target="RFC5322" />)
	syntax is augmented to allow UTF-8 email address in the "For" field.
<angle-addr> is augmented
	to include UTF-8 email address. In order to allow UTF-8	email addresses
in an
	<addr-spec>, <uAddr-Spec> is added to <received-token>.

<figure>
<artwork>
<![CDATA[
received-token =/ uAddr-Spec
  ]]></artwork>
</figure>
</t>
</section>

<section title="message/global" anchor="utf8smtp">
<t>
   Internationalized messages MUST only be transmitted as authorized by
<xref target="I-D.ietf-eai-rfc5336bis" />
   or within a non-SMTP environment that supports these messages.
   A message is a "message/global message", if

   <list style="symbols">
		<t>it contains UTF-8 header values as specified	in this document,
or</t>
		<t>it contains UTF-8 values in the headers fields of body parts. </t>

   </list>

</t>
<t>
   The type message/global is similar to message/rfc822, except that it
   specifies that a message can contain UTF-8 characters in the headers
   of the message or body parts.

<!-- SM's comments to be removed
 The type message/global is similar to message/rfc822, except that it
contains a message that can
   contain UTF-8 characters in the headers of the message or body parts.
-->
   If this type is sent to a 7-bit-only system, it
   has to be encoded in MIME <xref target="RFC2045" />.
   (Note that a system compliant with MIME that doesn't recognize
   message/global SHOULD treat it as "application/octet-stream" as
   described in Section 5.2.4 of <xref target="RFC2046" />.)



<!-- SM's comments
   (Note that a system compliant with
   MIME that doesn't recognize message/global would treat it as
   "application/octet-stream" as described in Section 5.2.4 of
  <xref target="RFC2046" />.)
-->
</t>
<!--
<t>
Alternatively, SMTP servers and other systems which transfer a
message/global body part MAY choose to down-convert it to a
message/rfc822 body part using the rules described in <xref
target="RFC5504"/>.
</t>
-->


<t><list style="hanging">
<t hangText="Type name:">message</t>
<t hangText="Subtype name:">global</t>
<t hangText="Required parameters:">none</t>
<t hangText="Optional parameters:">none</t>
<t hangText="Encoding considerations:">Any content-transfer-encoding is
permitted.
The 8-bit or binary content-transfer-encodings are recommended where
permitted.</t>
<t hangText="Security considerations:">See <xref target="security" />.</t>
<t hangText="Interoperability considerations:">The media type provides
functionality similar to the message/rfc822 content type for email
messages with international email headers.  When there is a need to
embed or return such content in another message, there is generally an
option to use this media type and leave the content unchanged or
down-convert the content to message/rfc822.  Both of these choices will
interoperate with the installed base, but with different properties.
Systems unaware of internationalized headers will typically treat a
message/global body part as an unknown attachment, while they will
understand the structure of a message/rfc822.  However, systems that
understand message/global will provide functionality superior to the
result of a down-conversion to message/rfc822.  The most interoperable
choice depends on the deployed software.</t>

<t hangText="Published specification:">RFC &rfc.number;</t>
<t hangText="Applications that use this media type:">SMTP servers and
email clients that support multipart/report generation or parsing.
Email clients that forward messages with international headers as
attachments.</t>
<t hangText="Additional information:"></t>
<t hangText="Magic number(s):">none</t>
<t hangText="File extension(s):">The extension ".u8msg" is suggested.</t>
<t hangText="Macintosh file type code(s):">A uniform type identifier
(UTI) of "public.utf8-email-message" is suggested.  This conforms to
"public.message" and "public.composite-content", but does not necessarily
conform to "public.utf8-plain-text".</t>
<t hangText="Person & email address to contact for further
information:">See the Author's Address section of this document.</t>
<t hangText="Intended usage:">COMMON</t>
<t hangText="Restrictions on usage:">This is a structured media type
that embeds other MIME media types.  The 8-bit or binary
content-transfer-encoding SHOULD be used unless this media type is sent
over a 7-bit-only transport.</t>

<t hangText="Author:">See the Author's Address section of this document.</t>
<t hangText="Change controller:">IETF Standards Process</t>
</list></t>
</section>
</section>

<section title="Security Considerations" anchor="security">
<t>
	If a user has a non-ASCII mailbox address and an ASCII mailbox
	address, a digital certificate that identifies that user may have
	both addresses in the identity. Having multiple email addresses as
	identities in a single certificate is already supported in
	PKIX (Public Key Infrastructure for X.509 Certificates) <xref
target="RFC5280" />
	and OpenPGP <xref target="RFC3156" />.
</t>
<t>
	Because UTF-8 often requires several octets to encode a single character,
	internationalized local parts and header value may cause mail addresses to
	become longer. As specified in <xref target="RFC5322"/>, each line of
	characters MUST be no more than 998 octets, excluding the CRLF. On the
other
	hand, MDA (Mail Delivery Agent) processes that parse, store, or handle
email
	addresses or local parts must take extra care not to overflow buffers,
	truncate addresses,	or exceed storage allotments. Also, they must take
care,
	when comparing, to use the entire lengths of the addresses.
</t>
<t>
	There are lots of ways of using UTF-8 to represent something equivalent
	or similar to a particular displayed character or group of characters,
	then filtering systems can be bypassed by using one of the variants to
	avoid detection while still reaching the end user with largely the same
	original effect. This normalization is described in <xref
target="Normalization" />.

</t>

<t>

The security impact of UTF-8 headers on email signature systems such as
Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is discussed in
<xref target="I-D.ietf-eai-frmwrk-4952bis"/>, Section 14.
</t>

</section>


<section title="IANA Considerations" anchor="IANA">
<t>
IANA is requested to update the registration of the message/global MIME type
using the
  registration form contained in <xref target="utf8smtp" />.
</t>

<!--IANA has registered the message/global MIME type
using the registration form contained in <xref target="change_addr_spec" />.
-->


</section>

<section title="Acknowledgements">

<t>
This document incorporates many ideas first described in
Internet-Draft form by Paul Hoffman, although many details have
changed from that earlier work.
</t>

<t>
The author especially thanks Jeff Yeh for his efforts and contributions
on editing previous versions.
</t>
<t>
Most of the content of this document is provided by John C Klensin.
Also, some significant comments and suggestions were received from Charles
H. Lindsey,
Kari Hurtta, Pete Resnick, Alexey Melnikov, Chris Newman, Yangwoo Ko,
Yoshiro Yoneya,
and other members of the JET team (Joint Engineering Team) and were
incorporated into the document.
The editor sincerely thanks them for their contributions.
</t>
</section>
<section title="Edit history">
<!--
<t>This section is used for tracking the update of this document. Will be
removed after finalize.</t>
-->
<t>
[[RFC Editor: please remove this section before publishing.]]
</t>
<section title="draft-ietf-eai-rfc5335bis-00">
<t>
<list style="numbers">
<t>Applied Errata suggested by Alfred Hoenes.</t>
<t>Adjust [RFC2821] and [RFC2822] to <xref target="RFC5321"/> and <xref
target="RFC5322"/>.</t>
<t>Abrogate <alt-address> in ABNF of <angle-addr>.</t>
<t>Revoke [RFC5504] from this document.</t>
<t>Upgrade some references from I-Ds to RFC.</t>
</list>
</t>
</section>
<section title="draft-ietf-eai-rfc5335bis-01">
<t>
<list style="numbers">
<t>Author name revised.</t>
</list>
</t>
</section>

<section title="draft-ietf-eai-rfc5335bis-02">
<t>
<list style="numbers">
<t>ABNF revised.</t>
</list>
</t>
</section>


<section title="draft-ietf-eai-rfc5335bis-03">
<t>
<list style="numbers">
<t>Fix typos </t>
<t>ABNF revised </t>
<t>Improve sentence</t>

</list>
</t>
</section>

<section title="draft-ietf-eai-rfc5335bis-04">
<t>
<list style="numbers">
<t>improve sentences and ABNF revised based on AD and Co-chairs </t>
</list>
</t>
</section>

<section title="draft-ietf-eai-rfc5335bis-05">
<t>
<list style="numbers">
<t>ABNF revised in <xref target="change_addr_spec" /> based on AD comments
</t>
</list>
</t>
</section>


<section title="draft-ietf-eai-rfc5335bis-06">
<t>
<list style="numbers">
<t>ABNF revised </t>
<t>improve <xref target="IANA" /> </t>
</list>
</t>
</section>

<section title="draft-ietf-eai-rfc5335bis-07">
<t>
<list style="numbers">
<t>Minor ABNF revised in <xref target="syntax_change" /></t>
<t>improve <xref target="IANA" /> </t>
</list>
</t>
</section>

<section title="draft-ietf-eai-rfc5335bis-09">
<t>Version -08 was posted in error and withdrawn.  Version 09 is
   is identical to version 07 except for a date change, addition of
   this note, and some vertical spacing compression on this page.
</t>
</section>
<section title="draft-ietf-eai-rfc5335bis-10">
<t>
<list style="numbers">
<t>Add <xref target="A1" /> and <xref target="BOC" /></t>
<t>Replace polls result in Abstract and <xref target="INT"/></t>
<t>Minor Sentence modification</t>
</list>
</t>
</section>
</section>

</middle>

<back>
<references title="Normative References">
<!-- SMTP Service Extension for 8bit-MIMEtransport  -->

<!-- ietf-eai-smtpext became RFC 5336 -->
<reference anchor="ASCII">
	<front>
		<title>Coded Character Set -- 7-bit American Standard Code for
Information Interchange</title>
		<author fullname="American National Standards Institute"/>
		<date year="1986"/>
	</front>
	<seriesInfo name="ANSI" value="X3.4"/>
</reference>


&rfc2119;

&rfc5321;
&rfc5322;
<!-- UTF-8 -->
&rfc3629;
&I-D.ietf-eai-frmwrk-4952bis;
&I-D.ietf-eai-rfc5336bis;
&I-D.ietf-eai-5378bis;
&I-D.ietf-eai-rfc5721bis;
&rfc5198;
&rfc5234;
&rfc5598;
<reference anchor="NFC" target="http://www.unicode.org/reports/tr15/">
  <front>
    <title>Unicode Standard Annex #15: Unicode Normalization Forms</title>
    <author initials="M." surname="Davis" fullname="Mark Davis" />
	<author initials="K." surname="Whistler" fullname="Ken Whistler" />
    <date day='17' month='September' year='2010' />
  </front>
</reference>



</references>

<references title="Informative References">
<!-- 8bitmime -->
&rfc1652;
<!-- MIME (parts 1, 2, 3) -->
&rfc2045;
&rfc2046;
&rfc2047;
<!-- openPGP -->
&rfc3156;
<!-- PKIX -->
&rfc5280;


<!--
&rfc5504;  given up
-->

</references>
      <section title="Changes to support UTF-8" anchor="A1">
         <t>This section provides a basic audit of the places in a message
            that now can permit UTF-8 rather than being restricted to ASCII,
            based on the changes to underlying ABNF. The audit ignores rule
            for "obsolete" constructs in <xref target="RFC5322"/>.
			(This is a first cut and the list is likely incomplete):
			<list style="hanging">
               <t hangText="VCHAR:  ">quoted-pair, unstructured</t>
               <t>> ccontent, qcontent </t>
               <t>> comment, quoted-string </t>
               <t>> word, local-part </t>
               <t>> phrase </t>
               <t>> display-name, keywords </t>

               <t
                  hangText="ctext:  ">ccontent > comment</t>

               <t
                  hangText="atext:  ">atom, dot-atom-text</t>

               <t
                  hangText="qtext:  ">qcontent > quoted-string</t>

            </list>
         </t>
      </section>
</back>
</rfc>

PAFTECH AB 2003-20262026-04-23 14:29:23