Network Working Group P. Hoffman
Internet-Draft VPN Consortium
Expires: December 28, 2005 June 26, 2005
Use of UTF-8 in RFCs
draft-hoffman-utf8-rfcs-00.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 28, 2005.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
This document specifies a change to the IETF process in which RFCs
are encoded as UTF-8 instead of US-ASCII.
1. Introduction
Various guideline documents in the IETF, notably [RFC2223], specify
that RFCs must use only the US-ASCII character set. This restriction
has historically caused many problems, notably:
o Names and addresses of authors of IETF documents are misspelled
o Names and document titles in references are misspelled
o Protocol examples that show non-ASCII characters cannot be shown
directly
The first two issues cause real problems for people searching for
RFCs by particular authors or for references that contain non-ASCII
characters. For many languages that use Latin characters outside the
ASCII range, there is no single agreed mapping between those non-ASCII
characters and ASCII equivalents. A common example is that
"u-with-umlaut" may be mapped to "u" or to "ue"; many other mapping
difficulties exist.
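
The mapping problem can be illustrated with a short Python sketch. The
two mapping tables and the author name below are hypothetical, chosen
only to show that two reasonable ASCII conventions produce different
spellings of the same name.

```python
# Two illustrative ASCII transliteration conventions (not a standard).
DROP_DIAERESIS = {"\u00fc": "u", "\u00f6": "o", "\u00e4": "a"}
DIGRAPH_STYLE = {"\u00fc": "ue", "\u00f6": "oe", "\u00e4": "ae"}


def to_ascii(name: str, table: dict) -> str:
    """Replace each mapped character; leave all other characters unchanged."""
    return "".join(table.get(ch, ch) for ch in name)


name = "M\u00fcller"  # hypothetical author name containing u-with-umlaut

# The same name yields two distinct ASCII spellings.
spellings = {to_ascii(name, DROP_DIAERESIS), to_ascii(name, DIGRAPH_STYLE)}
```

A search for either spelling in `spellings` would miss documents that
used the other one.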
Now that UTF-8 [RFC3629] is nearly universally available in text-
editing and display systems, the IETF can eliminate these problems by
changing RFCs to use UTF-8.
2. Use of UTF-8 in RFCs
Upon publication of this document as an RFC, all RFCs will be
considered to be encoded in UTF-8. The RFC Editor needs to
change their processes to publish documents that are valid UTF-8.
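
As one possible way to check that a document is valid UTF-8, the
following Python sketch relies on the language's built-in strict
decoder, which rejects malformed sequences, overlong forms, and encoded
surrogates. The function name is an assumption made for illustration;
this document does not prescribe any particular tool.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


utf8_name = b"Fran\xc3\xa7ois"    # U+00E7 encoded as two octets in UTF-8
latin1_name = b"Fran\xe7ois"      # the same name in ISO 8859-1, not valid UTF-8
overlong = b"\xc0\xaf"            # overlong encoding of "/", forbidden in UTF-8
```

Pure US-ASCII input always passes such a check, because every US-ASCII
octet sequence is also valid UTF-8.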
Note that the change described in this document applies only to RFCs.
Internet-Drafts retain their restriction to US-ASCII. This means
that, during the RFC preparation phase, document authors can ask the
RFC Editor to change the spelling of some parts of the Internet-Draft
from which the RFC Editor is preparing the final RFC.
It is suggested that the RFC Editor limit non-ASCII characters to the
following:
o Names and addresses of authors, used at the top of RFCs and in the
author contact section
o Names and document titles used in the References sections
o Quotations from non-English languages
o Protocol examples that show non-ASCII characters, such as when
showing internationalized domain names (IDNs) and
internationalized resource identifiers (IRIs)
The RFC Editor should determine in an expedient manner which
characters are acceptable in RFCs. For example, the RFC Editor might
exclude some control characters because they could affect automatic
processing of RFCs, but it might also allow them. The RFC Editor
should publish one or more RFCs with a variety of non-ASCII
characters to help determine which characters, if any, will be
problematic for processing.
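
A survey of the characters in a candidate document might look like the
following Python sketch. The set of permitted control characters (line
feed and form feed) and the function name are assumptions made for
illustration; the actual policy is left to the RFC Editor.

```python
import unicodedata

# Assumption: line feed and form feed are the only control characters allowed.
ALLOWED_CONTROLS = {"\n", "\f"}


def survey_characters(text: str) -> dict:
    """List disallowed control characters and non-ASCII characters by code point."""
    report = {"control": [], "non_ascii": []}
    for ch in text:
        label = f"U+{ord(ch):04X}"
        if unicodedata.category(ch) == "Cc":
            # General category Cc covers the C0 and C1 control characters.
            if ch not in ALLOWED_CONTROLS:
                report["control"].append(label)
        elif ord(ch) > 0x7F:
            report["non_ascii"].append(label)
    return report


report = survey_characters("caf\u00e9\x07\n")
```

Here `report` lists U+0007 (BEL) as a control character and U+00E9 as a
non-ASCII character, while the trailing line feed is accepted.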
3. Security considerations
A display program that expects only US-ASCII input may fail when it
encounters octets outside the US-ASCII range of values. Such a
failure may become a security issue. For example, the program may
display incorrect results for the input. More seriously, the program
may have an internal error that causes it to fail in a security-
compromising fashion.
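
The two failure modes can be modeled with a small Python sketch. Both
function names are hypothetical: the first stands in for a reader that
assumes US-ASCII and has no error handling, and the second for one that
contains the failure.

```python
def display_strict(raw: bytes) -> str:
    """Model of an ASCII-only reader with no error handling."""
    # Raises UnicodeDecodeError on any octet with a value of 0x80 or above.
    return raw.decode("ascii")


def display_guarded(raw: bytes) -> str:
    """Same reader, but the failure is contained and made visible."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # Substitute U+FFFD for each offending octet instead of failing.
        return raw.decode("ascii", errors="replace")


# Hypothetical author name; U+00E7 occupies two octets in UTF-8.
sample = "Fran\u00e7ois".encode("utf-8")
```

Calling `display_strict(sample)` raises an exception, which corresponds
to the internal error described above, while `display_guarded(sample)`
returns the text with two replacement characters, which corresponds to
an incorrect but harmless display.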
4. IANA considerations
This document does not change or create any IANA-registered values.
5. Informative References
[RFC2223] Postel, J. and J. Reynolds, "Instructions to RFC Authors",
RFC 2223, October 1997.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
10646", STD 63, RFC 3629, November 2003.
Author's Address
Paul Hoffman
VPN Consortium
127 Segre Place
Santa Cruz, CA 95060
USA
Email: paul.hoffman@vpnc.org
Appendix A. Arguments against changing to UTF-8
Over the past decade, the question of changing the encoding of RFCs
to UTF-8 has come up repeatedly. Although many people wanted the
change, others gave a variety of reasons why they felt it was a
bad idea. This appendix is a summary of those arguments and an
explanation of why they are no longer as critical.
A.1 Difficulty in displaying
Some text display systems only know how to display US-ASCII.
Displaying an RFC that uses non-ASCII characters encoded in UTF-8
will cause those characters to be unreadable.
There are, of course, still such display systems, and there always
will be. However, the number is dwindling as more software is
improved to display non-ASCII characters and, in particular, to read
UTF-8 as an encoding. Of the systems that can only render US-ASCII,
only a small subset drops non-ASCII characters; the others show
incorrect characters in their place. Thus, the person using such a
system can often see that there is a problem, and can possibly choose
to get better display software.
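
The difference between dropping a character and showing an incorrect
one can be sketched in Python. Treating the octets as ISO 8859-1 is
only one assumed model of a display system that does not understand
UTF-8; the name used is hypothetical.

```python
# Hypothetical name; U+00E9 is encoded as the octets C3 A9 in UTF-8.
sample = "Jos\u00e9".encode("utf-8")

# A system that silently drops octets outside the US-ASCII range.
dropped = sample.decode("ascii", errors="ignore")

# A system that shows each octet as a separate 8-bit character.
garbled = sample.decode("latin-1")
```

In the first case the reader sees "Jos" with no hint that anything is
missing. In the second case the single character becomes two unrelated
characters (U+00C3 and U+00A9), which at least signals a problem.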
A.2 Insufficient fonts
Almost no display system that can display text that is encoded with
UTF-8 can display every character in the Unicode repertoire. Thus,
some non-ASCII characters that are included in RFCs will not display
properly.
Virtually every system that can display Unicode knows how to
substitute a replacement character for ones that cannot be displayed.
In fact, most such systems have glyphs for rendering unknown
characters and different glyphs for rendering known characters for
which the system has no font.
A.3 Normalization
Due to the way that Unicode uses combining characters, there are
sometimes multiple ways to spell the same character. For example,
the character "lowercase-a-with-accent" can be spelled in two ways:
as a single character (U+00E1) or as two characters (U+0061 followed
by U+0301). Thus, searching for this character can be ambiguous.
Although there are multiple ways to spell some characters, almost all
such characters have a shortest form that can be found using the
Unicode normalization rules. The RFC Editor should use only
normalized strings in RFCs.
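
The example above can be reproduced with Python's standard
`unicodedata` module. This document does not name a normalization
form; the sketch assumes Normalization Form C (NFC), which yields the
single-character spelling, and the helper name is hypothetical.

```python
import unicodedata

precomposed = "\u00e1"    # U+00E1, a single code point
decomposed = "a\u0301"    # U+0061 followed by combining U+0301

# The two spellings render alike but do not compare equal as strings.
equal_before = precomposed == decomposed

# Assumption: NFC is the normalization form in use.
equal_after = unicodedata.normalize("NFC", decomposed) == precomposed


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

A search tool that applies `same_text` to both the query and the
document text would find either spelling.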
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.