Network Working Group                                         P. Hoffman
Internet-Draft                                            VPN Consortium
Expires: June 4, 2006                                   December 1, 2005


                   Using non-ASCII Characters in RFCs
                     draft-hoffman-utf8-rfcs-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 4, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document specifies a change to the IETF process that allows RFCs
   to contain non-ASCII characters.  The proposed change is to encode
   RFCs in UTF-8.


1.  Introduction

   The purpose of this document is to specify one possible way for the
   IETF to use non-ASCII characters in RFCs.  It does not advocate that



Hoffman                   Expires June 4, 2006                  [Page 1]

Internet-Draft              Non-ASCII in RFCs              December 2005


   the IETF should actually make such a change; instead, if the IETF
   decides that it wants to make such a change, this document gives a
   very simple way to do so.  The author believes that the IETF is not
   going to make such a change any time soon because of the IETF's
   reflexive tendency to spend huge amounts of time debating process
   issues that are actually quite simple.  Further, the RFC series is
   extremely important to the IETF and the Internet at large, so any
   change to the way RFCs are published tends to cause even more concern
   than "normal" IETF process issues.

   This document specifies only how to change RFCs to allow non-ASCII
   characters; it does not cover changing Internet Drafts in a similar
   fashion.  The change is limited to one document series because the
   IETF handles process changes better when it changes only one part of
   the process at a time.  Because Internet Drafts are handled by a very
   different process from RFCs, internationalizing Internet Drafts
   should be done as a separate change to minimize IETF process
   overload.

   This document discusses a change to RFCs that does not require a
   different text format for the RFC series.  That is, it does not
   require a change in the base format to HTML, XML, SVG, or ASN.1.
   Similarly, this document does not require that there be multiple
   authoritative versions, or multiple alternative representations, of a
   particular RFC.

   This document has absolutely nothing to do with whether or not RFCs
   should have a different format to allow graphics.

1.1.  Reasons to allow non-ASCII characters in RFCs

   Various IETF guideline documents, notably [RFC2223], specify that
   RFCs must use only the US-ASCII character set.  This restriction has
   caused several problems, including:

   o  Names and addresses of authors of IETF documents are misspelled

   o  Names and document titles in references are misspelled

   o  Protocol examples that contain non-ASCII characters cannot be
      shown directly

   The first two issues cause real problems for people searching for
   RFCs by authors whose names contain non-ASCII characters, or for
   references whose author names or titles contain such characters.
   For many languages that use Latin characters outside the ASCII range,
   there is no single accepted mapping between those non-ASCII
   characters and ASCII equivalents.  A common example is that
   "u-with-umlaut" may be mapped to "u" or to "ue"; many other mapping
   difficulties exist.

   Now that UTF-8 [RFC3629] is nearly universally available in
   text-editing and display systems, the IETF can eliminate these
   problems by changing RFCs to use UTF-8, if it decides that RFCs
   should no longer be restricted to ASCII.


2.  Use of UTF-8 in RFCs

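   Upon publication of this document as an RFC, all RFCs will be
   considered to be encoded in UTF-8.  Because US-ASCII is a subset of
   UTF-8, every existing all-ASCII RFC is already valid UTF-8 and does
   not need to change.  The RFC Editor will need to change its processes
   to publish documents that are valid UTF-8.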
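   The following Python sketch shows the kind of check a publication
   tool might apply.  The function names are hypothetical and do not
   describe any actual RFC Editor tooling.  The sketch also shows that
   an all-ASCII document passes the UTF-8 check unchanged, while text in
   a legacy single-octet encoding such as Latin-1 does not.

```python
def is_ascii(data: bytes) -> bool:
    """True if every octet is in the US-ASCII range 0x00-0x7F."""
    return data.isascii()

def is_valid_utf8(data: bytes) -> bool:
    """True if the octets form well-formed UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

legacy_rfc = b"Network Working Group\n"            # all-ASCII text
utf8_name = "Fran\u00e7ois\n".encode("utf-8")      # c-with-cedilla: 2 octets
latin1_name = "Fran\u00e7ois\n".encode("latin-1")  # 1 octet, not UTF-8

print(is_ascii(legacy_rfc), is_valid_utf8(legacy_rfc))    # True True
print(is_ascii(utf8_name), is_valid_utf8(utf8_name))      # False True
print(is_ascii(latin1_name), is_valid_utf8(latin1_name))  # False False
```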

   Note that the change described in this document applies only to
   RFCs.  Internet Drafts retain their restriction to US-ASCII.  This
   means that, during RFC preparation, document authors can ask the RFC
   Editor to change the spelling of some parts of the Internet Draft
   from which the final RFC is being prepared, for example to restore
   the non-ASCII spelling of an author's name.

   It is suggested that the RFC Editor limit non-ASCII characters to the
   following:

   o  Names and addresses of authors, used at the top of RFCs and in the
      author contact section

   o  Names and document titles used in the References sections

   o  Quotations from non-English languages

   o  Protocol examples that show non-ASCII characters, such as when
      showing internationalized domain names (IDNs) and
      internationalized resource identifiers (IRIs)

   The RFC Editor should promptly determine which characters are
   acceptable in RFCs.  For example, the RFC Editor might exclude some
   control characters because they could affect automatic processing of
   RFCs, or it might choose to allow them.  The RFC Editor should publish
   one or more RFCs with a variety of non-ASCII characters to help
   determine which characters, if any, will be problematic for
   processing.
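   One way to support that review is a tool that lists every character
   outside printable ASCII, with its location and Unicode name.  The
   following Python sketch is hypothetical; in particular, the set of
   control characters it treats as acceptable (carriage return and form
   feed) is an assumption made for illustration, not a decision of the
   RFC Editor.

```python
import unicodedata

# Assumption (illustrative): control characters treated as acceptable.
ALLOWED_CONTROLS = {"\r", "\f"}

def review_characters(text: str) -> list:
    """Return (line_number, code_point, name) for each non-ASCII
    character and each control character not in ALLOWED_CONTROLS."""
    findings = []
    # Split on LF only, so form feeds at page breaks do not shift line numbers.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for ch in line:
            is_control = unicodedata.category(ch) == "Cc"
            if ord(ch) > 0x7F or (is_control and ch not in ALLOWED_CONTROLS):
                # Control characters have no name in the Unicode database.
                name = unicodedata.name(ch, "<unnamed>")
                findings.append((lineno, f"U+{ord(ch):04X}", name))
    return findings

sample = "Author: Fran\u00e7ois\nBell:\x07\n"
for finding in review_characters(sample):
    print(finding)
# (1, 'U+00E7', 'LATIN SMALL LETTER C WITH CEDILLA')
# (2, 'U+0007', '<unnamed>')
```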


3.  Security considerations

   A display program that expects only US-ASCII input may fail when it
   encounters octets outside the US-ASCII range of values.  Such a
   failure may become a security issue.  For example, the program may
   display incorrect results for the input.  More seriously, the program
   may have an internal error that causes it to fail in a security-
   compromising fashion.
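   A minimal Python sketch of the difference: a program that decodes
   input strictly as US-ASCII raises an error on the first octet above
   0x7F, while one that decodes as UTF-8 and substitutes U+FFFD
   (REPLACEMENT CHARACTER) for malformed sequences keeps running and
   makes the problem visible.  This is one possible defensive approach,
   offered as an assumption; this document does not require any
   particular behavior from display programs.

```python
def decode_ascii_only(data: bytes) -> str:
    """Behaves like a program that assumes US-ASCII input."""
    return data.decode("ascii")  # raises UnicodeDecodeError on octets > 0x7F

def decode_for_display(data: bytes) -> str:
    """Decode as UTF-8, replacing malformed sequences with U+FFFD."""
    return data.decode("utf-8", errors="replace")

good = "caf\u00e9".encode("utf-8")  # well-formed UTF-8: b"caf\xc3\xa9"
bad = b"caf\xe9"                    # Latin-1 e-with-acute; malformed as UTF-8

try:
    decode_ascii_only(good)
except UnicodeDecodeError as err:
    print("ASCII-only decoder failed:", err.reason)

print(decode_for_display(good) == "caf\u00e9")  # True
print(decode_for_display(bad) == "caf\ufffd")   # True
```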


4.  IANA considerations

   This document does not change or create any IANA-registered values.


5.  Informative References

   [RFC2223]  Postel, J. and J. Reynolds, "Instructions to RFC Authors",
              RFC 2223, October 1997.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.


Appendix A.  Arguments against changing to UTF-8

   Over the past decade, the question of changing the encoding of RFCs
   to UTF-8 has come up repeatedly.  Although many people wanted the
   change, others felt it was a bad idea, for a variety of reasons.
   This appendix summarizes those arguments and explains why they are
   no longer as critical as they were a decade ago.

A.1.  Difficulty in displaying

   Some text display systems only know how to display US-ASCII.  On such
   a system, non-ASCII characters in an RFC encoded in UTF-8 will be
   unreadable.

   There are, of course, still such display systems, and there always
   will be.  However, their number is dwindling as more software is
   improved to display non-ASCII characters and, in particular, to read
   UTF-8 as an encoding.  Of the systems that can only render US-ASCII,
   only a small subset silently drop non-ASCII characters; the others
   show incorrect characters in their place.  Thus, the person using
   such a system can often see that there is a problem, and can choose
   to get better display software.
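   The two behaviors can be imitated in Python.  This sketch assumes a
   legacy display system that interprets each octet as ISO 8859-1
   (Latin-1); other legacy encodings produce different, but equally
   incorrect, characters.  The function names and the sample name are
   hypothetical.

```python
def legacy_display(text: str, legacy_encoding: str = "latin-1") -> str:
    """Show UTF-8 text as a single-octet display system would:
    each UTF-8 octet becomes a separate character."""
    return text.encode("utf-8").decode(legacy_encoding)

def dropping_display(text: str) -> str:
    """Show text as a system that silently drops non-ASCII characters."""
    return text.encode("ascii", errors="ignore").decode("ascii")

name = "M\u00fcller"  # u-with-umlaut is two octets in UTF-8
print(legacy_display(name) == "M\u00c3\u00bcller")  # True: visibly wrong
print(dropping_display(name))                      # Mller
```

   In the first case the reader sees two unexpected characters and can
   tell that something is wrong; in the second, the name is silently
   shortened.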

A.2.  Difficulty in printing

   Some printers can print only a limited set of characters because they
   are character-oriented, not graphical.  Such printers inherently
   cannot print characters they do not understand.  Almost all such
   printers print ASCII characters correctly.

   There are, of course, still such printers, and there always will be.
   However, their number is dwindling as older printers are replaced
   with ones that can print graphics, so that now-common text features
   such as boldface and italics can be printed.

A.3.  Insufficient fonts

   Almost no display system that can display text that is encoded with
   UTF-8 can display every character in the Unicode repertoire.  Thus,
   some non-ASCII characters that are included in RFCs will not display
   properly.

   Virtually every system that can display Unicode knows how to
   substitute a replacement character for characters it cannot display.
   In fact, most such systems have glyphs for rendering unknown
   characters and different glyphs for rendering known characters for
   which the system has no font.

A.4.  Inability to search for non-ASCII characters

   If authors start using non-ASCII characters in their names and/or
   addresses, people who know the characters but are unfamiliar with the
   user interface on their computers may not be able to enter those
   characters in the search criteria.  For example, some people do not
   know how to enter "u-with-umlaut" in their operating system, even
   though the operating system allows such input.

   This is a valid concern, but one that is orthogonal to whether or not
   RFCs should use these characters.  The alternative (never changing to
   UTF-8) simply shifts the problem: users must instead guess which
   ASCII-only spelling was used.

A.5.  Normalization

   Due to the way that Unicode uses combining characters, there are
   sometimes multiple ways to spell the same character.  For example,
   the character "lowercase-a-with-acute" can be spelled in two ways: as
   a single character (U+00E1) or as two characters (U+0061 followed by
   the combining acute accent, U+0301).  Thus, searching for this
   character can be ambiguous.

   Although there are multiple ways to spell some characters, the
   Unicode normalization rules define a single canonical form for each
   such spelling; for example, Normalization Form C (NFC) maps both
   spellings above to the single character U+00E1.  The RFC Editor
   should use only normalized strings in RFCs.
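   The effect of normalization on searching can be shown with Python's
   standard unicodedata module.  The use of Normalization Form C (NFC)
   here is an assumption for illustration; this document does not
   specify which normalization form the RFC Editor should use, and the
   function name normalized_find is hypothetical.

```python
import unicodedata

precomposed = "\u00e1"  # U+00E1 as a single character
decomposed = "a\u0301"  # U+0061 followed by U+0301

def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)

def normalized_find(haystack: str, needle: str) -> int:
    """Search after normalizing both strings to NFC; -1 if not found."""
    return nfc(haystack).find(nfc(needle))

print(precomposed == decomposed)                # False
print(nfc(precomposed) == nfc(decomposed))      # True
print("x\u00e1y".find(decomposed))              # -1: naive search misses it
print(normalized_find("x\u00e1y", decomposed))  # 1
```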


Author's Address

   Paul Hoffman
   VPN Consortium
   127 Segre Place
   Santa Cruz, CA 95060
   USA

   Email: paul.hoffman@vpnc.org


Full Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.