Network Working Group                                         P. Hoffman
Internet-Draft                                             March 4, 2009
Updates: RFC 3454, 3490, 3491
(if approved)
Intended status: Standards Track
Expires: September 5, 2009


    Internationalizing Domain Names in Applications (IDNA) version 2
                       draft-hoffman-idna2-02.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.  This document may contain material
   from IETF Documents or IETF Contributions published or made publicly
   available before November 10, 2008.  The person(s) controlling the
   copyright in some of this material may not have granted the IETF
   Trust the right to allow modifications of such material outside the
   IETF Standards Process.  Without obtaining an adequate license from
   the person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process, and
   derivative works of it may not be created outside the IETF Standards
   Process, except to format it for publication as an RFC or to
   translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 5, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.



Hoffman                 Expires September 5, 2009               [Page 1]

Internet-Draft                    IDNA2                       March 2009


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   IDNA has been a world-wide success since it was introduced over five
   years ago.  However, it has some notable deficiencies, including
   being tied to an old version of the Unicode standard and needless
   restrictions that prevented some languages from being used.  This
   document describes IDNA version 2, which rectifies those problems
   while making the fewest changes necessary to the original protocol.


Table of Contents

   1.  Introduction
     1.1.  Acknowledgements
     1.2.  Conventions Used In This Document
   2.  Changes to RFC 3490 (IDNA v.1)
   3.  Changes to RFC 3454 (Stringprep)
   4.  Changes to RFC 3491 (Nameprep)
   5.  Changes to RFC 3492 (Punycode)
   6.  Suggestions for Registries
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Work Still to be Done
   Appendix B.  Changes between versions
     B.1.  Changes between the -00 and -01 drafts
     B.2.  Changes between the -01 and -02 drafts
   Author's Address


1.  Introduction

   This document describes Internationalizing Domain Names in
   Applications (IDNA) version 2 (hereafter called "IDNAv2"), a direct
   update to IDNA (hereafter called "IDNAv1").  IDNAv1 consists of four
   RFCs:
   o  [RFC3490], "Internationalizing Domain Names in Applications
      (IDNA)", is the main definition of IDNAv1.  This defines the
      processing rules for IDNA and gives the background for how IDNA
      works.
   o  [RFC3454], "Preparation of Internationalized Strings
      ("stringprep")", defines the general framework for processing non-
      ASCII strings that are used in IDNA.
   o  [RFC3491], "Nameprep: A Stringprep Profile for Internationalized
      Domain Names (IDN)", is a short profile of the rules from the
      stringprep framework.
   o  [RFC3492], "Punycode: A Bootstring encoding of Unicode for
      Internationalized Domain Names in Applications (IDNA)", defines
      the encoding used in IDNAv1 labels.

   IDNAv2 is backwards-compatible with IDNAv1, meaning that any DNS label
   that was legal in IDNAv1 has exactly the same representation in
   IDNAv2.  New labels are allowed in IDNAv2 that were not allowed in
   IDNAv1.

   IDNA needs to be updated for many reasons, some of which are covered
   in [RFC4690].  Even setting those aside, many characters that could
   appear in domain names have been added since Unicode version 3.2
   [UNICODE32], the version of the Unicode Standard on which IDNAv1 is
   based.

   One explicit goal of this update is to allow labels with characters
   that have been added since Unicode version 3.2 to be used in IDNA.
   To that end, IDNAv2 is based on Unicode 5.1 [UNICODE51].  The tables
   in stringprep and Nameprep are updated to reflect this change.

   Another explicit goal of this update is to not change the encoding of
   any label that is legal in IDNAv1.  If an internationalized label in
   IDNAv1 produces an ACE label, IDNAv2 must produce the same ACE label.
   If an internationalized label in IDNAv1 produces an ASCII label,
   IDNAv2 must produce the same ASCII label.
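
   The second goal lends itself to a mechanical check.  The Python
   sketch below treats the standard library's encodings.idna.ToASCII,
   which implements IDNAv1 (RFC 3490 with Nameprep over Unicode 3.2), as
   the reference, and reports every label that IDNAv1 accepts but a
   candidate IDNAv2 ToASCII encodes differently or rejects.  The names
   compatibility_failures and candidate_to_ascii are hypothetical; no
   particular IDNAv2 implementation is assumed.

```python
import encodings.idna
from typing import Callable, Iterable, List, Tuple


def idnav1_to_ascii(label: str) -> bytes:
    # The standard library's ToASCII follows RFC 3490 and uses Nameprep
    # with Unicode 3.2 data, so it serves as the IDNAv1 reference here.
    return encodings.idna.ToASCII(label)


def compatibility_failures(
    labels: Iterable[str],
    candidate_to_ascii: Callable[[str], bytes],
) -> List[Tuple[str, bytes, bytes]]:
    """Return (label, idnav1_result, candidate_result) for each label that
    IDNAv1 accepts but the candidate encodes differently or rejects."""
    failures = []
    for label in labels:
        try:
            expected = idnav1_to_ascii(label)
        except UnicodeError:
            continue  # Not legal in IDNAv1, so the goal does not cover it.
        try:
            actual = candidate_to_ascii(label)
        except UnicodeError:
            actual = b""  # A rejection also breaks compatibility.
        if actual != expected:
            failures.append((label, expected, actual))
    return failures


if __name__ == "__main__":
    # Hypothetical stand-in for an IDNAv2 ToASCII: reuse IDNAv1 itself.
    print(compatibility_failures(["example", "B\u00dcCHER"], idnav1_to_ascii))
```

   Under these assumptions, running the harness over a large corpus of
   IDNAv1-valid labels and requiring an empty result tests both the
   "same ACE label" and the "same ASCII label" requirements at once.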

   A third explicit goal is to update the bidirectional ("bidi")
   algorithm used by IDNAv1 to cover more languages, such as Dhivehi and
   Yiddish.  This corrects an oversight in IDNAv1 that was discovered
   after the work was finished.

   This document updates IDNAv1 to reflect Unicode version 5.1.  Of
   course, the Unicode Consortium will not stop at Unicode version 5.1.
   Because of that, IDNAv2 will probably later need to be updated to
   reflect newer versions of Unicode.

1.1.  Acknowledgements

   The first serious work on updating IDNAv1 was undertaken by John
   Klensin, Patrik Faltstrom, Harald Alvestrand, and Cary Karp.  It led
   to the formation of the IDNAbis Working Group in the IETF, and they
   produced many revisions of their documents in that WG.  Some of the
   ideas in this IDNAv2 document (most notably, the update to the bidi
   algorithm) are derived from their efforts.

   Many, many people worked on IDNAv1.  In addition to the authors of
   the standards (Marc Blanchet, Adam Costello, Patrik Faltstrom, and
   me), there were literally dozens of active participants in the
   original IDN Working Group in the IETF that began in 2000.  Their
   tireless effort led to IDNAv1.

1.2.  Conventions Used In This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   In sections of this document where changes are made to RFCs, those
   changes are shown with a vertical line character ("|") in the first
   column.


2.  Changes to RFC 3490 (IDNA v.1)

   All references to the Unicode Standard are updated to refer to
   [UNICODE51].

   All references to Nameprep are updated to refer to the Nameprep in
   this document.  Similarly, all references to stringprep are updated
   to refer to the stringprep in this document.

   In section 3.1, the first bullet point ("1) Whenever dots are
   used...") is changed to add the following at the end of the sentence:
   "U+2CFE (Coptic full stop)".

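   A minimal sketch of label splitting with the proposed addition,
   assuming Python; the names LABEL_SEPARATORS and split_labels are
   illustrative only.  Python's own encodings.idna module, which follows
   RFC 3490, recognizes only the original four dots, so U+2CFE has to be
   added explicitly here.

```python
import re
from typing import List

# RFC 3490 section 3.1 label separators plus the proposed U+2CFE:
# full stop, ideographic full stop, fullwidth full stop,
# halfwidth ideographic full stop, Coptic full stop.
LABEL_SEPARATORS = "\u002e\u3002\uff0e\uff61\u2cfe"
_SEPARATOR_RE = re.compile("[" + re.escape(LABEL_SEPARATORS) + "]")


def split_labels(domain: str) -> List[str]:
    """Split a domain name into labels on any recognized separator."""
    return _SEPARATOR_RE.split(domain)
```
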

3.  Changes to RFC 3454 (Stringprep)

   [[[ ============================================================

   NOTE FOR EARLY VERSIONS OF THIS DRAFT

   This section is intentionally incomplete.  Entries need to be added
   to the tables in stringprep for the characters added to the
   repertoire after Unicode 3.2, up to and including Unicode 5.1.

   Probably the best way to do this is for a few dedicated individuals
   to go through the new characters one by one, and also to go through
   them programmatically, to see which tables need additions.  I have
   done a first one-by-one pass, but I felt that publishing my results
   in the first draft would discourage others from doing this important
   task themselves.  Future versions of this document will reflect the
   results of that work.

   The character review will be similar to what we did in IDNAv1, except
   that we don't have to create any new buckets.  Basically, we have to
   decide whether each new character should be mapped to nothing, or
   whether it should be prohibited for one of the reasons already listed
   in RFC 3454.  In my cursory first pass, I found very few characters
   that will need to be added to sections 3 or 5.  The case mapping will
   be generated algorithmically, with a check that the new map does not
   change any value in the old map.

   ============================================================ ]]]
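
   The programmatic pass mentioned in the note could start from a list
   of code points that were unassigned in Unicode 3.2 but are assigned
   now.  The Python sketch below assumes the standard unicodedata
   module, whose ucd_3_2_0 object exposes the Unicode 3.2 database.  The
   module's main database is a later version than 5.1 (see
   unicodedata.unidata_version), so the result over-approximates the
   3.2-to-5.1 delta; narrowing it would need the Age property from the
   Unicode Character Database, which unicodedata does not expose.  The
   function name is hypothetical.

```python
import unicodedata
from typing import List


def added_since_unicode_32() -> List[int]:
    """Code points unassigned (category Cn) in Unicode 3.2 but assigned
    in the Unicode version bundled with this Python."""
    old = unicodedata.ucd_3_2_0
    added = []
    for cp in range(0x110000):
        ch = chr(cp)
        if old.category(ch) == "Cn" and unicodedata.category(ch) != "Cn":
            added.append(cp)
    return added


if __name__ == "__main__":
    added = added_since_unicode_32()
    print(len(added), "code points assigned since Unicode 3.2 (up to",
          unicodedata.unidata_version + ")")
```
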

   RFC 3454 is significantly revised to reflect the use of Unicode
   version 5.1.  All the substantive changes are additions.  There has
   been no effort to "correct" perceived mistakes in RFC 3454.  (One can
   argue that extending the bidi rules in section 6 to allow more
   languages to be expressed is such a correction; however, the change
   only allows more strings, and does not disallow any string that was
   allowed in RFC 3454.)

   Most of the changes to RFC 3454 are to add characters to the tables
   in the document.  These characters come from Unicode version 5.1.
   Thus, the tables become valid for Unicode version 5.1.  However, the
   same tables are still valid for Unicode version 3.2 because a profile
   that is still using version 3.2 will never use the added rows in the
   updated tables.

   In all places other than Appendix A, references to "[Unicode3.2]" are
   updated to refer to [UNICODE51].  Similarly, all text references to
   "Unicode version 3.2" are updated to "Unicode version 5.1".

   Characters will be added to the tables in section 3.1 to reflect the
   differences between Unicode 3.2 and Unicode 5.1.  For example,
   U+E0100 to U+E01EF will be added to the second list in the section.
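
   As a sketch of what an extended table looks like in use, the Python
   code below wraps the standard stringprep.in_table_b1 (table B.1,
   "commonly mapped to nothing", for Unicode 3.2) and adds the proposed
   U+E0100 to U+E01EF range.  The function names are hypothetical, and
   only this one addition is modeled.

```python
import stringprep


def in_updated_b1(ch: str) -> bool:
    """Table B.1 from RFC 3454 plus U+E0100 to U+E01EF
    (Variation Selectors Supplement)."""
    return stringprep.in_table_b1(ch) or 0xE0100 <= ord(ch) <= 0xE01EF


def map_to_nothing(s: str) -> str:
    """Drop every character in the updated B.1 table."""
    return "".join(ch for ch in s if not in_updated_b1(ch))
```
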

   In section 3.2, change "CaseFolding-3.txt" to "CaseFolding.txt".
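
   The check described in the note earlier in this section (new case
   mappings must not change any value in the old map) can be sketched as
   a comparison of two mapping functions over a set of characters.  In
   the Python below, the standard stringprep.map_table_b3 (table B.3,
   case folding with no normalization, Unicode 3.2) stands in for the
   old map and str.casefold (full case folding from Python's bundled
   Unicode version) for the new one.  That pairing is an assumption for
   illustration, and the function name is hypothetical.

```python
import stringprep
from typing import Callable, Iterable, List


def conflicting_mappings(
    old_map: Callable[[str], str],
    new_map: Callable[[str], str],
    chars: Iterable[str],
) -> List[str]:
    """Return the characters whose mapping differs between the two maps."""
    return [ch for ch in chars if old_map(ch) != new_map(ch)]


if __name__ == "__main__":
    sample = ["A", "\u00df", "Z", "1"]
    print(conflicting_mappings(stringprep.map_table_b3, str.casefold, sample))
```

   The real check would apply this to every character assigned in
   Unicode 3.2 and require an empty result; this sketch makes no claim
   about what that full run returns.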

   Characters will be added to the tables in subsections of section 5.
   An example is that U+2064 will be added to the list in section 5.2.
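
   Prohibition works the same way.  This Python sketch extends the
   standard stringprep.in_table_c22 (table C.2.2, non-ASCII control
   characters, Unicode 3.2) with U+2064 and reports the position of the
   first prohibited character; the function names are hypothetical.

```python
import stringprep
from typing import Optional


def in_updated_c22(ch: str) -> bool:
    """Table C.2.2 from RFC 3454 plus U+2064 (INVISIBLE PLUS)."""
    return stringprep.in_table_c22(ch) or ch == "\u2064"


def first_prohibited(s: str) -> Optional[int]:
    """Index of the first character in the updated C.2.2, or None."""
    for i, ch in enumerate(s):
        if in_updated_c22(ch):
            return i
    return None
```
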

   In section 6, at the end of the fourth paragraph (which currently
   ends with "have bidirectional category "EN"."), the following
   sentence is added: "The Unicode Standard also defines a bidirectional
   category "NSM" for "non-spacing marks"."

   In section 6, the third requirement is changed to read:

   | 3) If a string contains any RandALCat character, the first
   |   character MUST be a RandALCat character, and the last
   |   characters of the string MUST be either a RandALCat
   |   character or a RandALCat character followed by one or
   |   more NSM characters.
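
   A sketch of the revised requirement 3 in Python, shown next to the
   RFC 3454 version for contrast.  It assumes RandALCat means
   bidirectional category R or AL and NSM means category NSM, taken from
   the standard unicodedata module's bundled Unicode data rather than
   from tables D.1 and D.2.  Only requirement 3 is modeled; the other
   requirements in section 6 still apply.  All function names are
   hypothetical.

```python
import unicodedata


def is_randal(ch: str) -> bool:
    # RandALCat: bidirectional category R or AL.
    return unicodedata.bidirectional(ch) in ("R", "AL")


def is_nsm(ch: str) -> bool:
    return unicodedata.bidirectional(ch) == "NSM"


def rule3_rfc3454(s: str) -> bool:
    """Original requirement 3: first and last characters are RandALCat."""
    if not any(is_randal(ch) for ch in s):
        return True
    return is_randal(s[0]) and is_randal(s[-1])


def rule3_idnav2(s: str) -> bool:
    """Revised requirement 3: the first character is RandALCat, and the
    string ends with a RandALCat character followed by zero or more NSM
    characters."""
    if not any(is_randal(ch) for ch in s):
        return True
    if not is_randal(s[0]):
        return False
    end = len(s)
    while end > 0 and is_nsm(s[end - 1]):
        end -= 1  # Skip the trailing run of non-spacing marks.
    return end > 0 and is_randal(s[end - 1])
```

   A Thaana (Dhivehi) string ending in a vowel sign such as U+07A6
   (THAANA ABAFILI), or a Hebrew string ending in a point such as U+05B0
   (HEBREW POINT SHEVA), fails the original rule but passes the revised
   one.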

   In the references, update the reference for UAX15, and add a
   reference for [UNICODE51].

   Appendix A is changed to read:

   | The following are the only repertoires covered in this document:
   |
   | - Unicode 3.2, as defined in [UNICODE32]
   |
   | - Unicode 5.1, as defined in [UNICODE51]

   A new appendix, "A.2 Unassigned code points in Unicode 5.1", will be
   added.
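
   A new appendix A.2 would list code points as inclusive ranges, in the
   style of the existing A.1.  The Python sketch below derives such
   ranges from any object with a unicodedata-style category method.  It
   leaves out noncharacter code points, as the standard
   stringprep.in_table_a1 does.  Python bundles only Unicode 3.2 (as
   unicodedata.ucd_3_2_0) and one later version, so producing the 5.1
   list itself would require Unicode 5.1 data from another source; the
   function name is hypothetical.

```python
import unicodedata
from typing import Iterator, Tuple


def unassigned_ranges(ucd) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) ranges of code points whose category
    is Cn in `ucd`, excluding noncharacters."""
    start = None
    for cp in range(0x110000):
        unassigned = (
            ucd.category(chr(cp)) == "Cn"
            and not 0xFDD0 <= cp <= 0xFDEF
            and (cp & 0xFFFF) < 0xFFFE
        )
        if unassigned and start is None:
            start = cp
        elif not unassigned and start is not None:
            yield (start, cp - 1)
            start = None
    if start is not None:
        yield (start, 0x10FFFF)


if __name__ == "__main__":
    ranges = unassigned_ranges(unicodedata.ucd_3_2_0)
    for _ in range(3):
        first, last = next(ranges)
        print(f"{first:04X}-{last:04X}")
```
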

   Entries will be added to the tables in appendixes B, C, and D.


4.  Changes to RFC 3491 (Nameprep)

   All references to IDNA and stringprep are updated to refer to IDNAv2
   and to the stringprep in this document.

   In sections 1 and 2, "Unicode 3.2" is changed to "Unicode 5.1".

   In section 10, change the last table entry to "This is the second
   version of Nameprep."


5.  Changes to RFC 3492 (Punycode)

   IDNAv2 does not change RFC 3492.


6.  Suggestions for Registries

   This is a placeholder for a short section that covers new advice for
   registries that was not included in IDNAv1.  It will include ideas
   about multi-script labels and possibly other advice.


7.  IANA Considerations

   IANA is requested to add the following to the stringprep profile
   registry (www.iana.org/assignments/stringprep-profiles).

   Name of this profile: Nameprep

   RFC in which the profile is defined: This document.

   Indicator whether or not this is the newest version of the profile:
   This is the second version of Nameprep.


8.  Security Considerations

   The security considerations from RFCs 3454, 3490, 3491, and 3492 all
   apply to this document.  The changes between IDNAv1 and IDNAv2 are
   not believed to add any new security considerations.


9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              December 2002.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)",
              RFC 3491, March 2003.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, March 2003.

   [UNICODE32]
              The Unicode Consortium, "The Unicode Standard, Version
              3.2", The Unicode Standard version 3.2.

   [UNICODE51]
              The Unicode Consortium, "The Unicode Standard, Version
              5.1", The Unicode Standard version 5.1.

9.2.  Informative References

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, September 2006.


Appendix A.  Work Still to be Done

   Figure out exactly how we want the reference to Unicode 3.2 and
   Unicode 5.1 to look in the references section, then figure out how to
   wrestle xml2rfc to produce that.

   Fill in all the tables for the updates to stringprep.

   Decide if this entire document should be about Unicode 5.2, which is
   expected to be released by mid-2009.


Appendix B.  Changes between versions

   (This section is to be removed by the RFC Editor.)

B.1.  Changes between the -00 and -01 drafts

   In section 1, changed the target for backwards-compatibility to be
   for strings that have only visible characters.

   In section 3, removed the first paragraph.

   In section 3 (about Stringprep section 3.1), added the text about
   removing U+200C and U+200D from the mapped-to-nothing list.

   In section 3 (about Stringprep section 6), replaced:

   | 3) If a string contains any RandALCat character, a RandALCat
   |   character MUST be the first character of the string, and
   |   either a RandALCat character or an NSM character MUST be the
   |   last character of the string.

   with

   | 3) If a string contains any RandALCat character, the first
   |   character MUST be a RandALCat character, and the last
   |   characters of the string MUST be either a RandALCat
   |   character or a RandALCat character followed by one or
   |   more NSM characters.

   Added new placeholder section 6 on advice to registries.

   In Appendix A, added the thought about targeting Unicode 5.2 instead
   of Unicode 5.1.

B.2.  Changes between the -01 and -02 drafts

   Reversed the changes made in -01 with respect to U+200C and U+200D.

   Added paragraph at the end of section 1 acknowledging that IDNAv2
   will eventually need to be updated as well.


Author's Address

   Paul Hoffman

   Email: phoffman@imc.org