Network Working Group                                     P. Saint-Andre
Internet-Draft                                                     Cisco
Intended status: Informational                          October 22, 2010
Expires: April 25, 2011


                  Internationalized Addresses in XMPP
                     draft-saintandre-xmpp-i18n-02

Abstract

   The Extensible Messaging and Presence Protocol (XMPP) as defined in
   RFC 3920 used stringprep in the preparation and comparison of non-
   ASCII characters within XMPP addresses.  This document explores
   whether it makes sense to move away from the use of stringprep in
   XMPP.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Saint-Andre              Expires April 25, 2011                 [Page 1]

Internet-Draft                  XMPP I18N                   October 2010


Table of Contents

   1.  Introduction
   2.  Characteristics and Uses of XMPP Addresses
   3.  String Classes
     3.1.  Localpart
     3.2.  Resourcepart
   4.  Migration Issues
   5.  User Interface Issues
   6.  Recommendations
     6.1.  Possible Approaches
     6.2.  Domainpart
     6.3.  Localpart
     6.4.  Resourcepart
   7.  Security Considerations
   8.  IANA Considerations
   9.  Informative References
   Author's Address


1.  Introduction

   The Extensible Messaging and Presence Protocol [XMPP] is a widely-
   deployed technology for real-time communication, commonly used for
   instant messaging (IM) among human users but also for communication
   among automated systems.  XMPP addresses (also called "JabberIDs" or
   JIDs) are of the form <localpart@domainpart/resourcepart>, where the
   localpart and resourcepart are formally optional but quite common
   because they are used to identify clients and other entities on the
   network.  In some sense, XMPP addresses have always been
   internationalized, because the developers of the original Jabber
   open-source project intended that all data sent over the wire would
   consist of UTF-8 encoded Unicode code points.  However, at that time
   (1999) the Jabber developers were quite unsophisticated about
   internationalization, nor could they simply re-use a reliable
   internationalization technology that had been developed by the wider
   Internet community (as they could, for example, re-use Secure
   Sockets Layer and Transport Layer Security for channel encryption);
   this lack of sophistication is evident in the community's first
   attempt at formally defining the format for JabberIDs in early 2002
   [XEP-0029].  When the first instantiation of the IETF's XMPP WG was
   formed in late 2002, IDNA2003 [RFC3490] had not yet been published
   and stringprep [RFC3454] was a very new technology.  During its work
   on [RFC3920], the XMPP WG absorbed as best it could the advice of
   internationalization experts regarding appropriate methods for
   preparing and comparing XMPP addresses; however, the participants in
   the XMPP WG did not possess very much knowledge of
   internationalization and therefore did not necessarily make fully-
   informed decisions.  As a result of this early work, in [RFC3920] the
   XMPP WG decided to re-use IDNA2003 [RFC3490] and Nameprep [RFC3491]
   for the domainpart of a JID and to define two additional stringprep
   profiles: Nodeprep for the localpart and Resourceprep for the
   resourcepart.

   Since the publication of [RFC3920] in 2004, the Internet community
   has gained more experience with internationalization.  In particular,
   IDNA2003, which is based on stringprep, has been superseded by
   IDNA2008 ([RFC5890], [RFC5891], [RFC5892], [RFC5893], [RFC5894]),
   which does not use stringprep.  This migration away from stringprep
   for internationalized domain names has prompted other "customers" of
   stringprep to consider new approaches to the preparation and
   comparison of internationalized addresses.  As a result, the IETF
   has formed the PRECIS WG as a common forum for seeking solutions to
   the problem statement outlined in [PROBLEM].  This document has two
   purposes: (1) provide input to the PRECIS WG and (2) help inform the
   decisions of the XMPP WG regarding internationalization of XMPP
   addresses, eventually leading to replacement of [XMPP-ADDR].


2.  Characteristics and Uses of XMPP Addresses

   As mentioned, XMPP addresses are of the form
   <localpart@domainpart/resourcepart>.  For the domainpart, it makes
   sense for XMPP to simply re-use the most up-to-date technology for
   internationalized domain names, which [RFC3920] did by re-using
   [RFC3490].  Naturally, any migration from IDNA2003 to IDNA2008 will
   introduce migration issues as outlined under Section 4, but those
   issues need to be overcome so that XMPP technologies can follow best
   current practices for internationalization of domain names.
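
   As a rough illustration of the address form described above, the
   following Python sketch splits a string into its three parts.  The
   names "Jid" and "split_jid" are hypothetical, and the sketch performs
   no stringprep or IDNA processing; it only shows the role of the '@'
   and '/' separators, assuming that the first '/' ends the bare address.

```python
from typing import NamedTuple, Optional


class Jid(NamedTuple):
    localpart: Optional[str]
    domainpart: str
    resourcepart: Optional[str]


def split_jid(jid: str) -> Jid:
    """Split <localpart@domainpart/resourcepart> into its three parts."""
    # Split on the first "/" before looking for "@": in this sketch the
    # resourcepart is treated as free-form and may contain "/" or "@".
    bare, slash, resource = jid.partition("/")
    local, at, domain = bare.partition("@")
    if not at:
        # No "@" present, so the whole bare address is the domainpart.
        local, domain = "", bare
    if not domain or (at and not local) or (slash and not resource):
        raise ValueError(f"not a well-formed address: {jid!r}")
    return Jid(local or None, domain, resource or None)


print(split_jid("juliet@example.com/balcony"))
print(split_jid("example.com"))
```

   The localpart and resourcepart come back as None when absent, which
   reflects that both are formally optional.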

   However, just because XMPP re-uses IDNA2008 does not necessarily
   imply that the underlying "inclusion approach" taken in IDNA2008 can
   also be applied directly to the localpart and resourcepart of an XMPP
   address.  To understand whether a new approach makes sense, we need
   to understand the uses and characteristics of XMPP addresses (and the
   parts thereof).

   The inclusion approach used in IDNA2008 makes sense because domain
   names were always limited to the letter-digits-hyphen ("LDH")
   pattern; the progression to non-ASCII domain names simply introduced
   more characters that might qualify as letters and (in some cases)
   digits.  Extrapolating from that pattern, [RFC5894] argues that there
   is no good reason for a domain name to include characters such as
   symbols (e.g., hearts and stars), since the purpose of a domain name
   is to provide an unambiguous, memorable label for identifying and
   referring to resources on the Internet, not a personally expressive
   "handle" or a fun "tag" for interaction.

   The localpart and resourcepart of a JID often serve purposes other
   than unambiguous, memorable labels.  For example, a human user of an
   XMPP-based IM system might expect that the username (localpart)
   portion of a JID could be expressive of their identity in some way,
   e.g. by matching some combination of their given name, surname, or
   nickname.  Similarly, an occupant of an XMPP-based chatroom
   [XEP-0045] might expect that their in-room nickname (resourcepart)
   could be a fun conversation-starter; for example, a regular visitor
   to an XMPP chatroom that the author frequents has an in-room nickname
   of "The King" where "King" is represented by the Unicode codepoint
   'BLACK CHESS KING' (U+265A).  Such characters might be difficult to
   communicate in some contexts (e.g., in screen readers for the blind),
   but they are expressive and fun, which is an important consideration
   for many IM users -- even at the expense of reliability.

   Does the desire for an expressive username or nickname trump the need
   for human-readable identifiers?  Given the wide implementation of
   full-Unicode addresses in human-oriented XMPP applications, IM client
   developers seem to think so.

   These admittedly anecdotal and subjective considerations suggest that
   the inclusion approach pursued in the IDNA2008 initiative is quite
   appropriate for the more restricted class of domain names but perhaps
   not as appropriate for the localpart or resourcepart of an XMPP
   address.

   That being said, some XMPP implementations (e.g., a custom client) or
   deployments (e.g., an IM system at a large enterprise or branch of
   the military) might wish to "lock down" the expressive potential of
   XMPP addresses by limiting provisioned addresses to a particular
   subset or version of Unicode, by specifying which scripts, languages,
   code points, and text directions are supported, etc.  Currently there
   is no way for an implementation or deployment to do so in a
   standardized manner that can be communicated to other entities on the
   network (e.g., during account provisioning).  Given that a deployed
   XMPP service acts in some ways like a registrar does for domain
   names, such methods might be helpful; although they are out of scope
   for the XMPP WG, they might be considered by the XMPP Standards
   Foundation (e.g., in revisions to or a replacement for [XEP-0077]).


3.  String Classes

   Both [PROBLEM] and [FRAMEWORK] propose that it might be valuable to
   think of internationalized addresses in terms of broad "string
   classes" such as domain name, email address, restricted identifier,
   less-restrictive identifier, and perhaps even free-form identifier
   (just about anything goes).  Particular technologies like XMPP could
   either borrow such a string class unchanged (as we do for domain
   names) or adapt or "profile" such a string class with modifications
   (e.g., as XMPP could possibly do by profiling the email address
   class, restricted identifier class, or less-restrictive identifier
   class for localparts and a possible free-form identifier class for
   resourceparts).

   This document does not yet make recommendations about borrowing or
   adapting more general string classes, in part because those classes
   are not yet clearly defined.  However, as input to further
   discussion, this document explores the two string classes for which
   [RFC3920] defined new stringprep profiles: localparts and
   resourceparts.  The following subsections refer to the properties
   described in Section 3 of [PROBLEM] (input restrictions,
   normalization, case mapping, and bidirectionality).

3.1.  Localpart

   The localpart of an XMPP address is joined with a domainpart (via the
   '@' separator) in a way that on the surface looks like an email
   address, such as <juliet@example.com>.  However, there are some
   subtle differences, even if we assume that the username portion of an
   email address inherits from the "dot-atom-text" rule of [RFC5322]
   instead of the more complex "local-part" rule.  Specifically, within
   the ASCII block:

   o  Several characters are allowed in email addresses but not allowed
      in JabberIDs: "&" (U+0026), "'" (U+0027), and "/" (U+002F).
   o  Several characters are allowed in JabberIDs but not allowed in
      email addresses: "(" (U+0028), ")" (U+0029), "," (U+002C), "."
      (U+002E), ";" (U+003B), "[" (U+005B), "\" (U+005C), and "]"
      (U+005D).

   Those differences might not be significant enough to prevent the XMPP
   WG from adapting or "profiling" an email address class if the PRECIS
   WG produces such a class.  On the other hand, they might lead the
   XMPP WG toward borrowing or adapting either the restricted identifier
   class or the less-restrictive identifier class, depending on how
   those are defined.

   With regard to input restrictions, the characters allowed in an XMPP
   localpart have always been lightly restricted.  Within the ASCII
   block, the only restricted characters are space, controls, "
   (U+0022), & (U+0026), ' (U+0027), / (U+002F), : (U+003A), < (U+003C),
   > (U+003E), and @ (U+0040).  Outside the ASCII block, no characters
   are currently restricted.  It is an open issue whether further
   restrictions are desirable (e.g., do XMPP localparts really need to
   include symbol characters such as hearts and stars?).
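
   The ASCII restrictions just listed can be expressed as a small check.
   The following Python sketch is illustrative only: the function name is
   hypothetical, it covers only the ASCII block, and it deliberately
   leaves all non-ASCII characters unrestricted, as the text above
   describes.

```python
from typing import List

# The eight printable ASCII characters listed above as restricted.
RESTRICTED_ASCII = set('"&\'/:<>@')


def restricted_ascii_in_localpart(localpart: str) -> List[str]:
    """Return the restricted ASCII characters found in a localpart."""
    found = []
    for ch in localpart:
        cp = ord(ch)
        if cp > 0x7F:
            # Outside the ASCII block nothing is restricted in this sketch.
            continue
        # U+0000..U+001F and U+007F are controls; U+0020 is space.
        if cp <= 0x20 or cp == 0x7F or ch in RESTRICTED_ASCII:
            found.append(ch)
    return found


print(restricted_ascii_in_localpart("juliet"))
print(restricted_ascii_in_localpart("ju liet@"))
```

   An empty result means that no ASCII restriction was violated; it does
   not mean that the string is a valid Nodeprep output.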

   With regard to normalization, the Nodeprep profile of stringprep
   specifies that implementations apply Unicode normalization form NFKC
   (Compatibility Decomposition followed by Canonical Composition).  As
   briefly described in Section 2.4 of [PROBLEM], it is an open question
   whether it is more appropriate to apply Unicode normalization form
   NFKC, form NFC (Canonical Decomposition followed by Canonical
   Composition), or no normalization at all.  These forms are defined in
   "Unicode Standard Annex #15: Unicode Normalization Forms" [UAX15],
   along with several examples of the differing outputs they can
   produce.  As two examples, for the source code point U+FB01 (LATIN
   SMALL LIGATURE FI) NFC produces that same code point whereas NFKC
   produces "f" followed by "i" (U+0066 and U+0069), and for the source
   code points U+0032 (DIGIT TWO) and U+2075 (SUPERSCRIPT FIVE) NFC
   produces those same code points whereas NFKC produces U+0032 (DIGIT
   TWO) and U+0035 (DIGIT FIVE).  Very informally, XMPP developers can
   think of NFKC as trying to be smart -- and perhaps sometimes too
   smart.
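
   The two examples above can be reproduced with the "unicodedata" module
   in the Python standard library.  Note that this module tracks whatever
   Unicode version ships with the interpreter rather than the Unicode 3.2
   data that stringprep assumes; the helper name "forms" is hypothetical.

```python
import unicodedata
from typing import Tuple


def forms(text: str) -> Tuple[str, str]:
    """Return the (NFC, NFKC) normalizations of the given text."""
    return (
        unicodedata.normalize("NFC", text),
        unicodedata.normalize("NFKC", text),
    )


# U+FB01 LATIN SMALL LIGATURE FI
print([f"U+{ord(c):04X}" for c in forms("\uFB01")[0]])
print([f"U+{ord(c):04X}" for c in forms("\uFB01")[1]])

# U+0032 DIGIT TWO followed by U+2075 SUPERSCRIPT FIVE
print([f"U+{ord(c):04X}" for c in forms("2\u2075")[0]])
print([f"U+{ord(c):04X}" for c in forms("2\u2075")[1]])
```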

   With regard to case mapping, the Nodeprep profile of stringprep
   specifies that XMPP localparts are case-folded, and we want to retain
   that feature (e.g., we want <juliet@example.com> and
   <Juliet@example.com> to identify the same entity on the network).
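
   As a sketch of the case-folding behavior, the Python standard library
   ships a "stringprep" module that exposes the RFC 3454 tables,
   including the B.2 mapping used for case folding with NFKC.  The
   function names below are hypothetical, and the sketch covers only the
   case-mapping and normalization steps, not the prohibited-character or
   bidirectional checks.

```python
import stringprep
import unicodedata


def fold_localpart(localpart: str) -> str:
    """Case-fold with stringprep table B.2, then apply NFKC."""
    folded = "".join(stringprep.map_table_b2(ch) for ch in localpart)
    return unicodedata.normalize("NFKC", folded)


def same_localpart(a: str, b: str) -> bool:
    """Compare two localparts after case folding."""
    return fold_localpart(a) == fold_localpart(b)


print(same_localpart("juliet", "Juliet"))
```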

   With regard to bidirectionality (i.e., scripts that are written right
   to left), [RFC3920] did not provide any guidance other than pointing
   to Section 6 of [RFC3454].  Any treatment of bidirectionality in XMPP
   localparts is an open issue ([RFC5893] provides some helpful
   discussion of the general topic, at least as applied to
   internationalized domain names).

3.2.  Resourcepart

   The resourcepart of an XMPP address has traditionally been a kind of
   "anything goes" string, even allowing the space character.  If the
   PRECIS WG defines something like a free-form identifier, the XMPP WG
   might borrow or adapt that class.  Another option would be to say
   that the resourcepart is Net-Unicode as specified in [RFC5198].

   With regard to input restrictions, the characters allowed in an XMPP
   resourcepart have always been lightly restricted.  Within the ASCII
   block, the only restricted characters are controls.  Outside the
   ASCII block, no characters are currently restricted.  Although it is
   an open issue whether further restrictions are desirable, as
   explained under Section 2, XMPP-based IM systems have taken advantage
   of the lack of restrictions on resource identifiers (e.g., in
   multi-user chatrooms).

   With regard to normalization, the Resourceprep profile of stringprep
   specifies that implementations apply Unicode normalization form NFKC
   (Compatibility Decomposition followed by Canonical Composition).

   With regard to case mapping, the Resourceprep profile of stringprep
   specifies that XMPP resourceparts are not case-folded (e.g., in an
   XMPP-based chatroom, the participant "StPeter" could be different
   from the participant "stpeter").  It is an open question whether this
   behavior is necessary or desirable in all contexts.
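
   The resourcepart handling described in this section can be sketched
   as follows.  This is a simplified approximation rather than the
   Resourceprep profile itself: it applies NFKC, performs no case
   folding, and rejects only ASCII control characters, following the
   summary above.  The function name is hypothetical.

```python
import unicodedata


def prepare_resourcepart(resourcepart: str) -> str:
    """Normalize a resourcepart with NFKC; do not fold case."""
    normalized = unicodedata.normalize("NFKC", resourcepart)
    for ch in normalized:
        # Only ASCII controls (U+0000..U+001F, U+007F) are rejected here.
        if ord(ch) < 0x20 or ord(ch) == 0x7F:
            raise ValueError(f"control character U+{ord(ch):04X} not allowed")
    return normalized


# Case is preserved, so these two nicknames remain distinct.
print(prepare_resourcepart("StPeter") == prepare_resourcepart("stpeter"))

# Spaces and symbols such as U+265A BLACK CHESS KING pass through.
print(prepare_resourcepart("The \u265A") == "The \u265A")
```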

   With regard to bidirectionality (i.e., scripts that are written right
   to left), [RFC3920] did not provide any guidance.  Any treatment of
   bidirectionality in XMPP resourceparts is an open issue ([RFC5893]
   provides some helpful discussion of the general topic, at least as
   applied to internationalized domain names).


4.  Migration Issues

   Any move away from Nameprep, Nodeprep, and Resourceprep as they are
   defined today will inevitably introduce the potential for migration
   issues, such as JIDs that were not ambiguous before the migration but
   that become ambiguous after the migration.  These issues need to be
   clearly defined and well understood so that the costs and benefits of
   any change can be properly assessed -- especially if the change might
   have an impact on authentication (e.g., as described in [RFC3920]),
   authorization (e.g., presence subscriptions as described in
   [XMPP-IM]), access (e.g., joining a chatroom as described in
   [XEP-0045]), identification (e.g., in XMPP URIs or IRIs as described
   in [XMPP-URI]), and other security-related functions.
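
   One way to start quantifying such issues is to run both the old and
   the new preparation rules over a set of existing addresses and compare
   the results.  The Python sketch below assumes two stand-in rules
   (NFKC or NFC followed by lowercasing); these are simplifications
   chosen for illustration, not the actual profiles, and all names are
   hypothetical.

```python
import unicodedata
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Prepare = Callable[[str], str]


def prepare_nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s).lower()


def prepare_nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s).lower()


def changed_by_migration(addresses: Iterable[str],
                         old: Prepare, new: Prepare) -> List[str]:
    """Inputs whose prepared form differs between the two rules."""
    return [a for a in addresses if old(a) != new(a)]


def new_collisions(addresses: Iterable[str],
                   old: Prepare, new: Prepare) -> Dict[str, List[str]]:
    """Forms that were distinct under `old` but merge under `new`."""
    by_new = defaultdict(set)
    for a in addresses:
        by_new[new(a)].add(old(a))
    return {k: sorted(v) for k, v in by_new.items() if len(v) > 1}


accounts = ["\uFB01ona", "fiona", "juliet"]
print(changed_by_migration(accounts, prepare_nfkc, prepare_nfc))
print(new_collisions(accounts, prepare_nfc, prepare_nfkc))
```

   The first report lists inputs that would no longer match their stored
   form; the second lists stored forms that would become ambiguous.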


5.  User Interface Issues

   [RFC5895] introduces the helpful concept of "the dividing line
   between user interface and protocol" and applies that concept to the
   complex process of translating the user's (presumed) intentions into
   bits on the wire.  IDNA2003 conflated user interface processing and
   machine-readable protocols, and in many ways XMPP inherited that same
   error.  It would be desirable for XMPP technologies to define a clear
   dividing line between user interface and protocol.  This might mean
   that the XMPP community will need to define recommended mappings that
   are applied to a string before it is considered a JID (or the
   localpart or resourcepart of a JID).
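
   A minimal sketch of such a dividing line might look as follows.  The
   particular mapping (trim whitespace, lowercase, NFC) and the strict
   protocol-level check are assumptions made for illustration; no such
   mapping has been defined for XMPP, and the function names are
   hypothetical.

```python
import unicodedata

RESTRICTED_ASCII = set('"&\'/:<>@')


def ui_map_localpart(raw: str) -> str:
    """User-interface layer: map what the user typed (assumed rules)."""
    return unicodedata.normalize("NFC", raw.strip().lower())


def protocol_accepts_localpart(localpart: str) -> bool:
    """Protocol layer: accept or reject, but never silently map."""
    if not localpart:
        return False
    for ch in localpart:
        if ord(ch) <= 0x20 or ord(ch) == 0x7F or ch in RESTRICTED_ASCII:
            return False
    # Reject strings that the mapping above would still change.
    return localpart == unicodedata.normalize("NFC", localpart.lower())


typed = " Juliet "
print(protocol_accepts_localpart(typed))
print(protocol_accepts_localpart(ui_map_localpart(typed)))
```

   The point of the split is that only the user-interface function
   guesses at intent; the protocol function compares and rejects.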


6.  Recommendations

6.1.  Possible Approaches

   This document does not yet provide definitive recommendations, but
   instead mainly seeks to foster discussion about internationalized
   addresses in XMPP.  However, there are three possible approaches that
   the XMPP WG might pursue in relation to its existing stringprep
   profiles:

   1.  Keep using Nameprep, Nodeprep, and/or Resourceprep as they are
       defined today.

   2.  Collaborate with other interested parties or working groups to
       define a new version of stringprep that tracks changes to Unicode
       since Unicode 3.2 as currently specified in [RFC3454].

   3.  Pursue the general model followed in the IDNA2008 work by
       defining a tiered model of valid, disallowed, and unassigned
       characters; such an effort might be pursued only within the XMPP
       community (for Nodeprep, Resourceprep, or both) or more generally
       in concert with other users of stringprep.

   The XMPP WG might even decide to use a mix of these approaches, e.g.
   to use the new, non-stringprep IDNA2008 approach for domainparts but
   the existing Nodeprep and Resourceprep profiles for localparts and
   resourceparts.
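
   To make the third approach more concrete, the following Python sketch
   sorts code points into valid, disallowed, and unassigned tiers based
   on their Unicode general category.  The set of categories treated as
   valid is an assumption chosen for illustration; it is not the
   IDNA2008 derivation and not a proposal for XMPP.

```python
import unicodedata
from enum import Enum


class Tier(Enum):
    VALID = "valid"
    DISALLOWED = "disallowed"
    UNASSIGNED = "unassigned"


# Hypothetical inclusion list: lowercase and caseless letters,
# combining marks, and decimal digits.
VALID_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def classify(ch: str) -> Tier:
    """Classify one code point by its general category."""
    category = unicodedata.category(ch)
    if category == "Cn":
        # Not assigned in the Unicode version bundled with Python.
        return Tier.UNASSIGNED
    if category in VALID_CATEGORIES:
        return Tier.VALID
    return Tier.DISALLOWED


for ch in ("a", "\u265A", "\u0378"):
    print(f"U+{ord(ch):04X}", classify(ch).value)
```

   Under an inclusion list like this one, a symbol such as U+265A falls
   into the disallowed tier, which illustrates the tension with the
   expressive uses described in Section 2.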

   In general, given that the PRECIS WG has been formed as a common
   effort across different technologies, it is reasonable for the XMPP
   developer community to participate in that WG (and for the XMPP WG to
   cooperate with that WG) and to adopt whatever solutions are developed
   in that WG.

6.2.  Domainpart

   RFC 3920 specifies the use of IDNA2003 for the domainpart of a JID
   (which in the terms of IDNA2008 [RFC5890] is a "domain name slot").
   This document does not question the reasoning behind the IDNA2008
   work and therefore recommends the use of IDNA2008 technologies in the
   document that obsoletes [XMPP-ADDR].
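
   For comparison, the IDNA2003 processing that RFC 3920 specifies can be
   exercised with the "idna" codec in the Python standard library, which
   implements [RFC3490] with Nameprep.  The standard library does not
   implement IDNA2008, so the sketch below shows only the older
   behavior; the function name is hypothetical.

```python
def domainpart_to_ascii_idna2003(domainpart: str) -> str:
    """Convert a domainpart to its ASCII form using IDNA2003."""
    # The built-in "idna" codec applies Nameprep and Punycode per label.
    return domainpart.encode("idna").decode("ascii")


print(domainpart_to_ascii_idna2003("b\u00FCcher.example"))
print(domainpart_to_ascii_idna2003("B\u00FCcher.example"))
```

   Both inputs yield the same ASCII form because Nameprep folds case
   before encoding; IDNA2008 instead leaves such mapping to a separate
   step.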

6.3.  Localpart

   This document does not yet provide a recommendation regarding the
   localpart of a JID.

6.4.  Resourcepart

   This document does not yet provide a recommendation regarding the
   resourcepart of a JID.


7.  Security Considerations

   The inclusion of non-ASCII characters in XMPP addresses has important
   security implications, such as the ability to mimic characters or
   entire addresses through the inclusion of "confusable characters"
   (see [RFC4690] and [RFC5890]).  These issues are explored at some
   length in [XMPP-ADDR].  Other security considerations might apply and
   will be described in a future version of this specification.
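
   As a small illustration of confusable characters, the Python sketch
   below compares a Latin localpart with a look-alike that substitutes a
   Cyrillic letter.  The script detection is a crude heuristic based on
   the first word of each character name, not the Unicode Script
   property, and the function name is hypothetical.

```python
import unicodedata
from typing import Set


def scripts_used(text: str) -> Set[str]:
    """Guess the scripts in a string from Unicode character names."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # e.g. "LATIN SMALL LETTER E" -> "LATIN"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


genuine = "juliet"
spoof = "juli\u0435t"  # U+0435 CYRILLIC SMALL LETTER IE replaces "e"

# The strings differ, and normalization does not merge them.
print(genuine == spoof)
print(unicodedata.normalize("NFKC", spoof) == genuine)

# A mixed-script localpart is one possible warning sign.
print(sorted(scripts_used(spoof)))
```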


8.  IANA Considerations

   This document has no actions for the IANA.


9.  Informative References

   [FRAMEWORK]
              Blanchet, M., "Precis Framework: Handling
              Internationalized Strings in Protocols",
              draft-blanchet-precis-framework-00 (work in progress),
              July 2010.

   [PROBLEM]  Blanchet, M. and A. Sullivan, "Stringprep Revision Problem
              Statement", draft-ietf-precis-problem-statement-00 (work
              in progress), October 2010.

   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              December 2002.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)",
              RFC 3491, March 2003.

   [RFC3920]  Saint-Andre, P., Ed., "Extensible Messaging and Presence
              Protocol (XMPP): Core", RFC 3920, October 2004.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, September 2006.

   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
              Interchange", RFC 5198, March 2008.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              October 2008.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Definitions and Document Framework",
              RFC 5890, August 2010.

   [RFC5891]  Klensin, J., "Internationalized Domain Names in
              Applications (IDNA): Protocol", RFC 5891, August 2010.

   [RFC5892]  Faltstrom, P., "The Unicode Code Points and
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5892, August 2010.

   [RFC5893]  Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5893, August 2010.

   [RFC5894]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Background, Explanation, and
              Rationale", RFC 5894, August 2010.

   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
              Internationalized Domain Names in Applications (IDNA)
              2008", RFC 5895, September 2010.

   [UAX15]    The Unicode Consortium, "Unicode Standard Annex #15:
              Unicode Normalization Forms", September 2010.

   [XEP-0029]
              Kaes, C., "Definition of Jabber Identifiers (JIDs)", XSF
              XEP 0029, October 2003.

   [XEP-0077]
              Saint-Andre, P., "In-Band Registration", XSF XEP 0077,
              September 2009.

   [XEP-0045]
              Saint-Andre, P., "Multi-User Chat", XSF XEP 0045,
              July 2008.

   [XMPP-ADDR]
              Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Address Format",
              draft-ietf-xmpp-address-05 (work in progress),
              October 2010.

   [XMPP-IM]  Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Instant Messaging and Presence",
              draft-ietf-xmpp-3921bis-15 (work in progress),
              October 2010.

   [XMPP-URI]
              Saint-Andre, P., "Internationalized Resource Identifiers
              (IRIs) and Uniform Resource Identifiers (URIs) for the
              Extensible Messaging and Presence Protocol (XMPP)",
              RFC 5122, February 2008.

   [XMPP]     Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Core", draft-ietf-xmpp-3920bis-17 (work
              in progress), October 2010.


Author's Address

   Peter Saint-Andre
   Cisco
   1899 Wynkoop Street, Suite 600
   Denver, CO  80202
   USA

   Phone: +1-303-308-3282
   Email: psaintan@cisco.com