Network Working Group P. Saint-Andre
Internet-Draft Cisco
Intended status: Informational October 22, 2010
Expires: April 25, 2011
Internationalized Addresses in XMPP
draft-saintandre-xmpp-i18n-02
Abstract
The Extensible Messaging and Presence Protocol (XMPP) as defined in
RFC 3920 used stringprep in the preparation and comparison of non-
ASCII characters within XMPP addresses. This document explores
whether it makes sense to move away from the use of stringprep in
XMPP.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 25, 2011.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Saint-Andre Expires April 25, 2011 [Page 1]
Internet-Draft XMPP I18N October 2010
Table of Contents
1. Introduction
2. Characteristics and Uses of XMPP Addresses
3. String Classes
3.1. Localpart
3.2. Resourcepart
4. Migration Issues
5. User Interface Issues
6. Recommendations
6.1. Possible Approaches
6.2. Domainpart
6.3. Localpart
6.4. Resourcepart
7. Security Considerations
8. IANA Considerations
9. Informative References
Author's Address
1. Introduction
The Extensible Messaging and Presence Protocol [XMPP] is a widely-
deployed technology for real-time communication, commonly used for
instant messaging (IM) among human users but also for communication
among automated systems. XMPP addresses (also called "JabberIDs" or
JIDs) are of the form <localpart@domainpart/resourcepart>, where the
localpart and resourcepart are formally optional but quite common
because they are used to identify clients and other entities on the
network. In some sense, XMPP addresses have always been
internationalized, because the developers of the original Jabber
open-source project intended that all data sent over the wire would
consist of UTF-8 encoded Unicode code points. However, at that time
(1999) the Jabber developers were quite unsophisticated about
internationalization, nor could they simply re-use a reliable
internationalization technology that had been developed by the wider
Internet community (as they could, for example, by re-using Secure
Sockets Layer and Transport Layer Security for channel encryption);
this lack of sophistication is evident in the community's first
attempt at formally defining the format for JabberIDs in early 2002
[XEP-0029]. When the first instantiation of the IETF's XMPP WG was
formed in late 2002, IDNA2003 [RFC3490] had not yet been published
and stringprep [RFC3454] was a very new technology. During its work
on [RFC3920], the XMPP WG absorbed as best it could the advice of
internationalization experts regarding appropriate methods for
preparing and comparing XMPP addresses; however, the participants in
the XMPP WG did not possess very much knowledge of
internationalization and therefore did not necessarily make fully-
informed decisions. As a result of this early work, in [RFC3920] the
XMPP WG decided to re-use IDNA2003 [RFC3490] and Nameprep [RFC3491]
for the domainpart of a JID and to define two additional stringprep
profiles: Nodeprep for the localpart and Resourceprep for the
resourcepart.
Since the publication of [RFC3920] in 2004, the Internet community
has gained more experience with internationalization. In particular,
IDNA2003, which is based on stringprep, has been superseded by
IDNA2008 ([RFC5890], [RFC5891], [RFC5892], [RFC5893], [RFC5894]),
which does not use stringprep. This migration away from stringprep
for internationalized domain names has prompted other "customers" of
stringprep to consider new approaches to the preparation and
comparison of internationalized addresses. As a result, the IETF
has formed the PRECIS WG as a common forum for seeking solutions to
the problem statement outlined in [PROBLEM]. This document has two
purposes: (1) provide input to the PRECIS WG and (2) help inform the
decisions of the XMPP WG regarding internationalization of XMPP
addresses, eventually leading to replacement of [XMPP-ADDR].
2. Characteristics and Uses of XMPP Addresses
As mentioned, XMPP addresses are of the form
<localpart@domainpart/resourcepart>. For the domainpart, it makes
sense for XMPP to simply re-use the most up-to-date technology for
internationalized domain names, which [RFC3920] did by re-using
[RFC3490]. Naturally, any migration from IDNA2003 to IDNA2008 will
introduce migration issues as outlined under Section 4, but those
issues need to be overcome so that XMPP technologies can follow best
current practices for internationalization of domain names.
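As an informal illustration of the address form above, the following
Python sketch splits a JID into its three parts and converts the
domainpart to ASCII with Python's built-in "idna" codec, which
implements IDNA2003 [RFC3490] rather than IDNA2008. The function names
and the parsing order (first "/" and then first "@") are assumptions
of this sketch, not text taken from [RFC3920].

```python
def split_jid(jid):
    """Split a JID string into (localpart, domainpart, resourcepart).

    Missing optional parts are returned as None.  This is only a sketch:
    it performs no stringprep processing and no validation.
    """
    # The resourcepart may itself contain '/' or '@', so cut it off first.
    bare, slash, resource = jid.partition("/")
    local, at, domain = bare.partition("@")
    if not at:
        # No '@' present: the whole bare address is the domainpart.
        local, domain = None, bare
    return local, domain, (resource if slash else None)


def domain_to_ascii(domainpart):
    """Convert a domainpart to its ASCII form using IDNA2003 rules."""
    return domainpart.encode("idna").decode("ascii")
```

For example, split_jid("juliet@example.com/balcony") yields the
localpart "juliet", the domainpart "example.com", and the resourcepart
"balcony". A move to IDNA2008 would replace only the conversion step.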
However, just because XMPP re-uses IDNA2008 does not necessarily
imply that the underlying "inclusion approach" taken in IDNA2008 can
also be applied directly to the localpart and resourcepart of an XMPP
address. To understand whether a new approach makes sense, we need
to understand the uses and characteristics of XMPP addresses (and the
parts thereof).
The inclusion approach used in IDNA2008 makes sense because domain
names were always limited to the letter-digits-hyphen ("LDH")
pattern; the progression to non-ASCII domain names simply introduced
more characters that might qualify as letters and (in some cases)
digits. Extrapolating from that pattern, [RFC5894] argues that there
is no good reason for a domain name to include characters such as
symbols (e.g., hearts and stars), since the purpose of a domain name
is to provide an unambiguous, memorable label for identifying and
referring to resources on the Internet, not a personally expressive
"handle" or a fun "tag" for interaction.
The localpart and resourcepart of a JID often serve purposes other
than unambiguous, memorable labels. For example, a human user of an
XMPP-based IM system might expect that the username (localpart)
portion of a JID could be expressive of their identity in some way,
e.g. by matching some combination of their given name, surname, or
nickname. Similarly, an occupant of an XMPP-based chatroom
[XEP-0045] might expect that their in-room nickname (resourcepart)
could be a fun conversation-starter; for example, a regular visitor
to an XMPP chatroom that the author frequents has an in-room nickname
of "The King" where "King" is represented by the Unicode codepoint
'BLACK CHESS KING' (U+265A). Such characters might be difficult to
communicate in some contexts (e.g., in screen readers for the blind),
but are expressive and fun, which is not an unimportant consideration
for many IM users -- even at the expense of reliability.
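To make the contrast concrete, the following Python sketch applies a
deliberately crude stand-in for an inclusion approach to such a
nickname. The rule used here (letters, combining marks, and decimal
digits only) is a hypothetical simplification for illustration; the
actual IDNA2008 rules in [RFC5892] are considerably more detailed.

```python
import unicodedata


def letters_and_digits_only(text):
    """Return True if every character is a letter, mark, or decimal digit.

    Hypothetical rule for illustration only; it is not the IDNA2008
    derivation algorithm.
    """
    for ch in text:
        category = unicodedata.category(ch)
        if not (category[0] in "LM" or category == "Nd"):
            return False
    return True


nickname = "The \u265a"  # "The " followed by BLACK CHESS KING
```

Under this rule the nickname is rejected twice over: the chess symbol
has the general category "So" (Symbol, other), and the space is not a
letter or digit either.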
Does the desire for an expressive username or nickname trump the need
for human-readable identifiers? Given the wide implementation of
full-Unicode addresses in human-oriented XMPP applications, IM client
developers seem to think so.
These admittedly anecdotal and subjective considerations suggest
that the inclusion approach pursued in the IDNA2008
initiative is quite appropriate for the more restricted class of
domain names but perhaps not as appropriate for the localpart or
resourcepart of an XMPP address.
That being said, some XMPP implementations (e.g., a custom client) or
deployments (e.g., an IM system at a large enterprise or branch of
the military) might wish to "lock down" the expressive potential of
XMPP addresses by limiting provisioned addresses to a particular
subset or version of Unicode, by specifying which scripts, languages,
code points, and text directions are supported, etc. Currently there
is no way for an implementation or deployment to do so in a
standardized manner that can be communicated to other entities on the
network (e.g., during account provisioning). Given that a deployed
XMPP service acts in some ways like a registrar does for domain
names, such methods might be helpful; although they are out of scope
for the XMPP WG, they might be considered by the XMPP Standards
Foundation (e.g., in revisions to or a replacement for [XEP-0077]).
3. String Classes
Both [PROBLEM] and [FRAMEWORK] propose that it might be valuable to
think of internationalized addresses in terms of broad "string
classes" such as domain name, email address, restricted identifier,
less-restrictive identifier, and perhaps even free-form identifier
(just about anything goes). Particular technologies like XMPP could
either borrow such a string class unchanged (as we do for domain
names) or adapt or "profile" such a string class with modifications
(e.g., as XMPP could possibly do by profiling the email address class,
restricted identifier class, or less-restrictive identifier class for
localparts and a possible free-form identifier class for resourceparts).
This document does not yet make recommendations about borrowing or
adapting more general string classes, in part because those classes
are not yet clearly defined. However, as input to further
discussion, this document explores the two string classes for which
[RFC3920] defined new stringprep profiles: localparts and
resourceparts. The following subsections refer to the properties
described in Section 3 of [PROBLEM] (input restrictions,
normalization, case mapping, and bidirectionality).
3.1. Localpart
The localpart of an XMPP address is joined with a domainpart (via the
'@' separator) in a way that on the surface looks like an email
address, such as <juliet@example.com>. However, there are some
subtle differences, even if we assume that the username portion of an
email address inherits from the "dot-atom-text" rule of [RFC5322]
instead of the more complex "local-part" rule. Specifically, within
the ASCII block:
o Several characters are allowed in email addresses but not allowed
in JabberIDs: "&" (U+0026), "'" (U+0027), and "/" (U+002F).
o Several characters are allowed in JabberIDs but not allowed in
email addresses: "(" (U+0028), ")" (U+0029), "," (U+002C), "."
(U+002E), ";" (U+003B), "[" (U+005B), "\" (U+005C), and "]"
(U+005D).
Those differences might not be significant enough to prevent the XMPP
WG from adapting or "profiling" an email address class if the PRECIS
WG produces such a class. On the other hand, they might lead the
XMPP WG toward borrowing or adapting either the restricted identifier
class or the less-restrictive identifier class, depending on how
those are defined.
With regard to input restrictions, the characters allowed in an XMPP
localpart have always been lightly restricted. Within the ASCII
block, the only restricted characters are space, controls, "
(U+0022), & (U+0026), ' (U+0027), / (U+002F), : (U+003A), < (U+003C),
> (U+003E), and @ (U+0040). Outside the ASCII block, no characters
are currently restricted. It is an open issue whether further
restrictions are desirable (e.g., do XMPP localparts really need to
include symbol characters such as hearts and stars?).
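The ASCII restrictions listed above can be sketched in Python as
follows. The function name is hypothetical, and the sketch covers only
the ASCII block; it is not an implementation of Nodeprep, which also
applies mapping, normalization, and further prohibition tables.

```python
# ASCII characters that the text above lists as restricted in a localpart
# (space and control characters are handled separately below).
RESTRICTED_ASCII = set("\"&'/:<>@")


def ascii_violations(localpart):
    """Return the restricted ASCII characters found in localpart, in order."""
    found = []
    for ch in localpart:
        code = ord(ch)
        is_control = code < 0x20 or code == 0x7F
        if ch == " " or is_control or ch in RESTRICTED_ASCII:
            found.append(ch)
    return found
```

A localpart such as "st.peter" passes this check, as does any string
of non-ASCII characters, which reflects how lightly the localpart is
currently restricted.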
With regard to normalization, the Nodeprep profile of stringprep
specifies that implementations apply Unicode normalization form NFKC
(Compatibility Decomposition followed by Canonical Composition). As
briefly described in Section 2.4 of [PROBLEM], it is an open question
whether it is more appropriate to apply Unicode normalization form
NFKC, form NFC (Canonical Decomposition followed by Canonical
Composition), or no normalization at all. These forms are defined in
"Unicode Standard Annex #15: Unicode Normalization Forms" [UAX15],
along with several examples of the differing outputs they can
produce. As two examples, for the source code point U+FB01 (LATIN
SMALL LIGATURE FI) NFC produces that same code point whereas NFKC
produces "f" followed by "i" (U+0066 and U+0069), and for the source
code points U+0032 (DIGIT TWO) and U+2075 (SUPERSCRIPT FIVE) NFC
produces those same code points whereas NFKC produces U+0032 (DIGIT
TWO) and U+0035 (DIGIT FIVE). Very informally, XMPP developers can
think of NFKC as trying to be smart -- and perhaps sometimes too
smart.
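The two examples above can be reproduced with the unicodedata module
of the Python standard library. This is only a sketch; the helper and
variable names are illustrative, and the results depend on the
version of Unicode that the Python runtime ships with.

```python
import unicodedata


def nfc(text):
    """Canonical Decomposition followed by Canonical Composition."""
    return unicodedata.normalize("NFC", text)


def nfkc(text):
    """Compatibility Decomposition followed by Canonical Composition."""
    return unicodedata.normalize("NFKC", text)


ligature = "\ufb01"       # LATIN SMALL LIGATURE FI
two_to_fifth = "2\u2075"  # DIGIT TWO followed by SUPERSCRIPT FIVE
```

Here nfc() leaves both strings unchanged, whereas nfkc(ligature)
returns "fi" and nfkc(two_to_fifth) returns "25", so the distinction
between "two to the fifth power" and "twenty-five" is lost.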
With regard to case mapping, the Nodeprep profile of stringprep
specifies that XMPP localparts are case-folded, and we want to retain
that feature (e.g., we want <juliet@example.com> and
<Juliet@example.com> to identify the same entity on the network).
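A rough Python sketch of this comparison follows. It uses the
stringprep module of the Python standard library, which exposes the
[RFC3454] tables, to approximate only the mapping and normalization
steps of Nodeprep; the prohibition and bidirectional checks are
omitted, and the function names are hypothetical.

```python
import stringprep
import unicodedata


def fold_localpart(localpart):
    """Approximate the mapping and normalization steps of Nodeprep."""
    mapped = []
    for ch in localpart:
        if stringprep.in_table_b1(ch):
            continue  # Table B.1: characters commonly mapped to nothing
        mapped.append(stringprep.map_table_b2(ch))  # Table B.2: case folding
    return unicodedata.normalize("NFKC", "".join(mapped))


def same_localpart(a, b):
    """Compare two localparts after folding both."""
    return fold_localpart(a) == fold_localpart(b)
```

With these helpers, same_localpart("Juliet", "juliet") is true, so
both spellings would identify the same entity.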
With regard to bidirectionality (i.e., scripts that are written right
to left), [RFC3920] did not provide any guidance other than pointing
to Section 6 of [RFC3454]. Any treatment of bidirectionality in XMPP
localparts is an open issue ([RFC5893] provides some helpful
discussion of the general topic, at least as applied to
internationalized domain names).
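For reference, the requirements of Section 6 of [RFC3454] can be
sketched in Python using the bidirectional tables in the stringprep
module of the standard library. The function name is hypothetical,
and the separate requirement to prohibit the characters in Table C.8
is not checked here.

```python
import stringprep


def satisfies_stringprep_bidi(text):
    """Check the bidirectional requirements of Section 6 of RFC 3454.

    Table D.1 holds characters with bidirectional property "R" or "AL";
    Table D.2 holds characters with bidirectional property "L".
    """
    has_r_al = any(stringprep.in_table_d1(ch) for ch in text)
    has_l = any(stringprep.in_table_d2(ch) for ch in text)
    if not has_r_al:
        return True  # no right-to-left characters: nothing more to check
    if has_l:
        return False  # RandALCat and LCat characters must not be mixed
    # A string with RandALCat characters must start and end with one.
    return stringprep.in_table_d1(text[0]) and stringprep.in_table_d1(text[-1])
```

One consequence worth noting is that a right-to-left localpart ending
in a European digit fails this check, because the digit is neither an
"R" nor an "AL" character.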
3.2. Resourcepart
The resourcepart of an XMPP address has traditionally been a kind of
"anything goes" string, even allowing the space character. If the
PRECIS WG defines something like a free-form identifier, the XMPP WG
might borrow or adapt that class. Another option would be to say
that the resourcepart is Net-Unicode as specified in [RFC5198].
With regard to input restrictions, the characters allowed in an XMPP
resourcepart have always been lightly restricted. Within the ASCII
block, the only restricted characters are controls. Outside the
ASCII block, no characters are currently restricted. Although it is an
open issue whether further restrictions are desirable, as explained
under Section 2, XMPP-based IM systems have taken advantage of the
lack of restrictions on resource identifiers (e.g., in multi-user
chatrooms).
With regard to normalization, the Resourceprep profile of stringprep
specifies that implementations apply Unicode normalization form NFKC
(Compatibility Decomposition followed by Canonical Composition).
With regard to case mapping, the Resourceprep profile of stringprep
specifies that XMPP resourceparts are not case-folded (e.g., in an XMPP-
based chatroom, the participant "StPeter" could be different from the
participant "stpeter"). It is an open question whether this behavior
is necessary or desirable in all contexts.
With regard to bidirectionality (i.e., scripts that are written right
to left), [RFC3920] did not provide any guidance. Any treatment of
bidirectionality in XMPP resourceparts is an open issue ([RFC5893]
provides some helpful discussion of the general topic, at least as
applied to internationalized domain names).
4. Migration Issues
Any move away from Nameprep, Nodeprep, and Resourceprep as they are
defined today will inevitably introduce the potential for migration
issues, such as JIDs that were not ambiguous before the migration but
that become ambiguous after the migration. These issues need to be
clearly defined and well understood so that the costs and benefits of
any change can be properly assessed -- especially if the change might
have an impact on authentication (e.g., as described in [RFC3920]),
authorization (e.g., presence subscriptions as described in
[XMPP-IM]), access (e.g., joining a chatroom as described in
[XEP-0045]), identification (e.g., in XMPP URIs or IRIs as described
in [XMPP-URI]), and other security-related functions.
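One way to assess such issues before a migration is to audit existing
addresses under both the old and the new rules. The following Python
sketch reports strings that are distinct under one preparation
function but compare equal under another. Both rule sets shown are
hypothetical and were chosen only to make the example concrete; they
are not proposals.

```python
import unicodedata
from collections import defaultdict


def find_new_collisions(addresses, old_prepare, new_prepare):
    """Return groups of addresses that are distinct under old_prepare
    but compare equal under new_prepare."""
    groups = defaultdict(set)
    for address in addresses:
        groups[new_prepare(address)].add(old_prepare(address))
    return {key: sorted(olds) for key, olds in groups.items() if len(olds) > 1}


# Hypothetical rule sets, for illustration only.
def old_rules(text):
    return unicodedata.normalize("NFKC", text)  # case preserved


def new_rules(text):
    return unicodedata.normalize("NFKC", text).casefold()  # case folded
```

Applied to the chatroom nicknames "StPeter" and "stpeter", this audit
reports one collision: the two nicknames are different occupants
under the old rules but the same occupant under the new ones.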
5. User Interface Issues
[RFC5895] introduces the helpful concept of "the dividing line
between user interface and protocol" and applies that concept to the
complex process of translating the user's (presumed) intentions into
bits on the wire. IDNA2003 conflated user interface processing and
machine-readable protocols, and in many ways XMPP inherited that same
error. It would be desirable for XMPP technologies to define a clear
dividing line between user interface and protocol. This might mean
that the XMPP community will need to define recommended mappings that
are applied to a string before it is considered a JID (or the
localpart or resourcepart of a JID).
6. Recommendations
6.1. Possible Approaches
This document does not yet provide definitive recommendations, but
instead mainly seeks to foster discussion about internationalized
addresses in XMPP. However, there are three possible approaches that
the XMPP WG might pursue in relation to its existing stringprep
profiles:
1. Keep using Nameprep, Nodeprep, and/or Resourceprep as they are
defined today.
2. Collaborate with other interested parties or working groups to
define a new version of stringprep that tracks changes to Unicode
since Unicode 3.2 as currently specified in [RFC3454].
3. Pursue the general model followed in the IDNA2008 work by
defining a tiered model of valid, disallowed, and unassigned
characters; such an effort might be pursued only within the XMPP
community (for Nodeprep, Resourceprep, or both) or more generally
in concert with other users of stringprep.
The XMPP WG might even decide to use a mix of these approaches, e.g.
to use the new, non-stringprep IDNA2008 approach for domainparts but
the existing Nodeprep and Resourceprep profiles for localparts and
resourceparts.
In general, given that the PRECIS WG has been formed as a common
effort across different technologies, it is reasonable for the XMPP
developer community to participate in that WG (and for the XMPP WG to
cooperate with that WG) and to adopt whatever solutions are developed
in that WG.
6.2. Domainpart
RFC 3920 specifies the use of IDNA2003 for the domainpart of a JID
(which in the terms of IDNA2008 [RFC5890] is a "domain name slot").
This document does not question the reasoning behind the IDNA2008
work and therefore recommends the use of IDNA2008 technologies in the
document that obsoletes [XMPP-ADDR].
6.3. Localpart
This document does not yet provide a recommendation regarding the
localpart of a JID.
6.4. Resourcepart
This document does not yet provide a recommendation regarding the
resourcepart of a JID.
7. Security Considerations
The inclusion of non-ASCII characters in XMPP addresses has important
security implications, such as the ability to mimic characters or
entire addresses through the inclusion of "confusable characters"
(see [RFC4690] and [RFC5890]). These issues are explored at some
length in [XMPP-ADDR]. Other security considerations might apply and
will be described in a future version of this specification.
8. IANA Considerations
This document has no actions for the IANA.
9. Informative References
[FRAMEWORK]
Blanchet, M., "Precis Framework: Handling
Internationalized Strings in Protocols",
draft-blanchet-precis-framework-00 (work in progress),
July 2010.
[PROBLEM] Blanchet, M. and A. Sullivan, "Stringprep Revision Problem
Statement", draft-ietf-precis-problem-statement-00 (work
in progress), October 2010.
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ("stringprep")", RFC 3454,
December 2002.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003.
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names (IDN)",
RFC 3491, March 2003.
[RFC3920] Saint-Andre, P., Ed., "Extensible Messaging and Presence
Protocol (XMPP): Core", RFC 3920, October 2004.
[RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
Recommendations for Internationalized Domain Names
(IDNs)", RFC 4690, September 2006.
[RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network
Interchange", RFC 5198, March 2008.
[RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322,
October 2008.
[RFC5890] Klensin, J., "Internationalized Domain Names for
Applications (IDNA): Definitions and Document Framework",
RFC 5890, August 2010.
[RFC5891] Klensin, J., "Internationalized Domain Names in
Applications (IDNA): Protocol", RFC 5891, August 2010.
[RFC5892] Faltstrom, P., "The Unicode Code Points and
Internationalized Domain Names for Applications (IDNA)",
RFC 5892, August 2010.
[RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
Internationalized Domain Names for Applications (IDNA)",
RFC 5893, August 2010.
[RFC5894] Klensin, J., "Internationalized Domain Names for
Applications (IDNA): Background, Explanation, and
Rationale", RFC 5894, August 2010.
[RFC5895] Resnick, P. and P. Hoffman, "Mapping Characters for
Internationalized Domain Names in Applications (IDNA)
2008", RFC 5895, September 2010.
[UAX15] The Unicode Consortium, "Unicode Standard Annex #15:
Unicode Normalization Forms", September 2010.
[XEP-0029]
Kaes, C., "Definition of Jabber Identifiers (JIDs)", XSF
XEP 0029, October 2003.
[XEP-0077]
Saint-Andre, P., "In-Band Registration", XSF XEP 0077,
September 2009.
[XEP-0045]
Saint-Andre, P., "Multi-User Chat", XSF XEP 0045,
July 2008.
[XMPP-ADDR]
Saint-Andre, P., "Extensible Messaging and Presence
Protocol (XMPP): Address Format",
draft-ietf-xmpp-address-05 (work in progress),
October 2010.
[XMPP-IM] Saint-Andre, P., "Extensible Messaging and Presence
Protocol (XMPP): Instant Messaging and Presence",
draft-ietf-xmpp-3921bis-15 (work in progress),
October 2010.
[XMPP-URI]
Saint-Andre, P., "Internationalized Resource Identifiers
(IRIs) and Uniform Resource Identifiers (URIs) for the
Extensible Messaging and Presence Protocol (XMPP)",
RFC 5122, February 2008.
[XMPP] Saint-Andre, P., "Extensible Messaging and Presence
Protocol (XMPP): Core", draft-ietf-xmpp-3920bis-17 (work
in progress), October 2010.
Author's Address
Peter Saint-Andre
Cisco
1899 Wynkoop Street, Suite 600
Denver, CO 80202
USA
Phone: +1-303-308-3282
Email: psaintan@cisco.com