Network Working Group J. Klensin
Internet-Draft February 26, 2006
Expires: August 30, 2006
Internationalization in Internet Applications: Issues, Tradeoffs, and
Email Addresses
draft-klensin-ima-constraints-00.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 30, 2006.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
The discussions of internationalized email addresses in the IETF have
led to a number of stated requirements. This document identifies
some of those requirements in the context of general issues of
internationalization of Internet name spaces, demonstrates that the
combination of all of the requirements that appear reasonable on
first glance adds up to a null solution space, and then suggests a
different model for proceeding.
Klensin Expires August 30, 2006 [Page 1]
Internet-Draft I18N Email Constraints February 2006
Table of Contents
1. Introduction
2. Environment for Internationalization and Fragmentation Risks
2.1. Climate for Internationalization: The DNS History
2.2. Technology
3. Consequences and Implications
3.1. Choosing and mixing scripts and languages
3.2. Confusable characters and communications accuracy
3.3. Communication across languages and cultures
3.4. The place of internationalization in a global Internet
4. Specific Impact of I18N Email Addressing
5. Security Considerations
6. Acknowledgements
7. References
7.1. Normative References
7.2. Informative References
Author's Address
Intellectual Property and Copyright Statements
1. Introduction
In general, internationalization has been approached in the IETF on
the assumption that, if one can get the character sets and perhaps
language tags right, other issues will take care of themselves. An
"internationalization considerations" section is strongly suggested
for RFCs (see RFC 2277, Section 6 [RFC2277] and note that Section 3.1
of that document requires UTF-8 support of all protocols, hence all
protocol documents "deal with internationalization issues at all"),
but there are no real
guidelines about what should be in it and the requirement has not
always been enforced. There are also some additional requirements,
e.g., for UTF-8 support [RFC2277]. Particular protocols have gone
beyond these guidelines. In particular, the standards for
internationalized domain names, IDNA [RFC3490], use Unicode as a base
but utilize their own encoding of Unicode, punycode [RFC3492]. Those
standards carefully avoid identification of languages, since domain
names inherently consist of more or less arbitrary strings, not
"words" or other language elements.
That body of work generally ignores an important observation and its
consequences. When user-chosen words, names, and non-ASCII scripts
are used at the applications layer, users will often treat them as
language elements having meaning, and often pronunciations, in those
languages, not merely as strings of characters. The assumptions of
meaning or pronunciation, in turn, will often introduce age-old
problems of cross-language reading and understanding into the design
of applications, or applications protocols, that are intended to work
globally: if one person cannot read or understand the language of
another, the fact imposes limitations on communication that, in
general, cannot be solved by protocol design. In the most extreme
cases, differences in the languages and character sets that people
find normal and convenient impose practical limits on
interoperability: choices must be made between compatibility and
convenience within a linguistic and cultural community and global
interoperability that will, inevitably, be less convenient for some
groups and cultures than others. In some cases, solutions are
feasible that make things convenient within a cultural or linguistic
group and provide a less-convenient mechanism for getting between
groups; in others, even more difficult choices will need to be made.
And, in some cases (fortunately a gradually declining number), the
realities of character codings, presentation, and operating systems
make obvious solutions to problems impractical.
While these issues have appeared in the context of internationalized
domain names and in other applications, recent work to permit non-
ASCII local parts of electronic mail addresses without violating the
constraints of the mail protocols themselves has brought several of
the issues into better focus. This document discusses some of the
issues and problems -- both technical and in terms of user
expectations -- in general form and then reviews some of the
implications for email and other protocols that impose their own
constraints on strings and their interpretation.
While changes in lower-level Internet protocols and interfaces must
almost always occur at the protocol level (i.e., be visible "on the
wire" -- see below), there are at least three choices for
internationalization at the applications layer. Picking the right
one requires some understanding of how the features will be used, the
degree to which localization will be appropriately overlaid on the
basic internationalization features, and some general wisdom about
design. The option that is obvious at first is not necessarily the
best choice. The options are:
o Protocol changes, i.e., features that appear "on the wire" in the
interactions between client and server or between peer hosts. The
internationalization provisions for MIME body parts [RFC1341] are
examples of protocol-level mechanisms, since they appear in the
client-server interactions.
o Client-side changes, i.e., features that have characteristics
similar to protocol ones, but that are implemented entirely on the
client, without "on the wire" visibility. Domain name
internationalization using the IDNA specification [RFC3490] is an
example of a strictly client-side mechanism since non-ASCII
characters do not appear on the wire and the DNS server is not
required to be aware that internationalized names are being used.
o Adding a new layer or new abstraction, i.e., accomplishing
internationalization or localization not by somehow
internationalizing an existing protocol or introducing a
replacement protocol, but by adding new facilities that rest on
top of an unmodified non-internationalized protocol. Localization
facilities might also be added as a new layer on top of an
internationalized lower layer. Various efforts to add "keywords"
or other "above DNS search" mechanisms, the standardization of an
internationalized version of the URI [RFC3986] as an IRI
[RFC3987], and similar arrangements are "new layer" approaches.
2. Environment for Internationalization and Fragmentation Risks
In looking at the combination of efforts to internationalize the
Internet, especially at the protocol level, we encounter two large
groups of issues. One has to do with the social, cultural, and
political climate associated with the making of any decision about
internationalization in recent years and the other is about the
technology. The subsections that follow address both since, in
practice, it is impossible to deal with them separately. In
particular, as this document illustrates, if one examines only the
technical issues (the desire to avoid constraints on global end-to-end
communications and to minimize the risks of incorrect identification
of destination hosts or users), the likely conclusion is that almost
any internationalization at the protocol level is a bad idea. On the
other hand, if the social and cultural
context is examined, it becomes clear that avoiding any
internationalization at the protocol level will lead to a different
type of fragmentation and, if that context is examined alone, demands
will arise for protocol changes that are not plausible in practice.
2.1. Climate for Internationalization: The DNS History
The biggest potential for network fragmentation due to introduction
of mutually-incomprehensible scripts occurred with the development of
domain names that are not intended to be presented as ASCII strings.
There was considerable resistance in the technical community to that
set of decisions based on the belief that domain names were
ultimately protocol elements that should remain, at least for
application purposes, in a restricted subset of ASCII (a subset that
is compatible with ISO 646 BV [ISO.646.1991]). At least part of that
community also concluded that internationalization should occur in a
protocol layer closer to the user, i.e., "above the DNS" [RFC3467].
This layer might be thought of as the "presentation layer" of the
classical OSI model although the analogy is not exact. Those who
resisted DNS changes suggested that it might make sense to
distinguish what actions were taken in the DNS from a presentation
layer in which some new name spaces or resource identifiers might
occur. In that context, URIs [RFC3986], with their potentially
elaborate syntax, are no one's idea of "user friendly" even if one
ignores the desire for non-ASCII scripts entirely. The
internationalized form, IRIs [RFC3987] solve part of the non-ASCII
script problem, but are really no better: they permit
internationalization of the strings that make up URIs, but do not
address the complexity of the syntax or the ASCII syntax elements.
Such a presentation layer could make more culturally-reasonable forms
visible to the user while preserving clear layering over the
fundamental URI types and domain names that would remain unchanged.
That model would provide at least the potential for good localization
while preserving a common script, syntax, and set of conventions for
dealing with the actual elements of the network.
Although the idea of layering internationalization on top of an ASCII
protocol substrate seems to come back each time an application issue
is examined carefully, it has not gained significant traction in
practice other than as, e.g., DNS alternatives. Hence, the argument
has been lost, several times and in several different ways. It
became clear that, if the IETF had not provided some rational and
standardized ways to represent internationalized (non-ASCII) domain
names, we would have ended up with chaos -- different coded character
sets in different zones with some of them probably treated as binary
labels. We would see some Shift-JIS form in Japan, GB forms in
China, ISO 8859-1 in Western Europe and other ISO 8859 variations in
some other areas, and unpredictable other variations in the rest of
the world. Worse, the only way to determine which particular coded
character set (CCS) was being used would be out of band knowledge,
since none of the people promoting those approaches came forward with
any realistic plans for how to label "charsets" (essentially a
combination of a script and a coding system for those who have not
followed the MIME version of that discussion; see [RFC2978] and
[RFC2277] for more precise definitions, further discussion, and
references) in the DNS. Indeed, in spite of the standard, we have
already seen the beginnings of fragmenting developments in some
domains along with special "improved, enhanced, and
internationalized" (and not quite interoperable) DNS servers being
offered by some companies.
So, despite some misgivings, the IETF defined IDNs via IDNA [RFC3490]
(including exclusive use of Unicode as the defining character set).
From the standpoint of this discussion, the interesting thing about
IDNA is that it doesn't change the DNS at all. It is a strictly
client-side protocol, with Unicode strings being pushed through a
canonicalization process and then transformed into an "ASCII-
compatible" form (called "punycode") that, to the DNS and
applications that have not been upgraded, looks like (and is) a
hostname-format name, i.e., ASCII letters, digits and hyphens. It
was done that way because of a belief that the coding system would
lead to very rapid deployment without any negative impact on systems
or applications that had not been upgraded. Its most passionate
advocates were convinced that, once there was wide deployment, no one
would ever see the internal coding.
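As a hedged illustration of the client-side transformation just
described, the following Python sketch uses the standard library's
"idna" codec, which follows the RFC 3490 ToASCII and ToUnicode
operations. The helper name to_ace and the pattern LDH_LABEL are
illustrative assumptions, and the label is the Norwegian name used as
an example in Section 3.1; none of this is part of any specification.

```python
import re

# LDH ("letters, digits, hyphen") pattern for a single hostname label.
LDH_LABEL = re.compile(r"^[A-Za-z0-9-]+$")


def to_ace(label: str) -> str:
    """Return the ASCII-compatible encoding (ACE) of one domain label.

    Relies on Python's built-in "idna" codec (IDNA as specified in
    RFC 3490): Nameprep canonicalization, then punycode, then the
    "xn--" prefix.
    """
    return label.encode("idna").decode("ascii")


# U+00D8 / U+00F8 are the upper- and lower-case "o with slash".
mixed_case = "Torbj\u00d8rn"
lower_case = "torbj\u00f8rn"

ace = to_ace(lower_case)
print(ace)                                 # xn--torbjrn-u1a
print(to_ace(mixed_case) == ace)           # canonicalization folds case
print(bool(LDH_LABEL.match(ace)))          # plain hostname syntax
print(ace.encode("ascii").decode("idna"))  # back to the Unicode form
```

A DNS server or an application that has not been upgraded sees only
the "xn--" string, which is why no server-side change is needed.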
From the standpoint of global interoperability, the good news is that
they were wrong -- we have some other problems to cope with, but one
of them is not "you can't get there because you can't read or type
the string". If the application permits you to get to it, you can
always access and type the punycode string rather than whatever might
show up in characters you can't read, can't type, and maybe can't
even render. Of course, this requires that all applications support
entry of Roman characters, even if such entry is not convenient.
The choice of Unicode was, however, very important, not because it is
wonderful as a character set, but because it avoids the issues of
identifying what CCS is being used and, the WG hoped, of picking
which characters would be valid and which ones would not be.
Avoiding determining which characters should be valid and which ones
should not has also been less successful than one might have hoped;
both the IAB (see [IDN-Nextsteps]) and the Unicode Consortium (see
[UTR36] and [UTR39]) are struggling with approaches to that problem
for which they did not foresee a need when IDNA was adopted.
But, ultimately, it is important to remember as we talk about any of
this that the choice was never between "figure out some way to
internationalize the DNS" and "don't do it because it was a bad
idea". The choice was only between whether we did it in a global,
standard way that was fairly safe as far as DNS operations were
concerned or whether we ended up with a collection of different
mechanisms that would not interoperate cleanly and unambiguously
within a single domain name system.
2.2. Technology
As the result of these factors and tensions, IDNA became a completely
client-side IDN protocol. Several of the worst fears of the
pessimists have come true: we have confusion over look-alike
characters, we have the potential to receive and see characters we
can't read or type, the Unicode Consortium's beliefs about how widely
Unicode is available and about smooth conversions between codings
are, at best, very controversial, some implementers have "improved"
on the standard tables, and so on. Email MIME textual body parts
should be safe against character set problems due to the presence of
the "charset" parameter. However, in practice, problems in which one
character is mapped into an entirely different one are fairly
routine, most notably as the result of forwarding or otherwise
including all or part of one message in a body part that is
constructed locally according to different character set conventions.
Copying of text that was developed in one character coding context
and pasting it into another is not completely reliable for related
reasons. These problems are symptomatic of those we will certainly
encounter in the future as the Internet becomes increasingly
international and multilingual. Probably the worst is yet to come.
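The silent substitution described above can be reproduced in a few
lines. The sketch below is only an illustration: the function name
reinterpret is hypothetical, and the choice of ISO 8859-1 and ISO
8859-5 anticipates the confusion discussed in Section 3.1. It encodes
text under one charset and decodes the same octets under another,
which is what happens when a "charset" label is lost or wrong.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one charset, then decode the octets in another.

    Models a message whose octets survive transport intact but whose
    charset label is missing or wrong at the receiving end.
    """
    return text.encode(written_as).decode(read_as)


original = "\u00f8"  # LATIN SMALL LETTER O WITH STROKE

# Correct label: the character survives.
same = reinterpret(original, "iso8859_1", "iso8859_1")

# Wrong label: octet 0xF8 is a different character in ISO 8859-5.
wrong = reinterpret(original, "iso8859_1", "iso8859_5")

print(f"U+{ord(same):04X}")   # U+00F8
print(f"U+{ord(wrong):04X}")  # U+0458, CYRILLIC SMALL LETTER JE
```

No error is raised in the second case, which is what makes this class
of problem hard to detect automatically.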
As was the case with the pre-MIME internationalized mail body
approaches and with the development of IDNA, the local solutions
--the ones that are not interoperable globally-- will work, and work
well, within the relevant cultural and linguistic communities.
Realistically, the IETF cannot ignore the issues and problems and
either hope they will go away or decide to do nothing because the
problems will cause disruption. To do so is to guarantee that local
solutions will be developed and that people who use them will be
unable to communicate internationally (at least with the same tools
they use locally) and that people outside their communities will be
unable to communicate with them.
The key question is what the difficulties with the global solutions
or the development of local solutions actually do to
interoperability. The Internet community is probably in for a bad
time as reality catches up with many fantasies and delusions about
how systems and people work, but there is some reason for optimism
about the long term. To take one (admittedly-extreme) reality as an
example, suppose one user's primary language were written only in Old
Futhark Runic and that user does not read or speak any other
languages or write any other script. Assume further, stretching the
imagination a bit, that the only keyboards available to that user
have only runes on them. That user would have some serious problems
in communications. In particular, she would have been dead for
centuries: as far as is known, no living person really knows how
those languages and scripts worked (although there is a lot of
speculation) and it is unclear whether some of the Unicode decisions
in coding the runes are actually correct, much less optimal. She is
also not on the Internet in any significant way: the hypothetical
keyboard does not exist, there is no way to type a URL or email
address on it, etc. So, for that user, the net effect of permitting
IDNs in Runic, which IDNA now permits, is going to be just about zero
except maybe in terms of helping with her cultural pride. More
important, if she can find a few other living exclusive users of the
relevant scripts and languages, her ability to use those scripts and
languages in either content or domain names _might_ enhance their
ability to communicate with each other, but it certainly is not
going to increase or decrease anyone else's ability to communicate
with any of them.
On the other hand, suppose a different user can speak, read, and
write Russian as well as Old Viking Runic, but nothing else. If he
wants to communicate on the Internet, he can send notes (and use
domain names, etc.) that some reasonably large number of people will
be able to read easily, and a larger number will be able to get
through with a struggle, but, for anyone who does not read Russian or
recognize Cyrillic characters, he might as well have used Runic --
the symbols are useless either way. This problem is, of course,
centuries old. IDNs don't make it any worse although they don't help
either.
While Runic is a far-fetched example, some of the African languages
and scripts are not. And, unlike Runic, some of those African
scripts have not even been coded into Unicode yet.
3. Consequences and Implications
The Internet community is probably in for a nasty learning curve, but
things should work out as people accept reality. Within a language
and cultural community, IDNs --and, even more important, email
addresses with non-ASCII characters in the local parts-- are almost
certain to be very important, especially among groups of people who
are not comfortable with Roman-based characters. They are going to
prove helpful just as the ability to use native/local characters in
content has proven helpful. That helpfulness is going to be
important to spreading accessibility to the Internet into some
population groups (although, until there is a great deal of content
in their languages, probably not as much as some of the IDN advocates
around WSIS and ICANN have believed). But, for communication between
different language and cultural groups, we are going to find that we
need to do what people have done through history, even before
computer networking entered the equation: we will have to figure out,
probably out of band, what languages and scripts we share with
particular correspondents and then pick a member of that set.
3.1. Choosing and mixing scripts and languages
The choice of a common and shared script or language is going to be
far more complicated for many cases than any of our existing content-
negotiation ideas anticipate. We will need to remember that some
people may be able to understand a spoken language but not read it in
some or all of the scripts in which it is normally written and that,
especially for alphabetic scripts, the ability to read the script
(and even to crudely pronounce the sounds it implies) does not imply
the ability to understand any of the languages normally written in
it. These differences may relate to the ability to recognize
characters in a table, use a keyboard, recognize characters that
might appear in an IRI or email address, and so on. Ugly and nasty
as punycode may be, we will need to pass domain names around in it
unless we know in advance that our readers will know the relevant
scripts well and be able to type them, cut and paste them accurately,
and so on. If we choose to use non-ASCII email local parts, we will
discover that we need to keep ASCII alternative aliases around for
communicating more broadly and that those ASCII alternatives will
not, in the general case, be derivable algorithmically. Once we get
the email internationalization situation under control, nothing
should prevent a speaker of Norwegian, say Torbjorn Torbjornson (with
slashes across the second "o" in each name), from having an email
address of torbjorn@example.com (U+00F8 as the sixth character, i.e.,
with a slash across the "o") but, if he and a Russian-speaker want to
communicate with each other, he would be well-advised to retain the
ability to receive mail at torbjorn@example.com (or some other
address), especially if the software of the Russian reader is going
to magically transform the U+00F8 character into "j", which would be
predicted by getting ISO 8859-1 and ISO 8859-5 confused. And, if his
alternative is not torbjorn@example.com but
torbjorn@torbjorn.example.com (with a slash over the sixth character
in the domain name), then the Russian users or their software must be
able to generate and use torbjorn@xn--torbjrn-u1a.example.com
instead.
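The domain-name half of that transformation is mechanical, and a
minimal sketch follows. It assumes Python's built-in "idna" codec
(RFC 3490 behavior); the function name ascii_domain_form is
hypothetical. The local part is deliberately left untouched: as noted
above, ASCII alternatives for local parts are not, in the general
case, derivable algorithmically.

```python
def ascii_domain_form(address: str) -> str:
    """Convert only the domain of an address to its ACE (punycode) form.

    The local part is passed through unchanged; this sketch makes no
    attempt to transform it.
    """
    local_part, _, domain = address.rpartition("@")
    if not local_part or not domain:
        raise ValueError("expected local-part@domain")
    ace_domain = domain.encode("idna").decode("ascii")
    return f"{local_part}@{ace_domain}"


# The first domain label contains U+00F8 as its sixth character.
print(ascii_domain_form("torbjorn@torbj\u00f8rn.example.com"))
# torbjorn@xn--torbjrn-u1a.example.com
```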
It may be useful to note that "have an alternate address available
and let people know" bears a strong resemblance to the traditional
two-sided Asian business cards. The Chinese, Korean, or Japanese
characters on the front may be the correct ones but, if the owner of
the card wants to have communications with illiterate westerners, the
Roman characters on the back will rapidly become very important. Of
course, many people in those populations make exactly that choice:
their business cards do not have Roman characters on them.
Consequently, they have no expectations of communication with people
who do not read and speak the relevant languages.
3.2. Confusable characters and communications accuracy
The common example of similarity between the printed form of a
Cyrillic "A" and a Roman one raises issues similar to the Norwegian
example above. If one sees the character in a domain name in context
with other Cyrillic (or Roman) characters, it will probably lead to
the right guess unless someone is being deliberately deceptive or
cute. If the context is not available, a good guess might still be
possible based on whether the character appears on a sign in a rural
community in Russia or the US (in Moscow or New York, one would
probably need to know about specific neighborhoods and the guess
would be less reliable). Reducing the odds of a deception based on
confusion between the characters that some would consider similar in
appearance is a topic of active discussion, mostly about what DNS
registries should be permitted to register. But, if the person
writing that message out is really concerned about accuracy, then
either some explicit hints or, for domain names, the punycode string
had best appear on the business card or sign. If they do not, the
negative reinforcement from confused and irritated users will
gradually get the message across that they should.
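One rough way to supply such "explicit hints" is to report which
scripts occur in a label. The following sketch is a heuristic only.
It assumes that the first word of a character's Unicode name
identifies its script, which holds for Latin and Cyrillic letters but
is not the confusable-detection procedure of [UTR36] or [UTR39]. The
function name scripts_in and the sample labels are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Return a rough set of script names for the letters in a label.

    Heuristic: use the first word of each letter's Unicode character
    name (e.g. "LATIN", "CYRILLIC"). Digits and hyphens are ignored.
    """
    found = set()
    for ch in label:
        if ch.isalpha():
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found


latin_only = "example"
# Looks the same in many fonts, but the third letter is U+0430,
# CYRILLIC SMALL LETTER A.
mixed = "ex\u0430mple"

print(scripts_in(latin_only))     # {'LATIN'}
print(sorted(scripts_in(mixed)))  # ['CYRILLIC', 'LATIN']
```

A label that mixes scripts is not necessarily deceptive, so a result
like the second one is a prompt for closer inspection rather than a
verdict.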
3.3. Communication across languages and cultures
All of this implies that those who communicate across language and
cultural groups will be required to learn, if they do not understand
already, to be quite self-aware about the use of internationalized
identifiers, as well as other examples of characters or languages,
across those boundaries. There will be a lower level of demands on
those who communicate only in a single language and within a single
culture. This is, of course, not an issue that originated with the
introduction of the Internet: it has been this way since languages
and scripts started to differentiate from each other and since
different cultures came into contact. As we internationalize the
network, a user of a given language that cannot be fully expressed in
ASCII will always be faced with a choice between insisting on the
purism of an email address local part and domain name in the script
associated with the local language and maximizing the number of
people who can communicate with her conveniently. In some cases, the
right answer will be "local language"; in others, it will be "ASCII";
and in still others it will be "maintain two addresses". We are not
required, and should not try, to make that choice for users: the
users should make the best choices for their own needs, preferably
after understanding the consequences of the choices. As a community,
we will need to be very clever about user interfaces. As an example
much more general than email, if someone with no ability to read
Chinese characters sees a domain name written in those characters and
decides she wants to copy and paste it somewhere, the copy mechanism
is probably going to need to provide for both "copy the Chinese" and
"convert quietly to punycode and copy that". Either choice, by
itself, will be wrong sometimes. Users who both want to use Chinese-
script domain names and communicate outside that language or script
or culture are going to either learn to understand the difference and
relationship, or develop some good rituals that work, or the network
will keep slapping them in the head with failed lookups or bounced
mail until they do learn. Of course, substantially any language or
script could be substituted for "Chinese" in that example.
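A minimal sketch of such a two-way copy mechanism follows. It assumes
Python's built-in "idna" codec for the conversion; the function name
copy_choices, the dictionary keys, and the sample name under the
reserved "example" top-level domain are all hypothetical.

```python
def copy_choices(displayed_name: str) -> dict[str, str]:
    """Return both strings a copy operation might place on the clipboard.

    "native" is the name as displayed; "ace" is the same name with each
    non-ASCII label converted to its punycode ("xn--") form.
    """
    ace = displayed_name.encode("idna").decode("ascii")
    return {"native": displayed_name, "ace": ace}


# U+4E2D U+56FD, written as escapes so the source itself stays ASCII.
choices = copy_choices("\u4e2d\u56fd.example")
print(choices["ace"])  # xn--fiqs8s.example

# The ACE form converts back to exactly what was displayed.
print(choices["ace"].encode("ascii").decode("idna") == choices["native"])
```

Which of the two strings to offer by default is a user-interface
decision that this sketch does not attempt to make.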
3.4. The place of internationalization in a global Internet
Does that make internationalized domain names a bad idea and
internationalized email addresses an even worse idea? Globally,
maybe... perhaps even probably if our exclusive focus is on global
uses of the Internet. But that is where we get back to examples
similar to the Runic one. If we have a population in an Arabic-
speaking country that only reads and writes in Arabic and only wants
to communicate with each other, internationalization extensions let
them get themselves onto the Internet and communicate with each other
and to do so without causing any harm to the rest of the Internet.
That appears to be A Good Thing, or at least not harmful in any
significant way. Will it help them communicate with someone who
cannot read Arabic or help that person communicate with them? Not a
bit, at least in the absence of a translator who is competent in Arabic
and has the right computer tools. The alternative, stated in its
most extreme form, is "everyone who really wants to be an effective
user of the global Internet had better be able to function in
English". At one level, that is probably true, politically-incorrect
though it may be. But, at another, it is a very different statement
than requiring that everyone who wants to communicate in Amharic,
with other Amharic-speakers, be forced to translate to and from
English (or at least to and from a subset of ASCII characters) to
manage that communication rather than being able to use their own
language and (Ethiopic) script.
We need to be very careful to not make interoperability (or
reliability of references and the like) worse among those who can now
communicate. It does not appear that either IDNs or i18n email
addresses will necessarily make things worse, but we should remain
vigilant to be sure that doesn't change. Until everyone learns good
habits we may rediscover an important part of the X.400 model-in-
practice: sooner or later, a non-speaker of Chinese will get a
message from a Chinese colleague with a return address that is all-
Chinese. The recipient will have no hope of using it in a reply
unless cut and paste works, and will not be able to reliably verify
whether or not it worked. That user (message recipient) will have to
deal with the message and replying to it by selecting an out-of-band
communications path --a different address or the telephone are the
most likely-- to get in touch with that person and either deliver the
reply over that path or use it to say "I just got something from you,
if in fact it was you, and I have no possible way to reply to it as
written. So what other address or path would you like me to use?"
Clearly, that would not be ideal. But there is no ideal solution as
long as people persist in speaking different languages and writing in
different scripts. It does not appear that the use of different
languages and scripts is likely to stop any time soon and, in
general, it is not desirable that it do so.
4. Specific Impact of I18N Email Addressing
As discussed in [I18Nemail-Framework], the requirement that nothing
inspect or alter an email local-part other than the final delivery
server (see [RFC2821]) imposes strong constraints on automatic
transformations of internationalized email addresses to ASCII form.
If we insist on reliable cutting and pasting, regardless of the
operational character coding of mail user agents, we are probably
constrained to avoid non-ASCII forms in addresses entirely. Putting
the internationalized string only in encoded words and leaving the
address itself exclusively in ASCII will work in a large number of
cases, but even that can fail occasionally. So, if we try to impose
a rule that the only email addresses permitted are those that will
always be usable globally, the inevitable conclusion is that
non-ASCII local parts are impossible.
Unfortunately, that conclusion is a recipe for local,
non-interoperable solutions, probably ones based on "just use our
local characters and character coding", and for the de facto network
fragmentation that would follow from them, as discussed above.
A better approach is to adopt a more realistic set of goals, starting
from the realization that people who have no need or desire to
communicate outside their language or cultural group are not going to
do so and then focusing on (i) permitting them to communicate as they
wish without creating risks for other Internet users and (ii)
providing reasonable facilities for those who do wish to communicate
across language groups to do so.
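The conservative arrangement mentioned earlier in this section, an
all-ASCII address accompanied by an internationalized name carried
in encoded words, can be illustrated with a minimal sketch. It
assumes Python's standard "email" package; the helper names
ascii_safe_mailbox and recover_display_name, the sample name, and
the address wang@example.com are hypothetical and chosen only for
illustration. It is not a proposal for how any mail system should
be implemented.

```python
from email.header import decode_header
from email.utils import formataddr, parseaddr


def ascii_safe_mailbox(display_name: str, ascii_address: str) -> str:
    """Build a mailbox string whose address part is pure ASCII.

    A non-ASCII display name is emitted as an RFC 2047 encoded word.
    formataddr() raises UnicodeError if the address is not ASCII.
    """
    return formataddr((display_name, ascii_address), charset="utf-8")


def recover_display_name(mailbox: str) -> str:
    """Decode the display name for a user agent able to render it."""
    raw_name, _address = parseaddr(mailbox)
    parts = []
    for chunk, charset in decode_header(raw_name):
        # Encoded words come back as bytes; plain text comes back as str.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


# Hypothetical sender: a three-character Chinese name (written with
# escapes so this listing stays ASCII) and an all-ASCII address.
mailbox = ascii_safe_mailbox("\u738b\u5c0f\u660e", "wang@example.com")
print(mailbox)
print(recover_display_name(mailbox) == "\u738b\u5c0f\u660e")
```

In this sketch the string that a recipient would cut and paste,
the address inside the angle brackets, never leaves ASCII, while
the name remains recoverable by software that understands encoded
words. A non-ASCII local part is rejected outright by formataddr(),
which mirrors the "always usable globally" rule and its consequence.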
5. Security Considerations
This document discusses a series of internationalization issues that
bear on interoperability and might indirectly bear on security. As
such, it may suggest some issues that should be considered in
security evaluations of internationalized protocols. Its conclusions
also reinforce the well-understood point that expanding the range of
characters in which identifiers can be expressed will tend to
complicate the design of security-related protocols, and user
interfaces to them, that utilize such internationalized identifiers.
However, it raises no new security issues in itself.
6. Acknowledgements
The author would like to thank Alex Zinin and Dmitry Burkov for
initiating a conversation about the relationship between Internet
internationalization and fragmentation. That conversation ultimately
led to this memo. ...More to be supplied...
7. References
7.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119, March 1997.
[RFC2821] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
April 2001.
[RFC2978] Freed, N. and J. Postel, "IANA Charset Registration
Procedures", BCP 19, RFC 2978, October 2000.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003.
[RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode
for Internationalized Domain Names in Applications
(IDNA)", RFC 3492, March 2003.
7.2. Informative References
[I18Nemail-Framework]
Klensin, J. and Y. Ko, "Overview and Framework for
Internationalized Email",
draft-klensin-ima-framework-00.txt (work in progress),
September 2005, <http://www.ietf.org/internet-drafts/
draft-klensin-ima-framework-00.txt>.
[IDN-Nextsteps]
Klensin, J. and P. Faltstrom, "Review and Recommendations
for Internationalized Domain Names (IDN)",
draft-iab-idn-nextsteps-03.txt (work in progress),
February 2006, <http://www.ietf.org/internet-drafts/
draft-iab-idn-nextsteps-03.txt>.
[ISO.646.1991]
International Organization for Standardization,
"Information technology - ISO 7-bit coded character set
for information interchange", ISO Standard 646, 1991.
[Klensin-emailaddr]
Klensin, J., "Internationalization of Email Addresses",
draft-klensin-emailaddr-i18n-03 (work in progress),
July 2005.
[RFC1341] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
Mail Extensions): Mechanisms for Specifying and Describing
the Format of Internet Message Bodies", RFC 1341,
June 1992.
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
Languages", BCP 18, RFC 2277, January 1998.
[RFC3467] Klensin, J., "Role of the Domain Name System (DNS)",
RFC 3467, February 2003.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66,
RFC 3986, January 2005.
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRIs)", RFC 3987, January 2005.
[UTR36] Davis, M. and M. Suignard, "Unicode Technical Report #36:
Unicode Security Considerations", November 2005,
<http://www.unicode.org/draft/reports/tr36/tr36.html>.
Working Draft for Proposed Update
[UTR39] Davis, M. and M. Suignard, "Unicode Technical Standard #39
(proposed): Unicode Security Considerations", July 2005,
<http://www.unicode.org/draft/reports/tr39/tr39.html>.
Working Draft for Proposed Draft
Author's Address
John C Klensin
1770 Massachusetts Ave, #322
Cambridge, MA 02140
USA
Phone: +1 617 491 5735
Email: john-ietf@jck.com
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.