Internet Draft D. Crocker
draft-crocker-idn-idna-00.txt Brandenburg InternetWorking
Expires in six months 23 June 2002
Internationalizing Domain Names
in Applications (IDNA)
Status of this Memo
This document is an Internet-Draft and is in full
conformance with all provisions of Section 10 of
RFC2026.
Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its
working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum
of six months and may be updated, replaced, or obsoleted
by other documents at any time. It is inappropriate to
use Internet-Drafts as reference material or to cite
them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be
accessed at http://www.ietf.org/shadow.html.
Abstract
Internationalized Domain Names (IDN) use Unicode for
domain names, rather than using a subset of ASCII. This
increased name space, as well as the requirement to
maintain compatibility with the existing domain name
service, means that IDNs must be encoded in a form that
can be supported without changes to any portion of the
DNS that does not participate in the upgrade to IDN.
This specification defines a mechanism called IDNA for
handling them in a standard fashion and specifies an
IDNA profile for domain names used as host references.
IDNA allows non-ASCII characters to be represented using
the same octets used in so-called host names today. This
representation allows IDNs to be introduced with minimal
changes to the existing DNS infrastructure. IDNA is only
meant for processing domain names, not free text.
0. Document Change Notes --
This is a revision to draft-ietf-idn-idna-09.txt. It is
being distributed independently to facilitate
discussion.
The goal is to gain consensus about revisions to the IDN
working group document, specifically for the following
changes:
a. Split the document into two, one for defining
Internationalized Domain Names (IDN) and the other for
defining an encoding method of IDNs, namely IDNA using ACE.
b. Distinguish general IDNA from its specific use for host
names (IDNA-Host), by factoring the two into separate
specification sections. Use for host names is specified more
precisely, in terms of a specific syntax BNF rule from the
relevant existing DNS specification, so that IDNA-Host will
apply precisely to all DNS record fields and protocol units
conforming to that BNF.
c. Distinguish Domain Name character set enhancement (IDN)
from the encoding approach for 'non-native' representations
(IDNA).
d. Further clarify the distinction between the IDNA world
and the non-IDNA world.
e. Remove historical commentary. At the least, it needs to
be outside of the sections with normative text, or otherwise
be distinguished as non-normative.
f. Except for specification of the ACE-based mechanism,
move software, API and other host-specific discussion into a
non-normative appendix, so that the specification is
restricted to protocol-only details.
g. Distinguish user presentation from protocol and storage
encoding.
h. Change the anthropomorphic, ambiguous use of 'aware' and
'unaware' to refer to the nature of encoding as IDN-native
and IDN-ACE.
Notations:
Text, such as citations, that needs to be provided is
indicated by <<???>>. Personal comments are indicated
by << // xqqy // /Dave >>
The changes are extensive, so that providing change
marks would be more distracting than helpful. Still,
most of the changes are slight language modifications
and some moving of text around. Most of the original
text is still present.
1. Introduction
Expansion of the DNS namespace to permit Unicode, rather
than a subset of ASCII, requires special handling of the
resulting non-ASCII data within an ASCII-based DNS
environment. This document proceeds from
<id: draft-idn-idn-00.txt> and defines:
1) A mechanism called IDNA for handling IDNs in a
standard fashion within the current, ASCII-based
DNS, using an ASCII-compatible encoding (ACE) of
the IDN string
2) An IDNA profile, called IDNA-Host, for domain
names used as host references, such as in URLs
and email addresses.
IDNA allows applications to use ASCII name labels that
begin with a special prefix, to represent non-ASCII name
labels. Protocols that transport domain names need not
support this mapping; therefore IDNA does not require
changes to any protocol infrastructure. Equally, IDNA is
transparent to DNS servers and resolvers that do not yet
participate in the IDNA enhancement; the ASCII name
service provided by the existing DNS is sufficient for
handling IDNA ACE strings.
Therefore, the IDNA service also does not require any
applications to conform to IDNA, except applications
that elect to use IDNA in order to support IDN, while
maintaining interoperability with the existing, ASCII-
based DNS infrastructure. Adding IDNA support to an
existing application entails changes to the application
only -- or to a "shim" layer below the application and
above the existing transport and DNS protocol layers.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC 2119 [RFC2119].
ACE
means ASCII Compatible Encoding.
ACE label
refers to an internationalized label that can be
represented using only ASCII characters but is
equivalent to a label containing non-ASCII
characters. More rigorously, an ACE label is
defined to be any label that the ToUnicode
operation would alter. For every internationalized
label that cannot be directly represented in ASCII,
there is an equivalent ACE label. An ACE label
always begins with the ACE prefix defined in
section 5.1. The conversion of labels to and from the
ACE form is specified in section 5.4.
ACE prefix
is defined to be a string of ASCII characters that
appears at the beginning of every ACE label. It is
specified in section 5.
Equivalence of labels
is defined in IDNA in terms of the ToASCII
operation, which constructs an ASCII form for a
given label. Labels are defined to be equivalent
if and only if their ASCII forms produced by
ToASCII match using a case-insensitive ASCII
comparison. Traditional ASCII labels already have a
notion of equivalence: upper case and lower case
are considered equivalent. The IDNA notion of
equivalence is an extension of the old notion.
Equivalent labels in IDNA are treated as alternate
forms of the same label, just as "foo" and "Foo"
are treated as alternate forms of the same label.
Internationalized domain name for applications (IDNA)
refers to a domain name subject to the technical
enhancements for supporting IDN in the case of
general domain names. Procedurally, this is a
domain name that can be mapped from IDN-native to
IDN-ACE with the ToASCII operation (see section 5.4.1)
applied to each label without failing.
IDN-ACE
is a domain name slot that is not an IDN-native
domain name slot. Obviously, this includes any
domain name slot whose specification predates IDNA.
3. IDN for Applications (IDNA)
In order to permit IDN functionality without requiring
changes to existing DNS infrastructure servers and
resolvers, IDNA uses ASCII Compatible Encoding (ACE).
IDNA-ACE represents IDN labels within the current,
ASCII-based DNS protocol and storage infrastructure.
That is, domain names that use Unicode values in their
labels are encoded to occupy a reserved portion of the
existing, ASCII-based domain name space.
Components of an IDNA-enhanced DNS are:
Resolver-ASCII-1--|
                  |
                  |--Server-A--|
                  |            |--Server-A-ASCII-admin
Resolver-ACE-2 ---|
                  |
                  |--Server-B--|
                               |--Server-B-ACE-admin
The components labeled with ASCII do not support IDN.
The components labeled with ACE support IDN through
IDNA's ACE conventions.
The protocol between any resolver and any server is
unmodified.
The software and procedures for administering Server-A
are unmodified. Server-A therefore maintains only slots
with original, ASCII values. It maintains no IDN slots.
Server-B is unmodified. However, Server-B-ACE-admin is
modified to support creation and modification of IDN
slots, based on IDNA's ACE conventions. Hence, Server-B
can hold IDN labels.
Resolver-ASCII-1 is unmodified and supports only ASCII
domain names. It therefore can process an IDN string
only in its ACE form.
Resolver-ACE-2 is modified to support IDN through
IDNA's ACE conventions. Hence it can convert ACE
strings to their "native" Unicode, for display according
to local host Unicode mechanisms. The modification to
Resolver-ACE-2 may be a change to the resolver itself,
or may be effected through an independent module that
is called as a surrogate for the resolver and that, in
turn, calls an unmodified Resolver-ASCII module.
4. IDNA for Host Domain Names (IDNA-Host)
ASCII domain names used within URLs and email addresses
are subject to restrictions specified in [STD3] for host
names. Internationalized host domain names (IDN-Host)
enlarge the permitted range of host names by retaining
the ASCII-related restrictions while also permitting the
use of Unicode values.
IDNA mechanisms support IDN-Host as IDNA-Host. IDNA-
Host is IDNA with [STD3] host naming restrictions
applied to ASCII and Unicode domain names.
5. ACE
ASCII Compatible Encoding (ACE) maps between Unicode
"native" strings and an ASCII-readable representation of
the Unicode.
An ACE domain label comprises the ACE prefix string,
followed by the ACE encoding of the Unicode label.
5.1. ACE prefix
[[ Note to the IESG and Internet Draft readers: The two
uses of the string "IESG--" below are to be changed at
time of publication to a prefix which fulfills the
requirements in the first paragraph. IANA will assign
this value. ]]
The ACE prefix, used in the conversion operations
(section 5.4), is two alphanumeric ASCII characters
followed by two hyphen-minuses. It cannot
be any of the prefixes already used in earlier
documents, which includes the following: "bl--", "bq--",
"dq--", "lq--", "mq--", "ra--", "wq--" and "zq--". The
ToASCII and ToUnicode operations MUST recognize the ACE
prefix in a case-insensitive manner.
The ACE prefix for IDNA is "IESG--".
This means that an ACE label might be
"IESG--de-jg4avhby1noc0d", where "de-jg4avhby1noc0d" is
the part of the ACE label that is generated by the
encoding steps in [PUNYCODE].
While all ACE labels begin with the ACE prefix, not all
labels beginning with the ACE prefix are necessarily ACE
labels. Non-ACE labels that begin with the ACE prefix
will confuse users and SHOULD NOT be allowed in DNS
zones.
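As an illustration of how an ACE label is assembled, the following Python sketch joins the placeholder prefix "IESG--" to the output of Python's built-in "punycode" codec and recognizes the prefix case-insensitively. The helper names are hypothetical, and the sketch deliberately omits Nameprep and the other checks of ToASCII (section 5.4.1), so it only shows the shape of an ACE label.

```python
# Placeholder prefix used in this draft; IANA is to assign the real value.
ACE_PREFIX = "IESG--"


def make_ace_label(unicode_label: str) -> str:
    """Hypothetical helper: prefix + Punycode, with no Nameprep or checks."""
    return ACE_PREFIX + unicode_label.encode("punycode").decode("ascii")


def has_ace_prefix(label: str) -> bool:
    """Recognize the ACE prefix in a case-insensitive manner."""
    return label[:len(ACE_PREFIX)].lower() == ACE_PREFIX.lower()


print(make_ace_label("bücher"))            # IESG--bcher-kva
print(has_ace_prefix("iesg--bcher-kva"))   # True
print(has_ace_prefix("example"))           # False
```

As the paragraph above notes, a label for which has_ace_prefix() is true is not thereby an ACE label; only the round trip performed by ToUnicode (section 5.4.2) establishes that.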
5.2. ACE Enforcement
Whenever a domain name is put into an IDN-ACE domain
name slot, it MUST contain only ASCII characters.
Given an internationalized domain name (IDN), an
equivalent domain name satisfying this requirement can
be obtained by applying the ToASCII operation (see
section 5.4.1) to each label and, if dots are used as
label separators, changing all the label separators to
U+002E.
5.3. ACE Display
ACE labels obtained from domain name slots SHOULD be
hidden from users except when the use of the non-ASCII
form would cause problems or when the ACE form is
explicitly requested. Given an internationalized domain
name, an equivalent domain name containing no ACE labels
can be obtained by applying the ToUnicode operation (see
section 5.4.2) to each label. When the requirements of
sections 5.2 and 5.3 both apply, section 5.2 takes
precedence.
5.4. ACE Conversion operations
An application converts a domain name that is to be put
into an IDN-ACE slot or displayed to a user. This section
specifies the steps to perform in the conversion, and the
ToASCII and ToUnicode operations.
The input to ToASCII or ToUnicode is a single label that
is a sequence of Unicode code points (remember that all
ASCII code points are also Unicode code points). If a
domain name is represented using a character set other
than Unicode or US-ASCII, it will first need to be
transcoded to Unicode.
Starting from a whole domain name, the steps that an
application takes to do the conversions are:
1) Decide whether the domain name is a "stored string" or a
"query string" as described in [STRINGPREP]. If this
conversion follows the "queries" rule from [STRINGPREP], set
the flag called "AllowUnassigned".
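2) Split the domain name into individual labels at the
label separators. Whenever dots are used as label
separators, U+002E (full stop), U+3002 (ideographic full
stop), U+FF0E (fullwidth full stop) and U+FF61 (halfwidth
ideographic full stop) are all recognized as dots. The
labels do not include the separator.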
3) Decide whether or not to enforce the restrictions on
ASCII characters in host names [STD3]. If the restrictions
are to be enforced, set the flag called "UseSTD3ASCIIRules".
4) Process each label with either the ToASCII or the
ToUnicode operation. Use the ToASCII operation if you are
about to put the name into an IDN-ACE slot. Use the ToUnicode
operation if you are displaying the name to a user.
If ToASCII was applied in step 4 and dots are used as
label separators, change all the label separators to
U+002E (full stop).
The following two subsections define the ToASCII and
ToUnicode operations that are used in step 4.
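The steps above can be sketched in Python as follows. The sketch assumes CPython's standard-library module encodings.idna, whose ToASCII() and ToUnicode() implement the later RFC 3490 version of these operations with the prefix "xn--" in place of this draft's placeholder. Those functions always allow unassigned code points and never apply the STD3 rules, so the flags of steps 1 and 3 are not modeled. convert_name() is a hypothetical name.

```python
import re
from encodings.idna import ToASCII, ToUnicode

# The four characters IDNA recognizes as dots.  The capturing group makes
# re.split() return the separators as well, at the odd indices.
_DOT_SPLIT = re.compile("([\u002e\u3002\uff0e\uff61])")


def _to_unicode_never_fails(label: str) -> str:
    """ToUnicode never fails: on any error, return the input unchanged."""
    try:
        return ToUnicode(label)
    except UnicodeError:
        return label


def convert_name(name: str, for_slot: bool) -> str:
    """Convert a whole domain name label by label (section 5.4)."""
    parts = _DOT_SPLIT.split(name)
    labels, separators = parts[0::2], parts[1::2]
    if for_slot:
        # ToASCII on every label; any failure raises UnicodeError, and the
        # name then cannot be used as an IDN.  All separators become U+002E.
        return ".".join(ToASCII(label).decode("ascii") for label in labels)
    # ToUnicode on every label; the original separators are kept.
    out = [_to_unicode_never_fails(labels[0])]
    for sep, label in zip(separators, labels[1:]):
        out.append(sep)
        out.append(_to_unicode_never_fails(label))
    return "".join(out)
```

For example, convert_name("bücher\u3002example", for_slot=True) yields "xn--bcher-kva.example", with the ideographic full stop replaced by U+002E.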
5.4.1. ToASCII
The ToASCII operation takes a sequence of Unicode code
points that make up one label and transforms it into a
sequence of code points in the ASCII range (0..7F). If
ToASCII succeeds, the original sequence and the
resulting sequence are equivalent labels.
It is important to note that the ToASCII operation can
fail. If the ToASCII operation fails on any label in a
domain name, that domain name MUST NOT be used as an
internationalized domain name. The application needs to
have some method of dealing with this failure.
The inputs to ToASCII are a sequence of code points, the
AllowUnassigned flag and the UseSTD3ASCIIRules flag. The
output of ToASCII is either a sequence of ASCII code
points or a failure condition.
ToASCII never alters a sequence of code points that are
all in the ASCII range to begin with (although it could
fail). Applying the ToASCII operation multiple times has
exactly the same effect as applying it just once.
ToASCII consists of the following steps:
1. If all code points in the sequence are in the ASCII
range (0..7F) then skip to step 3.
2. Perform the steps specified in [NAMEPREP] and fail if
there is an error. The AllowUnassigned flag is used in
[NAMEPREP].
3. If the UseSTD3ASCIIRules flag is set, then perform these
checks:
(a) Verify the absence of non-LDH ASCII code points; that
is, the absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
(b) Verify the absence of leading and trailing hyphen-minus;
that is, the absence of U+002D at the beginning and end of
the sequence.
4. If all code points in the sequence are in the ASCII
range (0..7F), then skip to step 8.
5. Verify that the sequence does NOT begin with the ACE
prefix.
6. Encode the sequence using the encoding algorithm in
[PUNYCODE] and fail if there is an error.
7. Prepend the ACE prefix.
8. Verify that the number of code points is in the range 1
to 63 inclusive.
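The following Python sketch follows the eight ToASCII steps above. It assumes Python's standard library: encodings.idna.nameprep() stands in for [NAMEPREP], and the built-in "punycode" codec stands in for [PUNYCODE]. That nameprep() always permits unassigned code points, so the sketch rejects them itself with stringprep.in_table_a1() when AllowUnassigned is clear. The prefix is this draft's placeholder "IESG--", and IDNAError is a hypothetical exception type.

```python
import stringprep
from encodings.idna import nameprep

# Placeholder ACE prefix from this draft; IANA is to assign the real value.
ACE_PREFIX = "IESG--"

# Letters, digits and hyphen-minus: the only ASCII allowed by [STD3] rules.
LDH = frozenset("abcdefghijklmnopqrstuvwxyz"
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-")


class IDNAError(ValueError):
    """Hypothetical exception signalling that ToASCII failed."""


def to_ascii(label: str, allow_unassigned: bool = False,
             use_std3_ascii_rules: bool = False) -> str:
    # Step 1: a label that is already all ASCII skips Nameprep.
    if not label.isascii():
        # Step 2: Nameprep, failing on any error it reports.
        try:
            label = nameprep(label)
        except UnicodeError as exc:
            raise IDNAError(f"Nameprep failed: {exc}") from exc
        # nameprep() always allows unassigned code points (query
        # behaviour), so enforce the stored-string rule here if asked.
        if not allow_unassigned and any(stringprep.in_table_a1(c) for c in label):
            raise IDNAError("unassigned code point")
    # Step 3: optional [STD3] host-name checks on the ASCII code points.
    if use_std3_ascii_rules:
        if any(c.isascii() and c not in LDH for c in label):
            raise IDNAError("non-LDH ASCII code point")
        if label.startswith("-") or label.endswith("-"):
            raise IDNAError("leading or trailing hyphen-minus")
    # Step 4: if only ASCII remains, skip straight to step 8.
    if not label.isascii():
        # Step 5: the prefix is recognized case-insensitively.
        if label.lower().startswith(ACE_PREFIX.lower()):
            raise IDNAError("label already begins with the ACE prefix")
        # Steps 6 and 7: Punycode-encode, then prepend the ACE prefix.
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    # Step 8: the result must be 1 to 63 code points long.
    if not 1 <= len(label) <= 63:
        raise IDNAError("label length out of range")
    return label
```

Because the all-ASCII path never alters its input, applying to_ascii() to its own output returns that output unchanged, which matches the idempotence property stated above.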
5.4.2. ToUnicode
The ToUnicode operation takes a sequence of Unicode code
points that make up one label and returns a sequence of
Unicode code points. If the input sequence is a label in
ACE form, then the result is an equivalent
internationalized label that is not in ACE form,
otherwise the original sequence is returned unaltered.
ToUnicode never fails. If any step fails, then the
original input sequence is returned immediately in that
step.
The inputs to ToUnicode are a sequence of code points,
the AllowUnassigned flag and the UseSTD3ASCIIRules flag.
The output of ToUnicode is always a sequence of Unicode
code points.
1. If all code points in the sequence are in the ASCII
range (0..7F) then skip to step 3.
2. Perform the steps specified in [NAMEPREP] and fail if
there is an error. (If step 3 of ToASCII is also performed
here, it will not affect the overall behavior of ToUnicode,
but it is not necessary.) The AllowUnassigned flag is used in
[NAMEPREP].
3. Verify that the sequence begins with the ACE prefix, and
save a copy of the sequence.
4. Remove the ACE prefix.
5. Decode the sequence using the decoding algorithm in
[PUNYCODE] and fail if there is an error. Save a copy of the
result of this step.
6. Apply ToASCII.
7. Verify that the result of step 6 matches the saved copy
from step 3, using a case-insensitive ASCII comparison.
8. Return the saved copy from step 5.
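The ToUnicode steps can be sketched in Python under the same assumptions: encodings.idna.nameprep(), the "punycode" codec, and the placeholder prefix "IESG--". Step 6 reuses ToASCII, so the block includes a condensed copy of it and can run on its own. Every failure is caught and turns into returning the original input, so the function never raises. All function names are hypothetical.

```python
import stringprep
from encodings.idna import nameprep

# Placeholder ACE prefix from this draft; IANA is to assign the real value.
ACE_PREFIX = "IESG--"
LDH = frozenset("abcdefghijklmnopqrstuvwxyz"
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-")


def _prep(label: str, allow_unassigned: bool) -> str:
    """[NAMEPREP] plus the AllowUnassigned check; raises ValueError."""
    label = nameprep(label)  # raises UnicodeError, a ValueError subclass
    if not allow_unassigned and any(stringprep.in_table_a1(c) for c in label):
        raise ValueError("unassigned code point")
    return label


def _to_ascii(label: str, allow_unassigned: bool, use_std3: bool) -> str:
    """Condensed ToASCII (section 5.4.1); raises ValueError on failure."""
    if not label.isascii():
        label = _prep(label, allow_unassigned)
    if use_std3 and (any(c.isascii() and c not in LDH for c in label)
                     or label.startswith("-") or label.endswith("-")):
        raise ValueError("STD3 rules violated")
    if not label.isascii():
        if label.lower().startswith(ACE_PREFIX.lower()):
            raise ValueError("ACE prefix already present")
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    if not 1 <= len(label) <= 63:
        raise ValueError("label length out of range")
    return label


def to_unicode(label: str, allow_unassigned: bool = False,
               use_std3_ascii_rules: bool = False) -> str:
    original = label
    try:
        # Steps 1 and 2: Nameprep is applied only to non-ASCII input.
        if not label.isascii():
            label = _prep(label, allow_unassigned)
        # Step 3: require the ACE prefix and save a copy.
        if not label.lower().startswith(ACE_PREFIX.lower()):
            return original
        saved = label
        # Steps 4 and 5: strip the prefix and Punycode-decode the rest
        # (.encode("ascii") fails, and so returns the input, if non-ASCII).
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # Steps 6 and 7: re-encode and compare case-insensitively.
        reencoded = _to_ascii(decoded, allow_unassigned, use_std3_ascii_rules)
        if reencoded.lower() != saved.lower():
            return original
    except ValueError:  # UnicodeError subclasses ValueError
        return original
    # Step 8: return the decoded label.
    return decoded
```

For example, to_unicode("IESG--example-") returns its input unchanged. The remainder "example-" does decode, to "example", but ToASCII maps "example" to itself, so the step 7 comparison fails. The label begins with the prefix but is not a valid ACE label.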
5.5. ACE Comparison
Whenever two labels are compared, they MUST be
considered to match if and only if they are equivalent,
that is, their ASCII forms (obtained by applying
ToASCII) match using a case-insensitive ASCII
comparison.
Whenever two names are compared, they MUST be considered
to match if and only if their corresponding labels
match, regardless of whether the names use the same
forms of label separators.
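The comparison rules can be sketched in Python with the standard library's encodings.idna.ToASCII(). That function implements the RFC 3490 form of ToASCII, with prefix "xn--", AllowUnassigned set and UseSTD3ASCIIRules clear. Equivalence depends only on comparing ToASCII outputs case-insensitively, so the choice of prefix does not affect the result. labels_match() and names_match() are hypothetical names. A name with a label that fails ToASCII, including the empty label left by a trailing dot, raises UnicodeError instead of comparing.

```python
import re
from encodings.idna import ToASCII

# IDNA label separators: full stop and its three Unicode variants.
_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")


def labels_match(a: str, b: str) -> bool:
    """Two labels match iff their ToASCII forms match ignoring ASCII case."""
    return ToASCII(a).lower() == ToASCII(b).lower()


def names_match(a: str, b: str) -> bool:
    """Two names match iff corresponding labels match, whatever the dots."""
    labels_a, labels_b = _DOTS.split(a), _DOTS.split(b)
    return len(labels_a) == len(labels_b) and all(
        labels_match(x, y) for x, y in zip(labels_a, labels_b))
```

With these definitions, "bücher\u3002example" and "BÜCHER.example" match, because Nameprep folds the case and both separators count as dots.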
6. Implications for Components in DNS
6.1. Implications for typical applications using DNS
In IDNA, applications perform the processing needed to
input internationalized domain names from users, display
internationalized domain names to users and process the
inputs and outputs from DNS and other protocols that
carry domain names.
The components and interfaces between them can be
represented pictorially as:
             +------+
             | User |
             +------+
                ^
                | Input and display:
                | local interface methods
                | (pen, keyboard, video, ...)
+---------------|-------------------------------------+
|               v                                     |
|   +-----------------------------+                   |
|   |         Application         |                   |
|   |   (ToASCII and ToUnicode    |                   |
|   |      operations may be      |                   |
|   |        called here)         |                   |
|   +-----------------------------+                   |
|           ^                 ^                       | End
|           |                 |                       | system
| Call to   |                 | Application-specific  |
| resolver: |                 | protocol:             |
|       ACE |                 | ACE unless the        |
|           v                 | protocol is updated   |
|      +----------+           | to handle other       |
|      | Resolver |           | encodings             |
|      +----------+           |                       |
|           ^                 |                       |
+-----------|-----------------|-----------------------+
            | DNS protocol:   |
            | ACE             |
            v                 v
    +-------------+ +---------------------+
    | DNS servers | | Application servers |
    +-------------+ +---------------------+
The box labeled "Application" is where the application
splits a host name into labels, sets the appropriate
flags, and performs the ToASCII and ToUnicode
operations. This is described in section 5.4.
6.1.1. Entry and display in applications
Applications can accept domain names using any character
set or sets desired by the application developer, and
can display domain names in any character set. That is,
the IDNA protocol does not affect the interface between
users and applications.
An IDNA-native application can accept and display
internationalized domain names in two formats: the
internationalized character set(s) supported by the
application, and as an ACE label. ACE labels that are
displayed or input MUST always include the ACE prefix.
Applications MAY allow input and display of ACE labels,
but are not encouraged to do so except as an interface
for special purposes, possibly for debugging.
ACE encoding is opaque and ugly, and should thus
only be exposed to users who absolutely need it.
Because name labels encoded as ACE name labels can be
rendered either as the encoded ASCII characters or the
proper decoded characters, the application MAY have an
option for the user to select the preferred method of
display; if it does, rendering the ACE SHOULD NOT be the
default.
Domain names are often stored and transported in many
places. For example, they are part of documents such as
mail messages and web pages. They are transported in
many parts of many protocols, such as both the control
commands and the RFC 2822 body parts of SMTP, and the
headers and the body content in HTTP. It is important to
remember that domain names appear both in domain name
slots and in the content that is passed over protocols.
In protocols and document formats that define how to
handle specification or negotiation of charsets, labels
can be encoded in any charset allowed by the protocol or
document format. If a protocol or document format only
allows one charset, the labels MUST be given in that
charset.
In any place where a protocol or document format allows
transmission of the characters in internationalized
labels, internationalized labels SHOULD be transmitted
using whatever character encoding and escape mechanism
the protocol or document format uses at that place.
All protocols that use domain name slots already have
the capacity for handling domain names in the ASCII
charset. Thus, ACE labels (internationalized labels that
have been processed with the ToASCII operation) can
inherently be handled by those protocols.
6.1.2. Applications and resolver libraries
Applications normally use functions in the operating
system when they resolve DNS queries. Those functions in
the operating system are often called "the resolver
library", and the applications communicate with the
resolver libraries through a programming interface
(API).
Because these resolver libraries today expect only
domain names in ASCII, applications MUST prepare labels
that are passed to the resolver library using the
ToASCII operation. Labels received from the resolver
library contain only ASCII characters; internationalized
labels that cannot be represented directly in ASCII use
the ACE form. ACE labels always include the ACE prefix.
IDNA-native applications MUST be able to work with both
non-internationalized labels (those that conform to
[STD13] and [STD3]) and internationalized labels.
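A minimal Python sketch of preparing names for an ASCII-only resolver library follows. Before handing the name to socket.getaddrinfo(), the application converts it with the standard library's "idna" codec, which applies the RFC 3490 ToASCII operation to each label and joins the labels with U+002E. The helper names are hypothetical. Names that are already plain ASCII pass through unchanged, so one code path serves both non-internationalized and internationalized labels.

```python
import socket


def resolver_form(name: str) -> str:
    """ASCII form of a domain name, suitable for an ASCII-only resolver."""
    # Raises UnicodeError if ToASCII fails on any label.
    return name.encode("idna").decode("ascii")


def lookup(name: str, port: int = 80):
    """Resolve an IDN through the existing, unmodified resolver library."""
    return socket.getaddrinfo(resolver_form(name), port,
                              type=socket.SOCK_STREAM)
```

Only resolver_form() touches IDNA. lookup() leaves the resolver call exactly as it would be for an ASCII name.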
It is expected that future versions of the resolver
libraries will be able to accept domain names in
formats other than ASCII, and application developers
might one day pass domain names not only in Unicode
but also in a local script to a new API for the
resolver libraries in the operating system. Thus the
ToASCII and ToUnicode operations might be performed
inside these new versions of the resolver libraries.
Domain names stored in zones follow the rules for
"stored strings" from [STRINGPREP]. Domain names passed
to resolvers or put into the question section of DNS
requests follow the rules for "queries" from
[STRINGPREP].
6.1.3. DNS servers
An operating system might have a set of libraries for
performing the ToASCII operation. The input to such a
library might be in one or more charsets that are used
in applications (UTF-8 and UTF-16 are likely candidates
for almost any operating system, and script-specific
charsets are likely for localized operating systems).
For internationalized labels that cannot be represented
directly in ASCII, DNS servers MUST use the ACE form
produced by the ToASCII operation. All IDNs served by
DNS servers MUST contain only ASCII characters.
If a signaling system that makes negotiation possible
between old and new DNS clients and servers is
standardized in the future, the encoding of the query in
the DNS protocol itself can be changed from ACE to
something else, such as UTF-8. The question whether or
not this should be used is, however, a separate problem
and is not discussed in this memo.
6.1.4. Avoiding exposing users to the raw ACE encoding
All applications that might show the user a domain name
obtained from a domain name slot, such as from
gethostbyaddr or part of a mail header, SHOULD be
updated as soon as possible in order to prevent users
from seeing the ACE.
If an application decodes an ACE name using ToUnicode
but cannot show all of the characters in the decoded
name, such as if the name contains characters that the
output system cannot display, the application SHOULD
show the name in ACE format (which always includes the
ACE prefix) instead of displaying the name with the
replacement character (U+FFFD). This is to make it
easier for the user to transfer the name correctly to
other programs. Programs that by default show the ACE
form when they cannot show all the characters in a name
label SHOULD also have a mechanism to show the name that
is produced by the ToUnicode operation with as many
characters as possible and replacement characters in the
positions where characters cannot be displayed.
The ToUnicode operation does not alter labels that are
not valid ACE labels, even if they begin with the ACE
prefix. After ToUnicode has been
applied, if a label still begins with the ACE prefix,
then it is not a valid ACE label, and is not equivalent
to any of the intermediate Unicode strings constructed
by ToUnicode.
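The display policy above can be sketched in Python. The sketch wraps the standard library's encodings.idna.ToUnicode() (RFC 3490, prefix "xn--") so that any error leaves the label unchanged. A caller-supplied predicate, can_display(), stands in for whatever the output system can actually render. The function names and the predicate are hypothetical.

```python
from typing import Callable
from encodings.idna import ToUnicode


def _decode(label: str) -> str:
    try:
        return ToUnicode(label)
    except UnicodeError:
        return label  # not a valid ACE label: show it as received


def display_label(label: str, can_display: Callable[[str], bool]) -> str:
    """Default view: decoded form, or the ACE form if any character can't show."""
    decoded = _decode(label)
    return decoded if all(can_display(ch) for ch in decoded) else label


def best_effort_label(label: str, can_display: Callable[[str], bool]) -> str:
    """Optional view: decoded form with U+FFFD where a character can't show."""
    return "".join(ch if can_display(ch) else "\ufffd" for ch in _decode(label))
```

On an ASCII-only terminal (can_display=str.isascii), display_label("xn--bcher-kva", ...) keeps the ACE form, prefix included. best_effort_label() gives "b\ufffdcher" for users who ask for the decoded view.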
6.1.5. Bidirectional text in domain names
The display of domain names that contain bidirectional
text is not covered in this document. It may be covered
in a future version of this document, or may be covered
in a different document.
For developers interested in displaying domain names
that have bidirectional text, the Unicode standard has
an extensive discussion of how to reorder glyphs for
display when dealing with bidirectional text such as
Arabic or Hebrew. See [UAX9] for more information. In
particular, all Unicode text is stored in logical order.
6.1.6. DNSSEC authentication of IDN domain names
DNS Security [DNSSEC] is a method for supplying
cryptographic verification information along with DNS
messages. Public Key Cryptography is used in conjunction
with digital signatures to provide a means for a
requester of domain information to authenticate the
source of the data. This ensures that it can be traced
back to a trusted source, either directly, or via a
chain of trust linking the source of the information to
the top of the DNS hierarchy.
IDNA specifies that all internationalized domain names
served by DNS servers that cannot be represented
directly in ASCII must use the ACE form produced by the
ToASCII operation. This operation must be performed
prior to a zone being signed by the private key for that
zone. Because of this ordering, it is important to
recognize that DNSSEC authenticates the ASCII domain
name, not the Unicode form or the mapping between the
Unicode form and the ASCII form. In other words, the
output of ToASCII is the canonical name. In the presence
of DNSSEC, this is the name that MUST be signed in the
zone and MUST be validated against.
One consequence of this for sites deploying IDNA in the
presence of DNSSEC is that any special purpose proxies
or forwarders used to transform user input into IDNs
must be earlier in the resolution flow than DNSSEC
authenticating nameservers for DNSSEC to work.
6.1.7. Limitations of IDNA
The IDNA protocol does not solve all linguistic issues
with users inputting names in different scripts. Many
important language-based and script-based mappings are
not covered in IDNA and must be handled outside the
protocol. For example, names that are entered in a mix
of traditional and simplified Chinese characters will
not be mapped to a single canonical name. Another
example is Scandinavian names that are entered with
U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) will not be
mapped to U+00F8 (LATIN SMALL LETTER O WITH STROKE).
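The two examples above can be checked with the standard library's encodings.idna.ToASCII() (RFC 3490 form, prefix "xn--"). Nameprep folds neither pair together, so each member yields a different ACE label and the two labels are not equivalent. same_label() is a hypothetical helper.

```python
from encodings.idna import ToASCII


def same_label(a: str, b: str) -> bool:
    """IDNA equivalence: ToASCII forms compared without regard to ASCII case."""
    return ToASCII(a).lower() == ToASCII(b).lower()


# U+00F6 (o with diaeresis) versus U+00F8 (o with stroke).
print(same_label("s\u00f6rensen", "s\u00f8rensen"))  # False
# Simplified U+56FD versus traditional U+570B ("country").
print(same_label("\u56fd", "\u570b"))                # False
# Case differences, by contrast, are folded by Nameprep.
print(same_label("S\u00d6RENSEN", "s\u00f6rensen"))  # True
```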
6.2. Name Server Considerations
Internationalized domain name data in zone files (as
specified by section 5 of RFC 1035) MUST be processed
with ToASCII before it is entered in the zone files.
It is imperative that there be only one ASCII encoding
for a particular domain name. Thus, a primary master
name server MUST NOT contain an ACE-encoded label that
decodes to an ASCII label. The ToASCII operation assures
that no such names are ever output from the operation.
Name servers MUST NOT serve records with domain names
that contain non-ASCII characters; such names MUST be
converted to ACE form by the
ToASCII operation in order to be served. If names that
are not processed by ToASCII are passed to an
application, it will result in unpredictable behavior.
Note that [STRINGPREP] describes how to handle
versioning of unallocated codepoints.
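A zone-loading check in the spirit of these rules can be sketched in Python. It assumes the standard library's encodings.idna.ToUnicode(), which implements the RFC 3490 form of the operation with the prefix "xn--" rather than this draft's placeholder. zone_label_ok() is a hypothetical name. A label passes only if it is pure ASCII. If it carries the ACE prefix, it must also be a valid ACE label that decodes to a non-ASCII label, in line with section 5.1's rule that labels beginning with the prefix but not ACE labels SHOULD NOT be allowed in zones.

```python
from encodings.idna import ToUnicode

# CPython's encodings.idna uses the RFC 3490 prefix, not this draft's placeholder.
STDLIB_ACE_PREFIX = "xn--"


def zone_label_ok(label: str) -> bool:
    """Hypothetical zone-loading check for a single owner-name label."""
    if not label.isascii():
        return False  # must be converted with ToASCII before entry
    if not label.lower().startswith(STDLIB_ACE_PREFIX):
        return True   # ordinary ASCII label
    try:
        decoded = ToUnicode(label.lower())
    except UnicodeError:
        return False  # begins with the prefix but is not a valid ACE label
    # An ASCII result means the label either is not a valid ACE label or
    # is a second encoding of an ASCII label; neither belongs in the zone.
    return not decoded.isascii()
```

The label is lowercased before decoding because the prefix must be recognized case-insensitively, and the sketch does not rely on the library to do so.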
6.3. Root Server Considerations
IDNA strings are likely to be somewhat longer than
current host names, so the bandwidth needed by the root
servers should go up by a small amount. In addition,
queries and responses using IDNA strings will probably
be somewhat longer than typical queries today, so more
queries and responses may be forced to go to TCP instead
of UDP.
7. References
7.1. Normative references
[PUNYCODE] Adam Costello, "Punycode: An encoding of
Unicode for use with IDNA", draft-ietf-idn-punycode.
[NAMEPREP] Paul Hoffman and Marc Blanchet, "Nameprep: A
Stringprep Profile for Internationalized Domain Names",
draft-ietf-idn-nameprep.
[STD3] Bob Braden, "Requirements for Internet Hosts --
Communication Layers" (RFC 1122) and "Requirements for
Internet Hosts -- Application and Support" (RFC 1123),
STD 3, October 1989.
[STD13] Paul Mockapetris, "Domain names - concepts and
facilities" (RFC 1034) and "Domain names -
implementation and specification" (RFC 1035), STD 13,
November 1987.
[STRINGPREP] Paul Hoffman and Marc Blanchet,
"Preparation of Internationalized Strings
("stringprep")", draft-hoffman-stringprep, work in
progress.
7.2. Informative references
[DNSSEC] Don Eastlake, "Domain Name System Security
Extensions", RFC 2535, March 1999.
[RFC2119] Scott Bradner, "Key words for use in RFCs to
Indicate Requirement Levels", RFC 2119, March 1997.
[UAX9] Unicode Standard Annex #9, The Bidirectional
Algorithm,
<http://www.unicode.org/unicode/reports/tr9/>.
[UNICODE] The Unicode Consortium, "The Unicode
Standard, Version 3.1.0". The Unicode Standard, Version
3.0, Reading, MA, Addison-Wesley Developers Press, 2000,
ISBN 0-201-61633-5, as amended by: Unicode Standard
Annex #27: Unicode 3.1,
<http://www.unicode.org/unicode/reports/tr27/tr27-4.html>.
[USASCII] Vint Cerf, "ASCII format for Network
Interchange", RFC 20, October 1969.
8. Security Considerations
Security on the Internet partly relies on the DNS. Thus,
any change to the characteristics of the DNS can change
the security of much of the Internet.
This memo describes an algorithm that encodes characters
that are not valid according to STD3 and STD13 into
octet values that are valid. No
security issues such as string length increases or new
allowed values are introduced by the encoding process or
the use of these encoded values, apart from those
introduced by the ACE encoding itself.
Domain names are used by users to connect to Internet
servers. The security of the Internet would be
compromised if a user entering a single
internationalized name could be connected to different
servers based on different interpretations of the
internationalized domain name.
Because this document normatively refers to [NAMEPREP],
it includes the security considerations from that
document as well.
9. Authors' Addresses
Patrik Faltstrom
Cisco Systems
Arstaangsvagen 31 J
S-117 43 Stockholm Sweden
paf@cisco.com
Paul Hoffman
Internet Mail Consortium and VPN Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
phoffman@imc.org
Adam M. Costello
University of California, Berkeley
idna-spec.amc @ nicemice.net
APPENDIX
A.1. Brief overview for application developers
Applications can use IDNA to support internationalized
domain names anywhere that ASCII domain names are
already supported, including DNS master files and
resolver interfaces. (Applications can also define
protocols and interfaces that support IDNs directly
using non-ASCII representations. IDNA does not prescribe
any particular representation for new protocols, but it
still defines which names are valid and how they are
compared.)
The IDNA protocol is contained completely within
applications. It is not a client-server or peer-to-peer
protocol: everything is done inside the application
itself. When used with a DNS resolver library, IDNA is
inserted as a "shim" between the application and the
resolver library. When used for writing names into a DNS
zone, IDNA is used just before the name is committed to
the zone.
There are two operations described in section 5.4 of this
document:
- The ToASCII operation is used before sending an IDN to
something that expects ASCII names (such as a resolver)
or writing an IDN into a place that expects ASCII names
(such as a DNS master file).
- The ToUnicode operation is used when displaying names
to users, for example names obtained from a DNS zone.
It is important to note that the ToASCII operation can
fail. If it fails when processing a domain name, that
domain name cannot be used as an internationalized
domain name and the application has to have some method
of dealing with this failure.
IDNA requires that implementations process input strings
with Nameprep [NAMEPREP], which is a profile of
Stringprep [STRINGPREP], and then with Punycode
[PUNYCODE]. Implementations of IDNA MUST fully implement
Nameprep and Punycode; neither Nameprep nor Punycode is
optional.