<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc compact="yes"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<rfc ipr="trust200902"
docName="draft-josefsson-getaddrinfo-idn-00">
<front>
<title abbrev="getaddrinfo IDN support">
Internationalized Domain Names support in POSIX getaddrinfo
</title>
<author initials="S." surname="Josefsson" fullname="Simon Josefsson">
<organization>SJD AB</organization>
<address>
<email>simon@josefsson.org</email>
</address>
</author>
<date month="June" year="2010"/>
<abstract>
<t>This document describes an extension for Internationalized
Domain Names support in the POSIX getaddrinfo function.</t>
</abstract>
</front>
<middle>
<section title="Preface">
<t>This document was originally written in 2003, then published
and implemented as part of GNU Libidn. This is a copy of that
memo in IETF form. The document was written informally and
does not (yet) follow typical IETF document conventions. The
intention is to make the IETF community aware of this work and
to see whether there is interest in the ideas.</t>
</section>
<section title="Background">
<t>Libidn is a package for internationalized string handling
based on the Stringprep, Punycode and Internationalized Domain
Names in Applications (IDNA) specifications. It can be used
by applications directly by linking to it, as is done by,
e.g., Gnus, KDE, and Mutt.</t>
<t>Having every application link with an IDN library and perform
its own IDN handling is not a good idea. It bloats the code
and makes things unnecessarily complex. Only a few
applications, such as web browsers and mail clients, will need
to do this in the future, to provide good user interfaces for
internationalization.</t>
<t>See http://josefsson.org/libidn/ for more information.</t>
</section>
<section title="Alternative Approaches">
<t>There are implementations that modify gethostbyname() to
accept UTF-8 strings and perform the IDNA ToASCII operation
within gethostbyname().</t>
<t>There are even implementations that assume gethostbyname()
(on the client host) performs no validation of the string and
will send UTF-8 strings out to the DNS server, and that perform
the IDN conversion on the DNS server.</t>
<t>It is doubtful whether either approach is likely to be
standardized. Both also lack functionality: they provide only
black-box ToASCII functionality. The application cannot
extract the output from the ToASCII operation. More
importantly, there is no way to perform a ToUnicode operation,
which applications may want to use for display purposes.
Furthermore, while the first approach can support
locale-specific character sets (e.g., ISO-8859-1), the second
approach is bound to either guess the character set or always
use UTF-8.</t>
<t>See also the thread rooted in
&lt;iluel7n6bmu.fsf@latte.josefsson.org&gt; posted to
libc-alpha@sources.redhat.com on 08 Jan 2003.</t>
</section>
<section title="What I propose">
<t>The getaddrinfo() API should have two new flags, AI_IDN and
AI_CANONIDN. Roughly, they correspond to IDNA ToASCII and IDNA
ToUnicode, but there are several details. Note that strings
are still 'char*', i.e., the API does not use the "wide"
character type, and that the encoding of non-ASCII strings is
the current locale's character set (i.e.,
nl_langinfo(CODESET)).</t>
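<t>As a minimal sketch of the encoding rule above, the following
C fragment reports which character set would be assumed for a
non-ASCII host name passed as 'char*'. The helper name
host_name_codeset() is hypothetical and used only for
illustration; setlocale() and nl_langinfo() are the standard
POSIX calls.</t>
<figure><artwork><![CDATA[
```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper (not part of any library): returns the name of
   the character set of the current locale, which is the encoding the
   proposal assumes for non-ASCII host name strings. */
const char *host_name_codeset(void)
{
    const char *codeset = nl_langinfo(CODESET);

    assert(codeset != NULL);
    return codeset;
}
```
]]></artwork></figure>
<t>An application would normally call setlocale(LC_ALL, "")
once at startup; without that call the process stays in the "C"
locale and only ASCII host names can be expected to work.</t>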
<t>An application that uses AI_IDN signals to the getaddrinfo()
implementation that the input host name may be non-ASCII, that
the appropriate IDNA ToASCII steps should be carried out on
the input, and that the output from the ToASCII operation (if
any) should be used in the lookup using the current resolver
processing.</t>
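<t>The AI_IDN behaviour described above might be used as in the
following sketch. It assumes an implementation such as GNU Libc,
where the flag is exposed only when _GNU_SOURCE is defined; on
systems that lack the flag the sketch falls back to a value of 0,
which is an illustrative choice and simply disables the
extension. The function name lookup_idn() and its extra_flags
parameter are hypothetical.</t>
<figure><artwork><![CDATA[
```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* assumption: needed for AI_IDN on GNU Libc */
#endif
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef AI_IDN
#define AI_IDN 0     /* assumption: no IDN support, plain lookup */
#endif

/* Hypothetical helper: resolve a host name that may be non-ASCII,
   encoded in the current locale's character set.  Returns 0 or an
   EAI_* error code; on success the caller frees *res with
   freeaddrinfo(). */
int lookup_idn(const char *host, const char *service,
               int extra_flags, struct addrinfo **res)
{
    struct addrinfo hints;

    assert(res != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    /* Ask for ToASCII to be applied before the resolver runs. */
    hints.ai_flags = AI_IDN | extra_flags;

    return getaddrinfo(host, service, &hints, res);
}
```
]]></artwork></figure>
<t>A caller would typically invoke setlocale(LC_ALL, "") first,
then pass the host name exactly as the user typed it, with
extra_flags set to 0.</t>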
<t>An application that uses AI_CANONIDN signals to the
getaddrinfo() implementation that the input host name should
be put through the IDNA ToUnicode steps, and the output of
that placed in the 'ai_canonname' field of the resulting
structure. Normal resolver processing applies to the input
string, of course.</t>
<t>Consequently, an application that uses AI_IDN|AI_CANONIDN
signals to the getaddrinfo() implementation that the input host
name may be non-ASCII and should be put through the IDNA
ToASCII steps before being run through the resolver, and that
the input string should also be run through the IDNA ToUnicode
steps and the output of that placed in the 'ai_canonname'
field.</t>
<t>The combination AI_CANONNAME|AI_CANONIDN means that, instead
of running the ToUnicode IDNA steps on the input string, the
implementation runs them on the canonical host name returned
by the resolver for the input string.</t>
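<t>A sketch of how an application might obtain a display name
using the flag combinations described above follows. It always
sets AI_CANONNAME together with AI_CANONIDN, so the ToUnicode
step applies to the canonical name from the resolver. The
function name idn_display_name() is hypothetical, and the
fallback definitions of 0 for missing flags are assumptions made
so that the fragment also compiles where the extension is
absent.</t>
<figure><artwork><![CDATA[
```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* assumption: needed for the IDN flags on GNU Libc */
#endif
#include <assert.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef AI_IDN
#define AI_IDN 0       /* assumption: extension not available */
#endif
#ifndef AI_CANONIDN
#define AI_CANONIDN 0  /* assumption: extension not available */
#endif

/* Hypothetical helper: look up 'host' and return a malloc()ed copy of
   the canonical name in display form, or NULL on failure.  The caller
   frees the result with free(). */
char *idn_display_name(const char *host, int extra_flags)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    char *name = NULL;

    assert(host != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_IDN | AI_CANONNAME | AI_CANONIDN | extra_flags;

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return NULL;

    /* The canonical name is carried by the first entry of the list. */
    if (res != NULL && res->ai_canonname != NULL) {
        name = malloc(strlen(res->ai_canonname) + 1);
        if (name != NULL)
            strcpy(name, res->ai_canonname);
    }
    freeaddrinfo(res);
    return name;
}
```
]]></artwork></figure>
<t>The returned string is in the current locale's character
set, so it can be shown to the user without further
conversion.</t>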
</section>
<section title="Details">
<t>Four new flags have been proposed: AI_IDN_ALLOW_UNASSIGNED
and AI_IDN_USE_STD3_ASCII_RULES for getaddrinfo, and
NI_IDN_ALLOW_UNASSIGNED and NI_IDN_USE_STD3_ASCII_RULES for
getnameinfo. The implementation is simple: if specified, these
flags set the corresponding flag in the call to the IDNA
functions. See the IDNA specification (RFC 3490) for the
meaning of those flags.</t>
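<t>For the getnameinfo side, a reverse lookup might be sketched
as follows. This memo does not name a basic IDN flag for
getnameinfo; the sketch assumes NI_IDN as provided by GNU Libc,
with a fallback of 0 where it is missing. The function name
reverse_lookup_idn() is hypothetical, and its extra_flags
parameter is where NI_IDN_ALLOW_UNASSIGNED or
NI_IDN_USE_STD3_ASCII_RULES would be combined in on an
implementation that supports them.</t>
<figure><artwork><![CDATA[
```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* assumption: needed for NI_IDN on GNU Libc */
#endif
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#ifndef NI_IDN
#define NI_IDN 0     /* assumption: extension not available */
#endif

/* Hypothetical helper: reverse lookup of a dotted-quad IPv4 address,
   writing the host name into 'buf'.  Returns 0 or an EAI_* code. */
int reverse_lookup_idn(const char *ipv4, int extra_flags,
                       char *buf, size_t buflen)
{
    struct sockaddr_in sa;

    assert(ipv4 != NULL && buf != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ipv4, &sa.sin_addr) != 1)
        return EAI_NONAME;  /* not a valid IPv4 literal */

    /* Request conversion of the looked-up name for display. */
    return getnameinfo((struct sockaddr *)&sa, sizeof sa,
                       buf, (socklen_t)buflen, NULL, 0,
                       NI_IDN | extra_flags);
}
```
]]></artwork></figure>
<t>Passing NI_NUMERICHOST in extra_flags skips the name lookup
entirely, which is convenient for checking the plumbing without
network access.</t>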
</section>
<section title="Status">
<t>The AI_IDN flag has been implemented and shipped as a
proof-of-concept patch for GNU Libc with GNU Libidn since
January 2003. Binary libc packages with the patch exist for
(at least) two GNU/Linux distributions. The AI_CANONIDN flag
is not yet implemented.</t>
<t>As of March 2004, Libidn has been integrated as an add-on in
the GNU Libc CVS repository. The AI_CANONIDN flag has been
implemented. The AllowUnassigned and UseSTD3ASCIIRules flags
were added.</t>
</section>
<section title="Future">
<t>Should non-ASCII be allowed in gethostname (and similar
functions) if the administrator has supplied, e.g.,
'option idn' in /etc/resolv.conf?</t>
</section>
<section title="Feedback">
<t>This document is a work-in-progress and the details may
change. Contact me at simon@josefsson.org to discuss
changes.</t>
</section>
<section title="Security Considerations">
<t>TBA.</t>
</section>
<section title="IANA Considerations">
<t>TBA.</t>
</section>
<section title="Acknowledgements">
<t>Ulrich Drepper integrated the work in GNU Libc.</t>
</section>
</middle>
<back>
<references title="Normative References">
</references>
</back>
</rfc>