<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">

<?rfc compact="yes"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>

<rfc ipr="trust200902"
     docName="draft-josefsson-getaddrinfo-idn-00">

  <front>

    <title abbrev="getaddrinfo IDN support">
      Internationalized Domain Names support in POSIX getaddrinfo
    </title>

    <author initials="S." surname="Josefsson" fullname="Simon Josefsson">
      <organization>SJD AB</organization>
      <address>
	<email>simon@josefsson.org</email>
      </address>
    </author>
    
    <date month="June" year="2010"/>

    <abstract>

      <t>This document describes an extension for Internationalized
	Domain Names support in the POSIX getaddrinfo function.</t>

    </abstract>

  </front>
  
  <middle>

    <section title="Preface">

      <t>This document was originally written in 2003, and was
	published and implemented as part of GNU Libidn.  This is a
	copy of that memo in IETF form.  The document was written
	informally and does not (yet?) follow typical IETF document
	formats.  The intention is to make the IETF community aware of
	this work and to see if there is interest in the ideas.</t>

    </section>

    <section title="Background">

      <t>Libidn is a package for internationalized string handling
	based on the Stringprep, Punycode and Internationalized Domain
	Names in Applications (IDNA) specifications.  It can be used
	by applications directly by linking to it, as is done by,
	e.g., Gnus, KDE, and Mutt.</t>

      <t>Having each and every application link with Libidn and
	perform its own IDN handling is not a good idea.  It bloats
	the code and makes things unnecessarily complex.  Only a few
	applications, such as web browsers and mail clients, will need
	to do this in the future, to provide good user interfaces for
	internationalization.</t>

      <t>See http://josefsson.org/libidn/ for more information.</t>

    </section>

    <section title="Alternative Approaches">

      <t>There are implementations that modify gethostbyname() to
	accept UTF-8 strings and perform the IDNA ToASCII operation
	within gethostbyname().</t>

      <t>There are even implementations that assume gethostbyname()
	(on the client host) performs no validation of the string and
	will send UTF-8 strings out to the DNS server, and that
	perform the IDN conversion on the DNS server.</t>

      <t>Some doubt can be raised as to whether either of these
	approaches is likely to be standardized.  They also lack
	functionality: they provide only black-box ToASCII
	functionality.  The application cannot extract the output
	from the ToASCII operation.  More importantly, there is no way
	to perform the ToUnicode operation that applications may want
	to use for display purposes.  Furthermore, while the first
	approach can support locale-specific character sets (e.g.,
	ISO-8859-1), the second approach is bound to either guess the
	character set or always use UTF-8.</t>

      <t>See also the thread rooted in
	&lt;iluel7n6bmu.fsf@latte.josefsson.org&gt; posted to
	libc-alpha@sources.redhat.com on 08 Jan 2003.</t>

    </section>

    <section title="What I propose">

      <t>The getaddrinfo() API should have two new flags, AI_IDN and
	AI_CANONIDN.  Roughly, they correspond to IDNA ToASCII and
	IDNA ToUnicode, but there are several details.  Note that
	strings are still 'char*', i.e., the API does not use the
	"wide" character type, and that the encoding of non-ASCII
	strings is the current locale's character set (i.e.,
	nl_langinfo(CODESET)).</t>
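
      <t>As a minimal sketch of the calling convention just
	described, the following C fragment prepares a hints structure
	with AI_IDN set and reports the character set in which a
	non-ASCII host name would be interpreted.  The helper names
	(sketch_*) and the SKETCH_AI_IDN macro are illustrative only
	and not part of this proposal; the fallback value of 0 is an
	assumption made so that the fragment still compiles where
	netdb.h does not define AI_IDN.</t>

      <figure><artwork><![CDATA[
```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* GNU Libc exposes AI_IDN as an extension */
#endif
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: fall back to 0 (no effect) where AI_IDN is missing. */
#ifdef AI_IDN
#define SKETCH_AI_IDN AI_IDN
#else
#define SKETCH_AI_IDN 0
#endif

/* Adopt the user's locale and return the name of its character set.
   Host names passed as plain char* are read in this encoding. */
const char *sketch_locale_codeset(void)
{
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}

/* Fill in hints for a stream lookup that allows non-ASCII names. */
void sketch_idn_hints(struct addrinfo *hints)
{
    assert(hints != NULL);
    memset(hints, 0, sizeof *hints);
    hints->ai_family = AF_UNSPEC;
    hints->ai_socktype = SOCK_STREAM;
    hints->ai_flags = SKETCH_AI_IDN;
}
```
]]></artwork></figure>

      <t>The host name itself stays an ordinary 'char*' string; only
	the flag in the hints structure tells the resolver that the
	string may contain non-ASCII characters in the locale's
	encoding.</t>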

      <t>An application that uses AI_IDN signals to the getaddrinfo()
	implementation that the input host name may be non-ASCII and
	that the appropriate IDNA ToASCII steps should be carried out
	on the input, and that the output from the ToASCII operation
	(if any) should be used in the lookup using the current
	resolver processing.</t>
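
      <t>A lookup along these lines might look as follows.  This is a
	sketch under stated assumptions, not a reference
	implementation: sketch_lookup_idn() and its extra_flags
	parameter are hypothetical, and SKETCH_AI_IDN falls back to 0
	where the C library does not provide AI_IDN.</t>

      <figure><artwork><![CDATA[
```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* GNU Libc exposes AI_IDN as an extension */
#endif
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: fall back to 0 (no effect) where AI_IDN is missing. */
#ifdef AI_IDN
#define SKETCH_AI_IDN AI_IDN
#else
#define SKETCH_AI_IDN 0
#endif

/* Resolve host, which may be non-ASCII in the locale's encoding.
   extra_flags is OR-ed into the hints (e.g., AI_NUMERICHOST).
   Returns 0 or an EAI_* code; on success the caller must release
   *res with freeaddrinfo(). */
int sketch_lookup_idn(const char *host, const char *service,
                      int extra_flags, struct addrinfo **res)
{
    struct addrinfo hints;

    assert(host != NULL && res != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = SKETCH_AI_IDN | extra_flags;
    return getaddrinfo(host, service, &hints, res);
}
```
]]></artwork></figure>

      <t>A caller would pass the name exactly as the user typed it,
	report a non-zero return value with gai_strerror(), and
	otherwise walk the returned list as with any other
	getaddrinfo() call.</t>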

      <t>An application that uses AI_CANONIDN signals to the
	getaddrinfo() implementation that the input host name should
	be put through the IDNA ToUnicode steps, and the output of
	that placed in the 'ai_canonname' field of the resulting
	structure.  Normal resolver processing applies to the input
	string, of course.</t>

      <t>Consequently, an application that uses AI_IDN|AI_CANONIDN
	signals to the getaddrinfo() implementation that the input
	host name may be non-ASCII and should be put through the IDNA
	ToASCII steps before being run through the resolver, and that
	the input string should also be run through the IDNA ToUnicode
	steps and the output of that placed in the 'ai_canonname'
	field.</t>

      <t>The semantics of AI_CANONNAME|AI_CANONIDN is that instead of
	running the ToUnicode IDNA steps on the input string, the
	canonical host name as returned by the resolver for the input
	string should be used in the ToUnicode IDNA step.</t>
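
      <t>The combination of AI_CANONNAME and AI_CANONIDN could be used
	roughly as sketched here to obtain a name suitable for
	display.  The function sketch_display_name(), its extra_flags
	parameter, and the SKETCH_* macros are hypothetical names
	introduced for illustration; the macros fall back to 0 where
	the C library does not define the corresponding flag.</t>

      <figure><artwork><![CDATA[
```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* GNU Libc exposes the IDN flags as extensions */
#endif
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: fall back to 0 (no effect) where a flag is missing. */
#ifdef AI_IDN
#define SKETCH_AI_IDN AI_IDN
#else
#define SKETCH_AI_IDN 0
#endif
#ifdef AI_CANONIDN
#define SKETCH_AI_CANONIDN AI_CANONIDN
#else
#define SKETCH_AI_CANONIDN 0
#endif

/* Look up host and copy the canonical name, converted for display,
   into buf.  Returns 0 or an EAI_* code; buf is empty if the
   resolver supplied no canonical name. */
int sketch_display_name(const char *host, int extra_flags,
                        char *buf, size_t buflen)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    assert(host != NULL && buf != NULL && buflen > 0);
    buf[0] = '\0';
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = SKETCH_AI_IDN | AI_CANONNAME
                     | SKETCH_AI_CANONIDN | extra_flags;
    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;
    /* Only the first entry carries ai_canonname. */
    if (res->ai_canonname != NULL)
        snprintf(buf, buflen, "%s", res->ai_canonname);
    freeaddrinfo(res);
    return 0;
}
```
]]></artwork></figure>

      <t>The sketch always sets AI_CANONNAME together with
	AI_CANONIDN because the Linux getaddrinfo(3) manual page
	describes AI_CANONIDN as an addition to AI_CANONNAME; whether
	AI_CANONIDN alone has an effect depends on the
	implementation.</t>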

    </section>

    <section title="Details">

      <t>Four new flags have been proposed: AI_IDN_ALLOW_UNASSIGNED
	and AI_IDN_USE_STD3_ASCII_RULES for getaddrinfo, and
	NI_IDN_ALLOW_UNASSIGNED and NI_IDN_USE_STD3_ASCII_RULES for
	getnameinfo.  The implementation is simple: if specified,
	each flag sets the corresponding flag in the call to the IDNA
	functions.  See the IDNA specification (RFC 3490) for the
	meaning of the AllowUnassigned and UseSTD3ASCIIRules
	flags.</t>
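
      <t>The flag translation described above can be sketched as a
	small mapping function.  All names and bit values in this
	fragment are placeholders chosen for illustration; a real
	implementation would use the constants from netdb.h and from
	the IDNA library's header instead.  The same pattern would
	apply to the NI_ flags in getnameinfo.</t>

      <figure><artwork><![CDATA[
```c
#include <assert.h>

/* Placeholder request bits, standing in for the AI_IDN_* flags. */
enum {
    SKETCH_AI_IDN_ALLOW_UNASSIGNED     = 1 << 0,
    SKETCH_AI_IDN_USE_STD3_ASCII_RULES = 1 << 1
};

/* Placeholder bits, standing in for the IDNA library's flags. */
enum {
    SKETCH_IDNA_ALLOW_UNASSIGNED       = 1 << 4,
    SKETCH_IDNA_USE_STD3_ASCII_RULES   = 1 << 5
};

/* Translate resolver-level flags into flags for the IDNA routines.
   Bits other than the two IDN options are ignored. */
int sketch_idna_flags(int ai_flags)
{
    int idna_flags = 0;

    /* The two request bits must be distinct for the mapping to work. */
    assert((SKETCH_AI_IDN_ALLOW_UNASSIGNED
            & SKETCH_AI_IDN_USE_STD3_ASCII_RULES) == 0);

    if (ai_flags & SKETCH_AI_IDN_ALLOW_UNASSIGNED)
        idna_flags |= SKETCH_IDNA_ALLOW_UNASSIGNED;
    if (ai_flags & SKETCH_AI_IDN_USE_STD3_ASCII_RULES)
        idna_flags |= SKETCH_IDNA_USE_STD3_ASCII_RULES;
    return idna_flags;
}
```
]]></artwork></figure>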

    </section>

    <section title="Status">

      <t>The AI_IDN flag has been implemented and shipped as a
	proof-of-concept patch for GNU Libc with GNU Libidn since
	January 2003.  Binary libc packages with the patch exist for
	(at least) two GNU/Linux distributions.  The AI_CANONIDN flag
	is not yet implemented.</t>

      <t>As of March 2004, Libidn has been integrated as an add-on in
	the GNU Libc CVS repository.  The AI_CANONIDN flag has been
	implemented.  The AllowUnassigned and UseSTD3ASCIIRules flags
	were added.</t>

    </section>

    <section title="Future">

      <t>Allow non-ASCII in gethostname (and similar functions), if
	the administrator has supplied, e.g., 'option idn' in
	/etc/resolv.conf?</t>
    </section>

    <section title="Feedback">

      <t>This document is a work-in-progress and the details may
	change.  Contact me at simon@josefsson.org to discuss
	changes.</t>

    </section>

    <section title="Security Considerations">

      <t>TBA.</t>

    </section>

    <section title="IANA Considerations">

      <t>TBA.</t>

    </section>

    <section title="Acknowledgements">

      <t>Ulrich Drepper integrated the work in GNU Libc.</t>

    </section>

  </middle>

  <back>

    <references title="Normative References">

    </references>

  </back>

</rfc>
