http://stupid.domain.name/ietf/

One document matched: draft-ietf-precis-problem-statement-00.xml
<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC '' 
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
<!ENTITY bibxml2rfc-informative SYSTEM "draft-ietf-precis-problem-statement-00.xml-informative">
<!ENTITY bibxml2rfc-normative SYSTEM "draft-ietf-precis-problem-statement-00.xml-normative">
]>




<rfc category="info" ipr="pre5378Trust200902" docName="draft-ietf-precis-problem-statement-00.txt">
  <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
  <?rfc toc="yes" ?>
  <?rfc symrefs="yes" ?>
  <?rfc sortrefs="yes"?>
  <?rfc iprnotified="no" ?>
  <?rfc strict="yes" ?>
  <?rfc compact="yes" ?>
  <?rfc comments="yes"?>
  <?rfc inline="yes"?>

  <front>
    <title>Stringprep Revision Problem Statement</title>
    <author initials="M." surname="Blanchet" fullname="Marc Blanchet">
      <organization>Viagenie</organization>
      <address>
        <postal>
          <street>2600 boul. Laurier, suite 625</street>
          <city>Quebec</city>
          <region>QC</region>
          <code>G1V 4W1</code>
          <country>Canada</country>
        </postal>
        <email>Marc.Blanchet@viagenie.ca</email>
        <uri>http://viagenie.ca</uri>
      </address>
    </author>
    <author initials="A." surname="Sullivan" fullname="Andrew Sullivan">
      <address>
        <postal>
          <street>519 Maitland St.</street>
          <city>London</city>
          <region>ON</region>
          <code>N6B 2Z5</code>
          <country>Canada</country>
        </postal>
        <email>ajs@crankycanuck.ca</email>
      </address>
    </author>
    <date month="October" year="2010"/>
    <abstract>
      <t>
        Using Unicode codepoints in protocol strings that expect
        comparison with other strings <cref
        source="ajs@shinkuro.com">The WG will need to decide whether
        "other strings" is too broad.  In particular, what about
        protocol slots that can take strings other than plain
        ASCII?</cref> requires preparation of the string that contains
        the Unicode codepoints.  Internationalizing Domain Names in
        Applications (IDNA2003) defined and used Stringprep and
        Nameprep. Other protocols subsequently defined Stringprep
        profiles. A new approach different from Stringprep and
        Nameprep is used for a revision of IDNA2003 (called
        IDNA2008). Other Stringprep profiles need to be similarly
        updated or a replacement of Stringprep need to be
        designed. This document outlines the issues to be faced by
        those designing a Stringprep replacement.
      </t>
    </abstract>
  </front>
  <middle>
    <section title="Introduction">
      <t>
        Internationalizing Domain Names in Applications (IDNA2003)
        <xref target="RFC3490"/>, <xref target="RFC3491" />, <xref
        target="RFC3492" />, <xref target="RFC3454" /> described a
        mechanism for encoding UTF-8 labels making up
        Internationalized Domain Names (IDNs) as standard DNS labels.
        The labels were processed using a method called Nameprep <xref
        target="RFC3491"/> and Punycode <xref target="RFC3492"/>.
        That method was specific to IDNA2003, but is generalized as
        Stringprep <xref target="RFC3454"/>.  The general mechanism
        can be used to help other protocols with similar needs, but
        with different constraints than IDNA2003.
      </t>
      <t>Stringprep defines a framework within which protocols define their
      Stringprep profiles.  Known IETF specifications using Stringprep are
      listed below:
      <list style="symbols">
        <t>The Nameprep profile <xref target="RFC3490"/> for use in
        Internationalized Domain Names (IDNs);</t>
        <t>NFSv4 <xref target="RFC3530" /> and NFSv4.1 <xref
        target="RFC5661" />;</t>
        <t>The iSCSI profile <xref target="RFC3722"/> for use in
        Internet Small Computer Systems Interface (iSCSI) Names;</t>
        <t>EAP <xref target="RFC3748" />;</t>
        <t>The Nodeprep and Resourceprep profiles <xref
        target="RFC3920"/> for use in the Extensible Messaging and
        Presence Protocol (XMPP), and the XMPP to CPIM mapping <xref
        target="RFC3922" />;</t>
        <t>The Policy MIB profile <xref target="RFC4011"/> for use in
        the Simple Network Management Protocol (SNMP);</t>
        <t>The SASLprep profile <xref target="RFC4013"/> for use in
        the Simple Authentication and Security Layer (SASL), and SASL
        itself <xref target="RFC4422" />;</t>
        <t>TLS <xref target="RFC4279" />;</t>
        <t>IMAP4 using SASLprep <xref target="RFC4314" />;</t>
        <t>The trace profile <xref target="RFC4505"/> for use with the
        SASL ANONYMOUS mechanism;</t>
        <t> The LDAP profile <xref target="RFC4518"/> for use with
        LDAP <xref target="RFC4511" /> and its authentication methods
        <xref target="RFC4513" />;</t>
        <t>Plain SASL using SASLprep <xref target="RFC4616" />;</t>
        <t>NNTP using SASLprep <xref target="RFC4643" />;</t>
        <t>PKIX subject identification using LDAPprep <xref
        target="RFC4683" />;</t>
        <t>Internet Application Protocol Collation Registry <xref
        target="RFC4790" />;</t>
        <t>SMTP Auth using SASLprep <xref target="RFC4954" />;</t>
        <t>POP3 Auth using SASLprep <xref target="RFC5034" />;</t>
        <t>TLS SRP using SASLprep <xref target="RFC5054" />;</t>
        <t>IRI and URI in XMPP <xref target="RFC5122" />;</t>
        <t>PKIX CRL using LDAPprep <xref target="RFC5280" />;</t>
        <t>IAX using Nameprep <xref target="RFC5456" />;</t>
        <t>SASL SCRAM using SASLprep <xref target="RFC5802" />;</t>
        <t>Remote management of Sieve using SASLprep <xref target="RFC5804" />;</t>
        <t>The i;unicode-casemap Unicode Collation <xref target="RFC5051" />.</t>
      </list>
      </t>
      <t>There turned out to be some difficulties with IDNA2003, documented
      in <xref target="RFC4690" />.  These difficulties led to a new IDN
      specification, called IDNA2008 <xref target="RFC5890" />, <xref
      target="RFC5891" />, <xref target="RFC5892" />, <xref target="RFC5893"
      />.  Additional background and explanations of the decisions embodied
      in IDNA2008 is presented in <xref target="RFC5894" />.  One of the
      effects of IDNA2008 is that Nameprep and Stringprep are not used at
      all.  Instead, an algorithm based on Unicode properties of codepoints
      is defined. That algorithm generates a stable and complete table of
      the supported Unicode codepoints. This algorithm is based on an
      inclusion-based approach, instead of the exclusion-based approach of
      Stringprep/Nameprep.
      </t>
      <t>This document lists the shortcomings and issues found by protocols
      listed above that defined Stringprep profiles. It also lists some
      early conclusions and requirements for a potential replacement of
      Stringprep.</t>
    </section>
    <section title="Usage and Issues of Stringprep">
      
      <section title="Issues raised during newprep BOF">
        <t>During IETF 77, a BOF discussed the current state of the
        protocols that have defined Stringprep profiles <xref
        target="NEWPREP" />. The main conclusions are :
        <list style="symbols">
          <t>Stringprep is bound to a specific version of Unicode:
          3.2. Stringprep has not been updated to new versions of
          Unicode. Therefore, the protocols using Stringprep are stuck to
          Unicode 3.2.</t>
          
          <t>The protocols need to be updated to support new versions of
          Unicode. The protocols would like to not be bound to a specific
          version of Unicode, but rather have better Unicode agility in
          the way of IDNA2008.  This is important partly because it is
          usually impossible for an application to require Unicode 3.2;
          the application gets whatever version of Unicode is available on
          the host.</t>
          
          <t>The protocols require better bidirectional support (bidi) than
          currently offered by Stringprep. </t>
          
          <t>If the protocols are updated to use a new version of
          Stringprep or another framework, then backward compatibility
          is an important requirement. For example, Stringprep is
          based on and may use NFKC <xref target="UAX15" />, while
          IDNA2008 mostly uses NFC <xref target="UAX15" />.</t>
          
          <t>Protocols use each other; for example, a protocol can use
          user identifiers that are later passed to SASL, LDAP or another
          authentication mechanism. Therefore, common set of rules or
          classes of strings are preferred over specific rules for each
          protocol.</t>
        </list>
        </t>
        <t>Protocols that use Stringprep profiles use strings for
        different purposes:
        <list style="symbols">
          <t>XMPP uses a different Stringprep profile for each part of the
          XMPP address (JID): a localpart which is similar to a username and
          used for authentication, a domainpart which is a domain name and a
          resource part which is less restrictive than the localpart.</t>
          
          <t>iSCSI uses a Stringprep profile for the IQN, which is
          very similar to (often is) a DNS domain name. </t>
          
          <t>SASL and LDAP uses a Stringprep profile for usernames.</t>
          
          <t>LDAP uses a set of Stringprep profiles.</t>
        </list>
        </t>
        
        <t>During the newprep BOF, it was the consensus of the
        attendees that it would be highly preferable to have a
        replacement of Stringprep, with similar characteristics to
        IDNA2008. That replacement should be defined so that the
        protocols could use internationalized strings without a lot of
        specialized internationalization work, since
        internationalization expertise is not available in the
        respective protocols or working groups.</t>
      </section>

      <section title="Specific issues with particular Stringprep
                      profiles">
        <t><cref source="ajs@shinkuro.com">This section is where
        issues raised in the individual profile reviews goes.  A
        review of the WG trac state on 2010-10-06 of the tracker
        suggests those reviews haven't happened yet.</cref></t>
      </section>
      
      <section title="Inclusion vs. exclusion of characters">
        <t>One of the primary changes of IDNA2008 is in the way it
        approaches Unicode characters.  IDNA2003 created an explicit list
        of excluded or mapped-away characters; anything in Unicode 3.2
        that was not so listed could be assumed to be allowed under the
        protocol.  IDNA2008 begins instead from the assumption that
        characters are disallowed, and then relies on Unicode properties
        to derive whether a given character actually is allowed in the
        protocol.</t> 

        <t>Moreover, there is more than one class of "allowed in the
        protocol".  While some characters are simply disallowed, some
        are allowed only in certain contexts. The reasons for the
        context-dependent rules have to do with the way some
        characters are used.  For instance, the ZERO WIDTH JOINER and
        ZERO WIDTH NON-JOINER characters (ZWJ, U+200D and ZWNJ,
        U+200C) are allowed with contextual rules because they are
        required in some circumstances, yet are considered punctuation
        by Unicode and would therefore be DISALLOWED under the usual
        IDNA2008 derivation rules.</t>
        
        <t>The working group needs to decide whether similar contextual
        cases need to be supported.</t>
      </section>
      
      <section title="Stringprep and NFKC">
        <t>Stringprep profiles may use normalization.  If they do,
        they use NFKC <xref target="UAX15" />.  It is not clear that
        NFKC is the right normalization to use in all cases.  In <xref
        target="UAX15" />, there is the following observation
        regarding Normalization Forms KC and KD: "It is best to think
        of these Normalization Forms as being like uppercase or
        lowercase mappings: useful in certain contexts for identifying
        core meanings, but also performing modifications to the text
        that may not always be appropriate."  For things like the
        spelling of users' names, then, NKFC may not be the best form
        to use.  At the same time, one of the nice things about NFKC
        is that it deals with the width of characters that are
        otherwise similar, by canonicalizing half-width to full-width.
        This mapping step can be crucial in practice.  The WG will
        need to analyze the different use profiles and consider
        whether NFKC or NFC is a better normalization for each
        profile.</t>
      </section>
      
      <section title="Case mapping">
        <t>In IDNA2003, labels are always mapped to lower case before
        the Punycode transformation.  In IDNA2003, there is no mapping
        at all: input is either a valid U-label or it is not.  At the
        same time, upper-case characters are by definition not valid
        U-labels, because they fall into the Unstable category
        (category B) of <xref target="RFC5892" />.</t>
        <t>If there are protocols that require upper and lower cases be
        preserved, then the analogy with IDNA2008 will break down.  The
        working group will need to decide whether there are any cases that
        require upper case, and what to do about it if so.</t>
      </section>

      <section title="Whether to use ASCII-compatible encoding">
      
      <t>The development of IDNA2008 depended on the notion that there
      was a narrow repertoire of reasonable traditional labels, and
      what was necessary was to internationalize that repertoire
      rather than to incorporate any characters into domain name
      labels.  More exactly, the idea was to internationalize the
      traditional hostname rules (the "LDH rule". See <xref
      target="RFC4690" />, section 5.1.).  Efforts to internationalize
      email (<xref target="RFC5336" />) have started from different
      assumptions.  The email example suggests that in some cases, the
      right answer might be to internationalize the target protocol
      rather than to depend on a technology to ensure protocol slots
      can use only ASCII.  The working group will need to determine
      which approach is correct for the different use-cases.</t>
      </section>

      <section title="Issues with delimiters">
        <t>There are two kinds of issues to address with delimiters.
        First, exactly where a delimiter will appear on the screen
        when dealing with bidirectional parts of a string can be
        extremely surprising.  In the case of IDNA2008, just what to
        do in these cases remains a display issue (there is no
        question about the wire format, because the wire format is an
        A-label and it is always left to right).</t>
        <t>Second, there is the question of whether to include
        different kinds of protocol separators.  For instance, FULL
        STOP, U+002E (.) may not be available on all keyboards.  In
        addition, in some languages there is more than one full stop
        which are variants of one another.  The working group will
        need to decide how to handle such cases: whether there will be
        a mapping, some restrictions, or something else.</t>
      </section>

    </section>

      
    <section title="Considerations for Stringprep replacement">
      <t>The above suggests the following direction for the working group:
      <list style="symbols">
        <t>A stringprep replacement should be defined.</t>
        
        <t>The replacement should take an approach similar to IDNA2008,
        in that it enables Unicode agility.</t>

        <t>Protocols share similar characteristics of strings. Therefore,
        defining i18n preparation algorithms for a (small) set of string
          classes may be sufficient for most cases and provides the
        coherence among a set of protocol friends.</t>

        <t>The sets of string classes need to be evaluated for the
        following properties:
        <list>
          <t>the normalization needed (NFC vs NFKC);</t>
          <t>whether case-folding, case preservation, and
          case-insensitive matching is needed;</t>
          <t>what restrictions on input are reasonable for the class
          (i.e. whether there is something like an "LDH rule" for the
          class), or whether the ASCII-only input in the protocol slot
          is lightly constrained;</t>
          <t>the extent to which bidi considerations are important for
          the class.</t>
        </list>
        </t>
      </list>
      </t>

      <t>Existing deployments already depend on Stringprep profiles.
      Therefore, the working group will need to consider the effects
      of any new strategy on existing deployments.  By way of
      comparison, it is worth noting that some characters were
      acceptable in IDNA labels under IDNA2003, but are not
      protocol-valid under IDNA2008 (and conversely).  Different
      implementers may make different decisions about what to do in
      such cases; this could have interoperability effects.  The
      working group will need to trade better support for different
      linguistic environments against the potential side effects of
      backward incompatibility.</t>

    </section>
    <section title="Security Considerations">
      <t>This document merely states what problems are to be solved,
      and does not define a protocol.  There are undoubtedly security
      implications of the particular results that will come from the
      work to be completed.</t>
    </section>
    <section title="IANA Considerations">
      <t>
        This document has no actions for IANA.
      </t>
    </section>
    <section title="Discussion home for this draft">
      <t>
        This document is intended to define the problem space
        discussed on the precis@ietf.org mailing list.
      </t>
    </section>
  </middle>
  <back>
    <references title="Informative References">
      &bibxml2rfc-informative;
    </references>
  </back>
</rfc>
PAFTECH AB 2003-2026
2026-04-23 19:32:24