<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC ''
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
<!ENTITY bibxml2rfc-informative SYSTEM "draft-ietf-precis-problem-statement-00.xml-informative">
<!ENTITY bibxml2rfc-normative SYSTEM "draft-ietf-precis-problem-statement-00.xml-normative">
]>
<rfc category="info" ipr="pre5378Trust200902" docName="draft-ietf-precis-problem-statement-00.txt">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<front>
<title>Stringprep Revision Problem Statement</title>
<author initials="M." surname="Blanchet" fullname="Marc Blanchet">
<organization>Viagenie</organization>
<address>
<postal>
<street>2600 boul. Laurier, suite 625</street>
<city>Quebec</city>
<region>QC</region>
<code>G1V 4W1</code>
<country>Canada</country>
</postal>
<email>Marc.Blanchet@viagenie.ca</email>
<uri>http://viagenie.ca</uri>
</address>
</author>
<author initials="A." surname="Sullivan" fullname="Andrew Sullivan">
<address>
<postal>
<street>519 Maitland St.</street>
<city>London</city>
<region>ON</region>
<code>N6B 2Z5</code>
<country>Canada</country>
</postal>
<email>ajs@crankycanuck.ca</email>
</address>
</author>
<date month="October" year="2010"/>
<abstract>
<t>
Using Unicode codepoints in protocol strings that expect
comparison with other strings <cref
source="ajs@shinkuro.com">The WG will need to decide whether
"other strings" is too broad. In particular, what about
protocol slots that can take strings other than plain
ASCII?</cref> requires preparation of the string that contains
the Unicode codepoints. Internationalizing Domain Names in
Applications (IDNA2003) defined and used Stringprep and
Nameprep. Other protocols subsequently defined Stringprep
profiles. A new approach different from Stringprep and
Nameprep is used for a revision of IDNA2003 (called
IDNA2008). Other Stringprep profiles need to be similarly
updated, or a replacement for Stringprep needs to be
designed. This document outlines the issues to be faced by
those designing a Stringprep replacement.
</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>
Internationalizing Domain Names in Applications (IDNA2003)
<xref target="RFC3490"/>, <xref target="RFC3491" />, <xref
target="RFC3492" />, <xref target="RFC3454" /> described a
mechanism for encoding Unicode labels making up
Internationalized Domain Names (IDNs) as standard DNS labels.
The labels were processed using a method called Nameprep <xref
target="RFC3491"/> and Punycode <xref target="RFC3492"/>.
That method was specific to IDNA2003, but is generalized as
Stringprep <xref target="RFC3454"/>. The general mechanism
can be used to help other protocols with similar needs, but
with different constraints than IDNA2003.
</t>
<t>Stringprep defines a framework within which protocols define their
Stringprep profiles. Known IETF specifications using Stringprep are
listed below:
<list style="symbols">
<t>The Nameprep profile <xref target="RFC3491"/> for use in
Internationalized Domain Names (IDNs);</t>
<t>NFSv4 <xref target="RFC3530" /> and NFSv4.1 <xref
target="RFC5661" />;</t>
<t>The iSCSI profile <xref target="RFC3722"/> for use in
Internet Small Computer Systems Interface (iSCSI) Names;</t>
<t>EAP <xref target="RFC3748" />;</t>
<t>The Nodeprep and Resourceprep profiles <xref
target="RFC3920"/> for use in the Extensible Messaging and
Presence Protocol (XMPP), and the XMPP to CPIM mapping <xref
target="RFC3922" />;</t>
<t>The Policy MIB profile <xref target="RFC4011"/> for use in
the Simple Network Management Protocol (SNMP);</t>
<t>The SASLprep profile <xref target="RFC4013"/> for use in
the Simple Authentication and Security Layer (SASL), and SASL
itself <xref target="RFC4422" />;</t>
<t>TLS <xref target="RFC4279" />;</t>
<t>IMAP4 using SASLprep <xref target="RFC4314" />;</t>
<t>The trace profile <xref target="RFC4505"/> for use with the
SASL ANONYMOUS mechanism;</t>
<t> The LDAP profile <xref target="RFC4518"/> for use with
LDAP <xref target="RFC4511" /> and its authentication methods
<xref target="RFC4513" />;</t>
<t>Plain SASL using SASLprep <xref target="RFC4616" />;</t>
<t>NNTP using SASLprep <xref target="RFC4643" />;</t>
<t>PKIX subject identification using LDAPprep <xref
target="RFC4683" />;</t>
<t>Internet Application Protocol Collation Registry <xref
target="RFC4790" />;</t>
<t>SMTP Auth using SASLprep <xref target="RFC4954" />;</t>
<t>POP3 Auth using SASLprep <xref target="RFC5034" />;</t>
<t>TLS SRP using SASLprep <xref target="RFC5054" />;</t>
<t>IRI and URI in XMPP <xref target="RFC5122" />;</t>
<t>PKIX CRL using LDAPprep <xref target="RFC5280" />;</t>
<t>IAX using Nameprep <xref target="RFC5456" />;</t>
<t>SASL SCRAM using SASLprep <xref target="RFC5802" />;</t>
<t>Remote management of Sieve using SASLprep <xref target="RFC5804" />;</t>
<t>The i;unicode-casemap Unicode Collation <xref target="RFC5051" />.</t>
</list>
</t>
<t>There turned out to be some difficulties with IDNA2003, documented
in <xref target="RFC4690" />. These difficulties led to a new IDN
specification, called IDNA2008 <xref target="RFC5890" />, <xref
target="RFC5891" />, <xref target="RFC5892" />, <xref target="RFC5893"
/>. Additional background and explanations of the decisions embodied
in IDNA2008 are presented in <xref target="RFC5894" />. One of the
effects of IDNA2008 is that Nameprep and Stringprep are not used at
all. Instead, an algorithm based on Unicode properties of codepoints
is defined. That algorithm generates a stable and complete table of
the supported Unicode codepoints. This algorithm is based on an
inclusion-based approach, instead of the exclusion-based approach of
Stringprep/Nameprep.
</t>
<t>This document lists the shortcomings and issues found by protocols
listed above that defined Stringprep profiles. It also lists some
early conclusions and requirements for a potential replacement of
Stringprep.</t>
</section>
<section title="Usage and Issues of Stringprep">
<section title="Issues raised during newprep BOF">
<t>During IETF 77, a BOF discussed the current state of the
protocols that have defined Stringprep profiles <xref
target="NEWPREP" />. The main conclusions are:
<list style="symbols">
<t>Stringprep is bound to a specific version of Unicode:
3.2. Stringprep has not been updated to new versions of
Unicode. Therefore, the protocols using Stringprep are stuck with
Unicode 3.2.</t>
<t>The protocols need to be updated to support new versions of
Unicode. Ideally, the protocols would not be bound to a specific
version of Unicode, but would instead have Unicode agility in
the manner of IDNA2008. This is important partly because it is
usually impossible for an application to require Unicode 3.2;
the application gets whatever version of Unicode is available on
the host.</t>
<t>The protocols require better bidirectional support (bidi) than
currently offered by Stringprep. </t>
<t>If the protocols are updated to use a new version of
Stringprep or another framework, then backward compatibility
is an important requirement. For example, Stringprep is
based on and may use NFKC <xref target="UAX15" />, while
IDNA2008 mostly uses NFC <xref target="UAX15" />.</t>
<t>Protocols use each other; for example, a protocol can use
user identifiers that are later passed to SASL, LDAP or another
authentication mechanism. Therefore, a common set of rules or
classes of strings is preferred over specific rules for each
protocol.</t>
</list>
</t>
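<t>The Unicode version binding described above can be observed
directly. The following is a minimal sketch, assuming Python 3 and
its standard unicodedata module, which carries both the host's
current Unicode database and a frozen Unicode 3.2 database. The
helper name is_assigned and the choice of U+1E9E as a sample are
illustrative assumptions, not part of any profile.</t>
<figure><artwork><![CDATA[
```python
import unicodedata
from unicodedata import ucd_3_2_0  # frozen Unicode 3.2 database

# U+1E9E LATIN CAPITAL LETTER SHARP S was assigned after
# Unicode 3.2 (illustrative sample only).
CAPITAL_SHARP_S = "\u1E9E"


def is_assigned(ch, db=unicodedata):
    """True if the given database has an assignment for ch."""
    return db.category(ch) != "Cn"


host_version = unicodedata.unidata_version   # depends on the host
frozen_version = ucd_3_2_0.unidata_version   # the 3.2 snapshot

print(host_version, is_assigned(CAPITAL_SHARP_S))
print(frozen_version, is_assigned(CAPITAL_SHARP_S, ucd_3_2_0))
```
]]></artwork></figure>
<t>A Stringprep profile sees only the second view, so a character
that the host already supports is still treated as unassigned.</t>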
<t>Protocols that use Stringprep profiles use strings for
different purposes:
<list style="symbols">
<t>XMPP uses a different Stringprep profile for each part of the
XMPP address (JID): a localpart, which is similar to a username and
is used for authentication; a domainpart, which is a domain name;
and a resourcepart, which is less restrictive than the localpart.</t>
<t>iSCSI uses a Stringprep profile for the IQN, which is
very similar to (often is) a DNS domain name. </t>
<t>SASL and LDAP use a Stringprep profile for usernames.</t>
<t>LDAP uses a set of Stringprep profiles.</t>
</list>
</t>
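<t>To illustrate how one protocol string can need several
profiles, the following sketch splits an XMPP address into its
three parts and pairs each part with a profile name. It is a
simplified illustration, assuming Python 3. The function name
split_jid and the sample address are hypothetical, the mapping of
the domainpart to Nameprep is taken from RFC 3920, and no
preparation is actually performed.</t>
<figure><artwork><![CDATA[
```python
def split_jid(jid):
    """Split an XMPP address into (localpart, domainpart,
    resourcepart). Missing optional parts are None."""
    bare, slash, resource = jid.partition("/")
    local, at, domain = bare.rpartition("@")
    return (local if at else None,
            domain,
            resource if slash else None)


# Profile that applies to each part, in the same order.
PROFILE_FOR_PART = ("Nodeprep", "Nameprep", "Resourceprep")

parts = split_jid("juliet@example.com/balcony")
for part, profile in zip(parts, PROFILE_FOR_PART):
    print(profile, part)
```
]]></artwork></figure>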
<t>During the newprep BOF, it was the consensus of the
attendees that it would be highly preferable to have a
replacement of Stringprep, with similar characteristics to
IDNA2008. That replacement should be defined so that the
protocols could use internationalized strings without a lot of
specialized internationalization work, since
internationalization expertise is not available in the
respective protocols or working groups.</t>
</section>
<section title="Specific issues with particular Stringprep
profiles">
<t><cref source="ajs@shinkuro.com">This section is where
issues raised in the individual profile reviews go. A
review of the WG tracker on 2010-10-06 suggests those
reviews haven't happened yet.</cref></t>
</section>
<section title="Inclusion vs. exclusion of characters">
<t>One of the primary changes of IDNA2008 is in the way it
approaches Unicode characters. IDNA2003 created an explicit list
of excluded or mapped-away characters; anything in Unicode 3.2
that was not so listed could be assumed to be allowed under the
protocol. IDNA2008 begins instead from the assumption that
characters are disallowed, and then relies on Unicode properties
to derive whether a given character actually is allowed in the
protocol.</t>
<t>Moreover, there is more than one class of "allowed in the
protocol". While some characters are simply disallowed, some
are allowed only in certain contexts. The reasons for the
context-dependent rules have to do with the way some
characters are used. For instance, the ZERO WIDTH JOINER and
ZERO WIDTH NON-JOINER characters (ZWJ, U+200D and ZWNJ,
U+200C) are allowed with contextual rules because they are
required in some circumstances, yet are format characters
rather than letters or digits and would therefore be DISALLOWED
under the usual IDNA2008 derivation rules.</t>
<t>The working group needs to decide whether similar contextual
cases need to be supported.</t>
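<t>The inclusion-based approach can be sketched as follows. This
is a deliberately simplified illustration, assuming Python 3 and
its unicodedata module. The category set, the function name
derive, and the three result labels are assumptions made for the
example. The actual IDNA2008 derivation has more steps and
exceptions, and its contextual rules inspect neighbouring
characters, which this sketch does not do.</t>
<figure><artwork><![CDATA[
```python
import unicodedata

# Assumption: a simplified stand-in for "letters and digits".
# Upper-case letters (Lu) are left out on purpose.
LETTER_DIGIT = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}

# ZWNJ and ZWJ: allowed only where a contextual rule is met.
CONTEXTUAL = {0x200C, 0x200D}


def derive(cp):
    """Start from 'disallowed'; include only what qualifies."""
    if cp in CONTEXTUAL:
        return "CONTEXT"
    if unicodedata.category(chr(cp)) in LETTER_DIGIT:
        return "ALLOWED"
    return "DISALLOWED"


for cp in (0x0061, 0x0041, 0x0021, 0x200D):
    print("U+%04X" % cp, derive(cp))
```
]]></artwork></figure>
<t>Because the result is computed from properties, a code point
added in a later Unicode version is classified without editing any
table, which is the agility an exclusion list cannot offer.</t>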
</section>
<section title="Stringprep and NFKC">
<t>Stringprep profiles may use normalization. If they do,
they use NFKC <xref target="UAX15" />. It is not clear that
NFKC is the right normalization to use in all cases. In <xref
target="UAX15" />, there is the following observation
regarding Normalization Forms KC and KD: "It is best to think
of these Normalization Forms as being like uppercase or
lowercase mappings: useful in certain contexts for identifying
core meanings, but also performing modifications to the text
that may not always be appropriate." For things like the
spelling of users' names, then, NFKC may not be the best form
to use. At the same time, one of the nice things about NFKC
is that it deals with the width of characters that are
otherwise similar, by mapping half-width and full-width
variants to a single form.
This mapping step can be crucial in practice. The WG will
need to analyze the different use profiles and consider
whether NFKC or NFC is a better normalization for each
profile.</t>
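<t>The difference between the two normalization forms can be seen
on a few sample strings. The following is a small sketch, assuming
Python 3 and its unicodedata module. The samples and the helper
name forms are chosen for illustration only.</t>
<figure><artwork><![CDATA[
```python
import unicodedata

SAMPLES = {
    "fullwidth A": "\uFF21",
    "halfwidth KA": "\uFF76",
    "fi ligature": "\uFB01",
    "e + combining acute": "e\u0301",
}


def forms(text):
    """Return the (NFC, NFKC) forms of text."""
    return (unicodedata.normalize("NFC", text),
            unicodedata.normalize("NFKC", text))


for name, text in SAMPLES.items():
    nfc, nfkc = forms(text)
    print(name, ascii(nfc), ascii(nfkc))
```
]]></artwork></figure>
<t>NFC composes the last sample but leaves the width variants and
the ligature alone, while NFKC also replaces those with their
compatibility equivalents. Whether that replacement is a benefit or
a loss depends on the kind of string being prepared.</t>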
</section>
<section title="Case mapping">
<t>In IDNA2003, labels are always mapped to lower case before
the Punycode transformation. In IDNA2008, there is no mapping
at all: input is either a valid U-label or it is not. At the
same time, upper-case characters are by definition not valid
in U-labels, because they fall into the Unstable category
(category B) of <xref target="RFC5892" />.</t>
<t>If there are protocols that require that upper and lower case
be preserved, then the analogy with IDNA2008 will break down. The
working group will need to decide whether there are any cases that
require upper case, and what to do about it if so.</t>
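<t>The Unstable test of <xref target="RFC5892" /> that excludes
upper-case characters can be sketched as follows. This is a
minimal illustration, assuming Python 3, whose str.casefold method
and unicodedata module are used here as stand-ins for the Unicode
case folding and NFKC operations. The function names are
hypothetical.</t>
<figure><artwork><![CDATA[
```python
import unicodedata


def nfkc(text):
    return unicodedata.normalize("NFKC", text)


def is_unstable(ch):
    """True if ch changes under NFKC(casefold(NFKC(ch)))."""
    return nfkc(nfkc(ch).casefold()) != ch


for ch in ("a", "A", "\u00FC", "\u00DC"):
    print(ascii(ch), is_unstable(ch))
```
]]></artwork></figure>
<t>A string class that must preserve case cannot reuse this test
as is, since every upper-case letter fails it.</t>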
</section>
<section title="Whether to use ASCII-compatible encoding">
<t>The development of IDNA2008 depended on the notion that there
was a narrow repertoire of reasonable traditional labels, and
what was necessary was to internationalize that repertoire
rather than to incorporate any characters into domain name
labels. More exactly, the idea was to internationalize the
traditional hostname rules (the "LDH rule"; see <xref
target="RFC4690" />, section 5.1). Efforts to internationalize
email (<xref target="RFC5336" />) have started from different
assumptions. The email example suggests that in some cases, the
right answer might be to internationalize the target protocol
rather than to depend on a technology to ensure protocol slots
can use only ASCII. The working group will need to determine
which approach is correct for the different use-cases.</t>
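<t>The two approaches differ in what ends up in the protocol
slot. The following sketch, assuming Python 3 and its built-in
"punycode" and "utf-8" codecs, contrasts an ASCII-compatible
encoding with carrying the string natively. The sample label and
the function names are illustrative assumptions, and no
preparation or validation of the label is performed.</t>
<figure><artwork><![CDATA[
```python
LABEL = "b\u00FCcher"   # illustrative sample label


def to_ace(label):
    """Punycode-encode a label and add the ACE prefix."""
    return b"xn--" + label.encode("punycode")


def to_native(label):
    """Carry the label as UTF-8 in an internationalized slot."""
    return label.encode("utf-8")


print(to_ace(LABEL))     # ASCII-only form
print(to_native(LABEL))  # form containing non-ASCII octets
```
]]></artwork></figure>
<t>The first form fits an unmodified ASCII-only slot at the cost
of an extra transformation at every boundary. The second requires
the target protocol itself to accept non-ASCII octets.</t>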
</section>
<section title="Issues with delimiters">
<t>There are two kinds of issues to address with delimiters.
First, exactly where a delimiter will appear on the screen
when dealing with bidirectional parts of a string can be
extremely surprising. In the case of IDNA2008, just what to
do in these cases remains a display issue (there is no
question about the wire format, because the wire format is an
A-label and it is always left to right).</t>
<t>Second, there is the question of whether to include
different kinds of protocol separators. For instance, FULL
STOP, U+002E (.) may not be available on all keyboards. In
addition, in some languages there is more than one full stop
character, and these are variants of one another. The working group will
need to decide how to handle such cases: whether there will be
a mapping, some restrictions, or something else.</t>
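<t>One possible treatment of separator variants is a mapping, as
IDNA2003 does for domain names. The following sketch, assuming
Python 3, splits a name on the four code points that RFC 3490
recognizes as label separators. The function name split_labels is
hypothetical, and whether other string classes should adopt a
similar mapping is exactly the open question above.</t>
<figure><artwork><![CDATA[
```python
import re

# FULL STOP, IDEOGRAPHIC FULL STOP, FULLWIDTH FULL STOP and
# HALFWIDTH IDEOGRAPHIC FULL STOP.
DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name):
    """Split a name on any of the recognized separators."""
    return DOTS.split(name)


print(split_labels("www\u3002example\uFF0Ecom"))
```
]]></artwork></figure>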
</section>
</section>
<section title="Considerations for Stringprep replacement">
<t>The above suggests the following direction for the working group:
<list style="symbols">
<t>A Stringprep replacement should be defined.</t>
<t>The replacement should take an approach similar to IDNA2008,
in that it enables Unicode agility.</t>
<t>Protocols share similar characteristics of strings. Therefore,
defining i18n preparation algorithms for a (small) set of string
classes may be sufficient for most cases and would provide
coherence among a set of related protocols.</t>
<t>The sets of string classes need to be evaluated for the
following properties:
<list>
<t>the normalization needed (NFC vs NFKC);</t>
<t>whether case folding, case preservation, or
case-insensitive matching is needed;</t>
<t>what restrictions on input are reasonable for the class
(i.e. whether there is something like an "LDH rule" for the
class), or whether the ASCII-only input in the protocol slot
is lightly constrained;</t>
<t>the extent to which bidi considerations are important for
the class.</t>
</list>
</t>
</list>
</t>
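<t>The idea of a small set of string classes, each described by a
few properties, can be sketched as follows. This is purely a
hypothetical illustration, assuming Python 3. The StringClass type,
its fields, and the two sample classes are invented for the
example and do not correspond to any agreed design. Input
restrictions and bidi handling are omitted.</t>
<figure><artwork><![CDATA[
```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class StringClass:
    """Hypothetical description of one string class."""
    name: str
    normalization: str   # "NFC" or "NFKC"
    case_fold: bool      # fold case before comparison?

    def prepare(self, text):
        form = self.normalization
        text = unicodedata.normalize(form, text)
        if self.case_fold:
            # Folding can undo normalization, so normalize again.
            text = unicodedata.normalize(form, text.casefold())
        return text

    def matches(self, a, b):
        return self.prepare(a) == self.prepare(b)


# Two invented classes with different property choices.
IDENTIFIER = StringClass("identifier", "NFKC", True)
FREEFORM = StringClass("freeform", "NFC", False)

print(IDENTIFIER.matches("\uFF21lice", "alice"))
print(FREEFORM.matches("Alice", "alice"))
```
]]></artwork></figure>
<t>A protocol would then name the class it uses for each slot
instead of defining its own profile.</t>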
<t>Existing deployments already depend on Stringprep profiles.
Therefore, the working group will need to consider the effects
of any new strategy on existing deployments. By way of
comparison, it is worth noting that some characters were
acceptable in IDNA labels under IDNA2003, but are not
protocol-valid under IDNA2008 (and conversely). Different
implementers may make different decisions about what to do in
such cases; this could have interoperability effects. The
working group will need to trade better support for different
linguistic environments against the potential side effects of
backward incompatibility.</t>
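<t>The kind of difference at stake can be shown with one
character. The following sketch assumes Python 3, whose built-in
"idna" codec implements IDNA2003 (RFC 3490). The function name and
the sample label are illustrative. IDNA2003 maps LATIN SMALL
LETTER SHARP S (U+00DF) to "ss", whereas IDNA2008 treats that
character as protocol-valid, so an IDNA2008 implementation given
the same input would not produce the result shown here.</t>
<figure><artwork><![CDATA[
```python
def idna2003_to_ascii(name):
    """ToASCII as done by Python's IDNA2003-based codec."""
    return name.encode("idna")


# The sharp s is mapped away before encoding.
print(idna2003_to_ascii("stra\u00DFe"))
```
]]></artwork></figure>
<t>Two deployments that follow different specifications can
therefore resolve the same user input to different identifiers.</t>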
</section>
<section title="Security Considerations">
<t>This document merely states what problems are to be solved,
and does not define a protocol. There are undoubtedly security
implications of the particular results that will come from the
work to be completed.</t>
</section>
<section title="IANA Considerations">
<t>
This document has no actions for IANA.
</t>
</section>
<section title="Discussion home for this draft">
<t>
This document is intended to define the problem space
discussed on the precis@ietf.org mailing list.
</t>
</section>
</middle>
<back>
<references title="Informative References">
&bibxml2rfc-informative;
</references>
</back>
</rfc>