One document matched: draft-yoneya-precis-mappings-03.xml


<?xml version="1.0" encoding="us-ascii"?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
  <!ENTITY rfc5895 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5895.xml">
  <!ENTITY rfc3454 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3454.xml">
  <!ENTITY rfc4518 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4518.xml">
  <!ENTITY rfc3722 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3722.xml">
  <!ENTITY rfc4013 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4013.xml">
  <!ENTITY rfc6120 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6122.xml">
  <!ENTITY rfc3748 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3748.xml">
  <!ENTITY rfc4314 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4314.xml">
  <!ENTITY rfc3490 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3490.xml">
  <!ENTITY rfc3491 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3491.xml">
  <!ENTITY I-D.ietf-precis-framework SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-precis-framework-03.xml">
  <!ENTITY I-D.ietf-precis-problem-statement SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-precis-problem-statement-06.xml">
]>

<rfc category="info" docName="draft-yoneya-precis-mappings-03" ipr="trust200902">
  <front>
    <title abbrev="precis mapping">Mapping characters for precis classes</title>

    <author fullname="Yoshiro YONEYA" initials="Y" surname="YONEYA">
      <organization>JPRS</organization>
      <address>
        <postal>
          <street>Chiyoda First Bldg. East 13F</street>
          <street>3-8-1 Nishi-Kanda</street>
          <city>Chiyoda-ku</city>
          <region>Tokyo</region>
          <code>101-0065</code>
          <country>Japan</country>
        </postal>
        <phone>+81 3 5215 8451</phone>
        <email>yoshiro.yoneya@jprs.co.jp</email>
      </address>
    </author>
      
    <author fullname="Takahiro NEMOTO" initials="T" surname="NEMOTO">
      <organization>Keio University</organization>
      <address>
        <postal>
          <street>Graduate School of Media Design</street>
          <street>4-1-1 Hiyoshi, Kohoku-ku</street>
          <city>Yokohama</city>
          <region>Kanagawa</region>
          <code>223-8526</code>
          <country>Japan</country>
        </postal>
        <phone>+81 45 564 2517</phone>
        <email>t.nemo10@kmd.keio.ac.jp</email>
      </address>
    </author>

    <date day="3" month="October" year="2012"/>

    <abstract>
      <t>
        Preparation and comparison of internationalized strings ("precis")
        framework <xref target="I-D.ietf-precis-framework"/> is defining
        several classes of strings for preparation and comparison.  In the
        document, case mapping is defined because many of protocols handle 
        case sensitive or case insensitive string comparison and therefore
        preparation of string is mandatory.  As described in IDNA mapping 
        <xref target="RFC5895"/> and precis problem statement 
        <xref target="I-D.ietf-precis-problem-statement"/>, mappings in 
        internationalized strings are not limited to case, but also width, 
        delimiters and/or other specials are taken into consideration.
        This document is a guideline for authors of protocol profiles of precis framework
        and describes the mappings that must be considered
        between receiving user input and passing permitted code points to
        internationalized protocols.
      </t>
    </abstract>

  </front>

  <middle>
    <section title="Introduction">
      <t>
        In many cases, user input of internationalized strings is generated
        by input method editor ("IME") or copy-and-paste from free text.
        Usually users do not care case and/or width of input characters
        because they are identical for users' eyes.  Further, users rarely
        switch IME state to input special characters such as protocol
        elements.  For Internationalized Domain Names ("IDNs"), IDNA Mapping
        <xref target="RFC5895"/> describes methods to treat these issues. 
        For precis strings, case mapping is defined as a process in precis
        framework <xref target="I-D.ietf-precis-framework"/>, but width
        mapping, delimiter mapping and/or special mapping are not defined.  Handling of
        mappings other than case is also important to increase chance of
        strings match as users expect.
        This document is a guideline for authors of protocol profiles of precis framework
        and describes the mappings that must be considered
        between receiving user input and passing permitted code points to
        internationalized protocols.
      </t>
    </section>
<!-- ******************************************************************************* -->
    <section title="Types of mapping" anchor="type">
      <t>
        This document defines two types of mapping. One is protocol independent mapping
        that doesn't depend on protocol rules and the other is protocol dependent mapping that depend on protocol rules.
        This document defines some mappings in these mapping types. 
        Authors of protocol profiles of precis framework should need to give careful consideration to choice of mappings.
      </t>
      <t>
        Each mapping type is described in following sections.
      </t>
    </section>
<!-- ******************************************************************************* -->
    <section title="Protocol independent mapping" anchor="pim">
      <t>
        Protocol independent mapping is a mapping that doesn't depend on protocol rules.
      </t>
      <section title="Width mapping">
        <t>
          Fullwidth and halfwidth characters (those defined with
          Decomposition Types <wide> and <narrow>) are mapped to
          their decomposition mappings as shown in the Unicode character
          database <xref target="Unicode"/>.
        </t>
        <t>
          Width mapping will increase backward compatibility with Stringprep
          <xref target="RFC3454"/> and precis framework 
          <xref target="I-D.ietf-precis-framework"/>.  Because in a
          Stringprep profile which specifies Unicode normalization form KC
          (NFKC) for normalization method, fullwidth/halfwidth characters
          are mapped into its compatible form.  If a precis framework
          profile specified NFKC (which is not recommended), width mapping
          might not be useful.
        </t>
      </section>
    </section>
<!-- ******************************************************************************* -->
    <section title="Protocol dependent mapping" anchor="pdm">
      <t>
        Protocol dependent mapping is a mapping that depend on protocol rules.
      </t>
      <section title="Delimiter mapping">
        <t>
          Definitions of delimiters in certain protocols are differ from
          each other.  Therefore, delimiter mapping table should be based on
          well defined mapping table for each protocol.
        </t>
        <t>
          One of the most useful case of delimiter mapping is when FULL STOP
          character (U+002E) is a delimiter as well as domain name.  Some of
          IME generates FULL STOP compatible characters such as IDEOGRAPHIC
          FULL STOP (U+3002) when users type FULL STOP on the keyboard.
        </t>
      </section>
      <section title="Special mapping" anchor="spm"> 
        <t>
          Certain protocols have characters which need to map different character from 
          precis framework defined mapping rule other than delimiter characters.
          In this document, these mappings are named special mapping. They are differ from each protocol.
          Therefore, special mapping table should be based on well defined mapping table for each protocol.
          Examples of special mapping are following;
          <list style='symbols'>
            <t>White spaces are mapped to SPACE (U+0020)</t>
            <t>Some characters such as control characters are mapped to nothing (Deletion)</t>
          </list>
          LDAPprep<xref target="RFC4518"/> defines the rule that some codepoints(<xref target="cplist4518"/>) are mapped to SPACE (U+0020). 
        </t>
      </section>
      <section title="Local case mapping" anchor="lcm">
        <t>
          Local case mapping is case folding that depend on language context.
          For example, given there is upper case I in a user ID strings, you
          should care what's language context that this user ID depend on when
          this character is mapped into lower case character.  And if this
          depends on Turkish, the character should be mapped into LATIN SMALL
          LETTER DOTLESS I (U+0131) as this character's lower case.
        </t>
        <t>
          This document defines characters that need local case mapping based on
          the Specialcasing.txt <xref target="Specialcasing"/> in section 3.13
          of The Unicode Standerd <xref target="Unicode"/> to solve such a problem.
          
          Local case mapping targets only characters that get two different results to perfom just casefolding
          that is defined in the Casefolding.txt <xref target="Casefolding"/> and perfom special casefolding
          that is defined in the Specialcasing.txt then casefolding, because precis framework have casefolding.
        </t>
        <t>
          There are two types casefoldings defined as Unconditional Mappings and Conditional Mappings in the Specialcasing.txt.
          Conditional mappings have Language-Insensitive Mappings that targets characters whose full case mappings do not depend on language, but do
          depend on context and Language-Sensitive Mappings that these are characters whose full case mappings depend on language and perhaps also
          context.
        </t>
        <t>
          Of these mappings, characters that Unconditional Mappings and Language-Insensitive Mappings in Conditional Mappings target are mapped into
          same codepoint(s) with just casefolding and special casefolding then casefolding.
          But characters that Language-Sensitive Mappings in Conditional Mappings targets are mapped into different codepoint with them.
          Therefore this document defined characters that are a part of characters of Lithuanian(lt), Turkish(tr) and Azerbaijanian(az) that Language-Sensitive Mappings targets
          as targets for local case mapping.
        </t>
        <t>
          A list of characters that need Local case mapping are as follows.
          <list>
            <t>Format:<vspace />
               <Language>; <Codepoint>; <Lowercase>; <Comments></t>
            <t>lt; 0049; 0069 0307; LATIN CAPITAL LETTER I<vspace />
               lt; 004A; 006A 0307; LATIN CAPITAL LETTER J<vspace />
               lt; 012E; 012F 0307; LATIN CAPITAL LETTER I WITH OGONEK<vspace />
               lt; 00CC; 0069 0307 0300; LATIN CAPITAL LETTER I WITH GRAVE<vspace />
               lt; 00CD; 0069 0307 0301; LATIN CAPITAL LETTER I WITH ACUTE<vspace />
               lt; 0128; 0069 0307 0303; LATIN CAPITAL LETTER I WITH TILDE<vspace />
               tr; 0130; 0069; LATIN CAPITAL LETTER I WITH DOT ABOVE<vspace />
               tr; 0049; 0131; LATIN CAPITAL LETTER I<vspace />
               az; 0130; 0069; LATIN CAPITAL LETTER I WITH DOT ABOVE<vspace />
               az; 0049; 0131; LATIN CAPITAL LETTER I</t>
          </list>
          <xref target="iana"/> "IANA Considerations" contains a template to registry these characters to IANA as precis local case mapping registry.
        </t>
      </section>
    </section>
<!-- ******************************************************************************* -->
    <section title="Applying order of mapping" anchor="aom">
      <t>
        Basically, applying order of mapping that this document describes aren't sensitive.
        This section defines applying order of mapping to minimize effect of codepoint change by mappings.
        This mapping order is very general and was designed to be acceptable to the widest user community.
        <list style='numbers'>
          <t>width mapping</t>
          <t>delimiter mapping</t>
          <t>special mapping</t>
          <t>local case mapping</t>
          <t>precis framework</t>
        </list>
        Mappings that this document describes should be performed before precis framework. 
      </t>
    </section>
<!-- ******************************************************************************* -->
    <section title="IANA Considerations" anchor="iana">
      <section title="precis local case mapping registry">
        <t>
          IANA is requested to create a registry of precis local case mapping.
          In accordance with [RFC5226], the registration policy is "RFC
          Required".
        </t>
      </section>
      <section title="Template for precis local case mapping registry">
        <t>
          The following information is to be given when a new precis local case mapping rule is created.
          The registration template is as follows:
          <list>
            <t>Language: language name</t>
            <t>Codepoint: Local case mapping that can be applied when this code point
              exists in the strings</t>
            <t>Local lowercase: The lowercase codepoint after performing local case mapping</t>
            <t>Comment: Character name of the code point</t>
          </list>
          <xref target="initialized"/> contains further discussion and a table from which that
          registry can be initialized.
        </t>
      </section>
    </section>

    <section title="Security Considerations">
      <t>
        TBD.
      </t>
    </section>
    <section title="Acknowledgment" anchor="ack">
      <t>
        Martin Dürst suggested a need for the case folding about the mapping(map final sigma to sigma, German sz to ss,.).
      </t>
      <t>
        Pete Resnick et al. gave important suggestion for this document during at WG meeting.
      </t>
    </section>
  </middle>

  <back>
    <references title="References">
      &rfc3454;
      &rfc3490;
      &rfc3491;
      &rfc3722;
      &rfc3748;
      &rfc4013;
      &rfc4314;
      &rfc4518;
      &rfc5895;
      &rfc6120;
      &I-D.ietf-precis-framework;
      &I-D.ietf-precis-problem-statement;
      <reference anchor="Unicode">
        <front>
          <title>The Unicode Standard, Version 6.1.0</title>
          <author initials="" surname="The Unicode Consortium" fullname="">
            <organization/>
          </author>
          <date year="2012"/>
        </front>
        <seriesInfo name="" value="<http://www.unicode.org/versions/Unicode6.1.0/>"/>
      </reference>
    <reference anchor="Casefolding">
      <front>
        <title>CaseFolding-6.1.0.txt</title>
        <author initials="" surname="" fullname="">
          <organization/>
        </author>
        <date year=""/>
      </front>
      <seriesInfo name="Unicode Character Database, July 2011," value="<http://www.unicode.org/Public/6.1.0/ucd/CaseFolding.txt>"/>
    </reference>
    <reference anchor="Specialcasing">
      <front>
        <title>SpecialCasing-6.1.0.txt</title>
        <author initials="" surname="" fullname="">
          <organization/>
        </author>
        <date year=""/>
      </front>
      <seriesInfo name="Unicode Character Database, July 2011," value="<http://www.unicode.org/Public/6.1.0/ucd/SpecialCasing.txt>"/>
    </reference>
    </references>
      <section title="Mapping type list each protocol" anchor="maplist">
          <section title="Mapping type list for each protocol" anchor="maptype">
          <t>
              This table is the mapping type list for each protocol.
              Values marked "o" indicate that the protocol use the type of mapping.
              Values marked "-" indicate that the protocol doesn't use the type of mapping.
          </t>
              <figure>
                  <artwork><![CDATA[
+----------------------+-------------+-----------+------+---------+
|    \ Type of mapping |    Width    | Delimiter | Case | Special |
| RFC \                |    (NFKC)   |           |      |         |
+----------------------+-------------+-----------+------+---------+
|         3490         |      -      |     o     |   -  |    -    |
|         3491         |      o      |     -     |   o  |    -    |
|         3722         |      o      |     -     |   o  |    -    |
|         3748         |      o      |     -     |   -  |    o    |
|         4013         |      o      |     -     |   -  |    o    |
|         4314         |      o      |     -     |   -  |    o    |
|         4518         |      o      |     -     |   o  |    o    |
|         6120         |      -      |     -     |   o  |    -    |
+----------------------+-------------+-----------+------+---------+
                  ]]></artwork>
              </figure>
          </section>
      </section>
      <section title="Codepoints which need special mapping" anchor="list_sp">
          <section title="RFC3748" anchor="cplist3748">
              <t>
                  Non-ASCII space characters [StringPrep, C.1.2] that can be mapped to SPACE (U+0020).
              </t>
          </section>
          <section title="RFC4013" anchor="cplist4013">
              <t>
                  Non-ASCII space characters [StringPrep, C.1.2] that can be mapped to SPACE (U+0020).
              </t>
          </section>
          <section title="RFC4314" anchor="cplist4314">
              <t>
                  Non-ASCII space characters [StringPrep, C.1.2] that can be mapped to SPACE (U+0020).
              </t>
          </section>
          <section title="RFC4518" anchor="cplist4518">
              <t>
                  Codepoints mapped to SPACE (U+0020) are following;
              </t>
                      <t>
                          U+0009 (CHARACTER TABULATION)<vspace />
                          U+000A (LINE FEED (LF))<vspace />
                          U+000B (LINE TABULATION)<vspace />
                          U+000C (FORM FEED (FF))<vspace />
                          U+000D (CARRIAGE RETURN (CR))<vspace />
                          U+0085 (NEXT LINE (NEL))<vspace />
                          U+0020 (SPACE)<vspace />
                          U+00A0 (NO-BREAK SPACE)<vspace />
                          U+1680 (OGHAM SPACE MARK)<vspace />
                          U+2000 (EN QUAD)<vspace />
                          U+2001 (EM QUAD)<vspace />
                          U+2002 (EN SPACE)<vspace />
                          U+2003 (EM SPACE)<vspace />
                          U+2004 (THREE-PER-EM SPACE)<vspace />
                          U+2005 (FOUR-PER-EM SPACE)<vspace />
                          U+2006 (SIX-PER-EM SPACE)<vspace />
                          U+2007 (FIGURE SPACE)<vspace />
                          U+2008 (PUNCTUATION SPACE)<vspace />
                          U+2009 (THIN SPACE)<vspace />
                          U+200A (HAIR SPACE)<vspace />
                          U+2028 (Line Separator)<vspace />
                          U+2029 (Paragraph Separator)<vspace />
                          U+202F (NARROW NO-BREAK SPACE)<vspace />
                          U+205F (MEDIUM MATHEMATICAL SPACE)<vspace />
                          U+3000 (IDEOGRAPHIC SPACE)<vspace />
                      </t>
              <t>
                  All other control code (e.g., Cc) points or code points with a
                  control function (e.g., Cf) are mapped to nothing.
                  Codepoints mapped to nothing that aren't specified by Stringprep are following;
              </t>
              <t>
                  U+0000-0008<vspace />
                  U+000E-001F<vspace />
                  U+007F-0084<vspace />
                  U+0086-009F<vspace />
                  U+06DD<vspace />
                  U+070F<vspace />
                  U+180E<vspace />
                  U+200E-200F<vspace />
                  U+202A-202E<vspace />
                  U+2061-2063<vspace />
                  U+206A-206F<vspace />
                  U+FFF9-FFFB<vspace />
                  U+1D173-1D17A<vspace />
                  U+E0001<vspace />
                  U+E0020-E007F<vspace />
              </t>
          </section>
      </section>
      <section title="The initial precis local case mapping registrations" anchor="initialized">
        <section title="Lithuanian">
          <t>
            language: Lithuanian
            <list>
              <t>
                Codepoint: U+0049<vspace />
                Local lowercase: U+0069 U+0307<vspace />
                Comment: LATIN CAPITAL LETTER I
              </t>
              <t>
                Codepoint: U+004A<vspace />
                Local lowercase: U+006A U+0307<vspace />
                Comment: LATIN CAPITAL LETTER J
              </t>
              <t>
                Codepoint: U+012E<vspace />
                Local lowercase: U+012F U+0307<vspace />
                Comment: LATIN CAPITAL LETTER I WITH OGONEK
              </t>
              <t>
                Codepoint: U+00CC<vspace />
                Local lowercase: U+0069 U+0307 U+0300<vspace />
                Comment: LATIN CAPITAL LETTER I WITH GRAVE
              </t>
              <t>
                Codepoint: U+00CD<vspace />
                Local lowercase: U+0069 U+0307 U+0301<vspace />
                Comment: LATIN CAPITAL LETTER I WITH ACUTE
              </t>
              <t>
                Codepoint: U+0128<vspace />
                Local lowercase: U+0069 U+0307 U+0303<vspace />
                Comment: LATIN CAPITAL LETTER I WITH TILDE
              </t>
            </list>
          </t>
        </section>
        <section title="Turkish">
          <t>
            language: Turkish
            <list>
              <t>
                Codepoint: U+0130<vspace />
                Local lowercase: U+0069<vspace />
                Comment: LATIN CAPITAL LETTER I WITH DOT ABOVE
              </t>
              <t>
                Codepoint: U+0049<vspace />
                Local lowercase: U+0131<vspace />
                Comment: LATIN CAPITAL LETTER I
              </t>
            </list>
          </t>
        </section>
        <section title="Azerbaijanian">
          <t>
            language: Azerbaijanian
            <list>
              <t>
                Codepoint: U+0130<vspace />
                Local lowercase: U+0069<vspace />
                Comment: LATIN CAPITAL LETTER I WITH DOT ABOVE
              </t>
              <t>
                Codepoint: U+0049<vspace />
                Local lowercase: U+0131<vspace />
                Comment: LATIN CAPITAL LETTER I
              </t>
            </list>
          </t>
        </section>
      </section>
      <section title="Change Log" anchor="changes">
        <section title="Changes since -00">
          <t>
            <list style="symbols">
              <t>
                Add the Section 2.3 "Special mapping" in Section 2 Type of mappings.
              </t>
              <t>
                Add the topic about the special mapping and additional case mapping in Section 3 "Discussion".
              </t>
              <t>
                Add Appendices; <vspace />
                <xref target="maplist"/> "Mapping type list each protocols"<vspace />
                <xref target="list_sp"/> "Code point list is need special mapping"<vspace />
                <xref target="changes"/> "Change Log"<vspace />
              </t>
              <t>
                Add the <xref target="ack"/> "Acknowledgment".
              </t>
            </list>
          </t>
        </section>
        <section title="Changes since -01">
            <t>
              <list style="symbols">
                <t>
                  Modify document structure as a guideline for authors of protocol profiles of precis framework.
                </t>
                <t>
                  Group mappings that this document defines into two types.
                </t>
                <t>
                  Add the <xref target="aom"/> "Applying order of mapping".
                </t>
                <t>
                  Delete the section 3 "Discussion".
                </t>
              </list>
            </t>
          </section>
          <section title="Changes since -02">
            <t>
              <list style="symbols">
                <t>
                  Modify the <xref target="lcm"/> "Local case mapping" for defining characters that local case mapping targets.
                </t>
                <t>
                  Request creating registry of precis local case mapping to IANA and define a template for registry of precis local case mapping in the <xref target="iana"/> "IANA Considerations".
                </t>
                <t>
                  Add the <xref target="initialized"/> "The initial precis local case mapping registrations".
                </t>
              </list>
            </t>
          </section>
      </section>
  </back>
</rfc>

PAFTECH AB 2003-20262026-04-23 16:26:58