<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2045 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2045.xml">
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc2130 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2130.xml">
<!ENTITY rfc2616 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2616.xml">
<!ENTITY rfc3490 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3490.xml">
<!ENTITY rfc3491 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3491.xml">
<!ENTITY rfc3629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml">
<!ENTITY rfc3986 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3986.xml">
<!ENTITY rfc3987 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3987.xml">
<!ENTITY rfc5890 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5890.xml">
]>
<?rfc strict='yes'?>

<?xml-stylesheet type='text/css' href='rfc2629.css' ?>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc symrefs='yes'?>
<?rfc sortrefs='yes'?>
<?rfc iprnotified="no" ?>
<?rfc toc='yes'?>
<?rfc compact='yes'?>
<?rfc subcompact='no'?>
<rfc ipr="pre5378Trust200902" docName="draft-ietf-iri-comparison-02" category="std" xml:lang="en" updates="3986">
  <front>
    <title abbrev="IRI Comparison">Comparison, Equivalence and Canonicalization of Internationalized Resource Identifiers</title>
    
    <author initials="L." surname="Masinter" fullname="Larry Masinter">
      <organization>Adobe</organization>
      <address>
	<postal>
	  <street>345 Park Ave</street>
	  <city>San Jose</city>
	  <region>CA</region>
	  <code>95110</code>
	  <country>U.S.A.</country>
	</postal>
	<phone>+1-408-536-3024</phone>
	<email>masinter@adobe.com</email>
	<uri>http://larry.masinter.net</uri>
      </address>
    </author>
    
    <author initials="M.J." surname="Duerst" fullname='Martin Duerst'>
      <!-- (Note: Please write "Duerst" with u-umlaut wherever
	   possible, for example as "Dürst" in XML and HTML) -->
      <organization abbrev="Aoyama Gakuin University">Aoyama Gakuin University</organization>
      <address>
	<postal>
	  <street>5-10-1 Fuchinobe</street>
	  <city>Sagamihara</city>
	  <region>Kanagawa</region>
	  <code>229-8558</code>
	  <country>Japan</country>
	</postal>
	<phone>+81 42 759 6329</phone>
	<facsimile>+81 42 759 6495</facsimile>
	<email>duerst@it.aoyama.ac.jp</email>
	<uri>http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/<!-- (Note: This is the percent-encoded form of an IRI)--></uri>
      </address>
    </author>
    
    
    <date year="2012" />
    <area>Applications</area>
    <workgroup>Internationalized Resource Identifiers (iri)</workgroup>
    <keyword>IRI</keyword>
    <keyword>URL</keyword>
    <keyword>Internationalized Resource Identifier</keyword>
    <keyword>UTF-8</keyword>
    <keyword>URI</keyword>
    <keyword>IDN</keyword>
    <keyword>Equivalence</keyword>
    <keyword>Normalization</keyword>
    <keyword>Canonicalization</keyword>
    <abstract>

      <t>Internationalized Resource Identifiers (IRIs) are Unicode
      strings used to identify resources on the Internet. Applications
      that use IRIs often define a means of comparing IRIs to
      determine when two IRIs are equivalent for the purpose of that
      application. Some applications also define a method for
      canonicalizing an IRI -- translating one IRI into another which
      is equivalent under the comparison method used.</t>

      
      <t>This document gives guidelines and best practices for
      defining and using IRI comparison and canonicalization
      methods.</t>
      
      <t>Comparison methods are used to determine equivalence. As URIs
      are a subset of IRIs, the guidelines apply to URI comparison as
      well.</t>
      
    </abstract>

  </front>
  <middle>
    
    <section title="Introduction">
      
      <t>Internationalized Resource Identifiers (IRIs) are Unicode strings
      used to identify resources on the Internet. Applications that use IRIs
      often define a means of comparing IRIs to determine when two IRIs are
      equivalent for the purpose of that application. Some applications also
      define a method for canonicalizing an IRI -- translating one IRI into
      another which is equivalent under the comparison method used.</t>

      <t>This document gives guidelines and best practices for defining and
      using IRI comparison and canonicalization methods.</t>

      <t>As every URI is also an IRI, the comparison and canonicalization
      methods also apply to URIs.</t>

      <t>IRI comparison is expected to determine whether two IRIs are
      equivalent without using the IRIs to access their respective
      resource(s).  For example, comparisons are performed whenever a
      response cache is accessed, a browser checks its history to color a
      link, or an XML parser processes tags within a namespace. </t>

      <t>Comparison for equivalence is often accomplished by
      canonicalization (sometimes called normalization): a process for
      converting data that has more than one possible representation into a
      "standard", "normal", or "canonical" form.  Extensive canonicalization
      prior to comparison of IRIs may be used by spiders and indexing
      engines to prune a search space or reduce duplication of request
      actions and response storage.</t>

      <t>IRI comparison is performed for some particular purpose. Protocols
      or implementations that compare IRIs for different purposes will often
      be subject to differing design trade-offs with regard to how much
      effort should be spent in reducing aliased identifiers. This document
      describes various methods that may be used to compare IRIs, the
      trade-offs between them, and the types of applications that might use
      them.</t>


    </section> <!-- Introduction -->

    <section title="General guidelines">

      <t>Because IRIs exist to identify resources, one might expect two IRIs
      to be considered equivalent when they identify the same
      resource. However, this definition of equivalence is not of much
      practical use, as there is in general no way for an implementation to
      compare two resources to determine if they are "the same" unless it
      has full knowledge or control of them. Comparison methods for IRIs are
      generally based strictly on examining the characters that make up the
      IRI, without performing any network access.</t>

      <t>We use the terms "different" and "equivalent" to describe the
      possible outcomes of such comparisons, but there are many
      application-dependent versions of equivalence.</t>

      <t>Even when it is possible to determine that two IRIs are equivalent,
      IRI comparison is not sufficient to determine whether two IRIs
      identify different resources. For example, an owner of two different
      domain names could decide to serve the same resource from both,
      resulting in two different IRIs. For this reason, false negatives
      (e.g., returning "different" even when the resources are "the same")
      cannot be completely avoided. Comparison methods often try to minimize
      false negatives while strictly avoiding false positives.  However, in
      some cases (such as cache invalidation), false negatives are more
      harmful than false positives.</t>

      <t>A comparison method for determining equivalence might have
      multiple values, for example, returning "equivalent", "different",
      or "equivalence cannot be determined". </t>

      <t>Multiple canonicalization (normalization) methods might be
      defined, where sequential application of each results in
      successively larger sets of equivalent values. </t>

      <t>In testing for equivalence, applications should not directly
      compare relative references; the references should be converted to
      their respective target IRIs before comparison (see <xref
      target="RFC3987bis"/>).</t>
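
      <figure><preamble>As a minimal sketch of the advice above,
      the following Python fragment resolves each reference against
      its own base before comparing. The helper name same_target is
      hypothetical, and the standard library function urljoin is
      used as an approximation of reference resolution; it was
      written with URIs in mind, so treat its handling of IRIs as
      an assumption.</preamble>
<artwork><![CDATA[
```python
from urllib.parse import urljoin

def same_target(base_a, ref_a, base_b, ref_b):
    """Resolve each reference against its base, then compare."""
    return urljoin(base_a, ref_a) == urljoin(base_b, ref_b)

# The same relative reference names different targets under
# different bases, so comparing the raw references would mislead.
print(same_target("http://example.org/a/", "c",
                  "http://example.org/b/", "c"))

# Different relative references can name the same target.
print(same_target("http://example.org/a/b", "c",
                  "http://example.org/a/", "./c"))
```
]]></artwork>
      <postamble>The comparison applied after resolution here is
      plain string equality; any of the methods described later in
      this document could be substituted.</postamble></figure>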

      <t>Some IRIs contain fragment identifiers. In general, the equivalence
      of two IRIs is determined first by comparing the IRIs without any
      fragment identifiers, and then (if appropriate) by comparing the
      fragment components (if any).</t>

      <t>Some applications (such as XML namespaces) use IRIs as identity
      tokens without any relationship to accessing the resources. Those
      applications use the Simple String Comparison (see <xref
      target="stringcomp"></xref>).</t>

    </section>

    <section title="Preparation for Comparison">

      <t>Any kind of IRI comparison REQUIRES that any additional contextual
      processing is first performed, including undoing higher-level
      escapings or encodings in the protocol or format that carries an
      IRI. This preprocessing is usually done when the protocol or format is
      parsed.</t>

      <t>NOTE: This document has not yet been updated to use in-line
      Unicode examples.</t>

      <t>Examples of such escapings or encodings are entities and
      numeric character references in <xref target="HTML4"></xref> and
      <xref target="XML1"></xref>. As an example,
      "http://example.org/ros&amp;eacute;" (in HTML),
      "http://example.org/ros&amp;#233;" (in HTML or XML), and
      <vspace/>"http://example.org/ros&amp;#xE9;" (in HTML or XML) are
      all resolved into what is denoted in this document (see
      'Notation' section of <xref target="RFC3987bis" />) as
      "http://example.org/ros&amp;#xE9;" (the "&amp;#xE9;" here
      standing for the actual e-acute character, to compensate for the
      fact that this document cannot contain non-ASCII
      characters).</t>
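
      <figure><preamble>The following Python sketch illustrates this
      kind of preprocessing for the HTML case. It relies on
      html.unescape from the standard library as a stand-in for
      whatever the parser of the carrying format does; the variable
      names are illustrative only.</preamble>
<artwork><![CDATA[
```python
import html

# Three ways a format might carry the same IRI.
carried = [
    "http://example.org/ros&eacute;",   # named entity (HTML)
    "http://example.org/ros&#233;",     # decimal reference
    "http://example.org/ros&#xE9;",     # hexadecimal reference
]

# Undo the format-level escaping before any IRI comparison.
resolved = {html.unescape(text) for text in carried}

# All three collapse to one string ending in U+00E9.
print(len(resolved))
```
]]></artwork>
      <postamble>Only after this step do the three inputs become the
      same sequence of Unicode characters, which is what the
      comparison methods below operate on.</postamble></figure>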

      <t>An IRI is a sequence of Unicode characters. IRIs are sometimes
      represented in documents as sequences of bytes in a charset, either
      Unicode-based (UTF-8) or using some other character encoding
      (e.g., ISO-8859-1). Before comparing two such sequences, they
      must both be converted into sequences of Unicode characters.</t>

      <t>Similarly, encodings such as Transfer Codings in HTTP (see <xref
      target="RFC2616"></xref>) and Content Transfer Encodings in MIME
      (<xref target="RFC2045"></xref>) must be unencoded.  In these cases,
      the encoding is based not on characters but on octets, and additional
      care is required to make sure that characters, and not just arbitrary
      octets, are compared (see <xref target="stringcomp"></xref>).</t>

    </section> <!-- preparation -->

    <section title="Comparison Hierarchy" anchor="hierarchy">

      <t>In practice, a variety of methods are used to test IRI
      equivalence. These methods generally fall into a range
      distinguished by the amount of processing required and the
      degree to which the probability of false negatives is
      reduced. As noted above, false negatives cannot be
      eliminated. In practice, their probability can be reduced, but
      this reduction requires more processing and is not
      cost-effective for all applications.</t>

      <t>The following discussion starts with comparison methods that
      are cheap but have a relatively higher chance of producing false
      negatives, and proceeds to those that have higher
      computational cost and lower risk of false negatives.</t>

      <section title="Simple String Comparison" anchor="stringcomp">

	<t>If two IRIs (when considered as strings of Unicode
	characters) are identical, then it is safe to conclude that
	they are equivalent.  This type of equivalence test has very
	low computational cost and is in wide use in a variety of
	applications, particularly in the domain of parsing. It is
	also used when a definitive answer to the question of IRI
	equivalence is needed that is independent of the scheme used
	and that can be calculated quickly and without accessing a
	network. An example of such a case is XML Namespaces (<xref
	target="XMLNamespace"></xref>).</t>

	<t>Testing strings for equivalence requires some basic
	precautions.  This procedure is often referred to as
	"bit-for-bit" or "byte-for-byte" comparison, which is
	potentially misleading. Testing strings for equality is
	normally based on pair comparison of the characters that make
	up the strings, starting from the first and proceeding until
	both strings are exhausted and all characters are found to be
	equal, until a pair of characters compares unequal, or until
	one of the strings is exhausted before the other.</t>

	<t>This character comparison requires that each pair of
	characters be put in comparable encoding form. For example,
	should one IRI be stored in a byte array in UTF-8 encoding
	form and the second in a UTF-16 encoding form, bit-for-bit
	comparisons applied naively will produce errors. It is better
	to speak of equality on a character-for-character rather than
	on a byte-for-byte or bit-for-bit basis.  In practical terms,
	character-by-character comparisons should be done codepoint by
	codepoint after conversion to a common character encoding
	form.</t>

	<t>When comparing character by character, the comparison function
	MUST NOT map IRIs to URIs, because such a mapping would create
	additional spurious equivalences. It follows that an IRI
	SHOULD NOT be modified when being transported if there is any
	chance that this IRI might be used in a context that uses
	Simple String Comparison.</t>
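
	<figure><preamble>A minimal Python sketch of the comparison
	described above follows. The function name same_code_points
	is hypothetical. Python strings are sequences of code points,
	so decoding each stored form first puts both IRIs into a
	common form.</preamble>
<artwork><![CDATA[
```python
from itertools import zip_longest

def same_code_points(a, b):
    """Compare two strings code point by code point."""
    # zip_longest pads the shorter string with None, so a length
    # mismatch shows up as an unequal pair.
    return all(x == y for x, y in zip_longest(a, b))

iri = "http://example.org/ros\u00e9"
stored_utf8 = iri.encode("utf-8")
stored_utf16 = iri.encode("utf-16-le")

# Naive byte comparison of differently encoded copies fails.
print(stored_utf8 == stored_utf16)

# Decoding both to characters first gives the intended answer.
print(same_code_points(stored_utf8.decode("utf-8"),
                       stored_utf16.decode("utf-16-le")))

# No IRI-to-URI mapping is applied, so these stay different.
print(same_code_points(iri, "http://example.org/ros%C3%A9"))
```
]]></artwork>
	<postamble>In practice the built-in string equality operator
	does the same work; the explicit loop is shown only to mirror
	the pairwise procedure described in the text.</postamble></figure>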

	<t>False negatives are caused by the production and use of IRI
	aliases. Unnecessary aliases can be reduced, regardless of the
	comparison method, by consistently providing IRI references in
	a canonical form (after canonicalization is applied).</t>

	<t>Protocols and data formats might limit some IRI comparisons
	to simple string comparison, based on the theory that people
	and implementations will, in their own best interest, be
	consistent in providing IRI references, or at least be
	consistent enough to negate any efficiency that might be
	obtained from further canonicalization.</t>
      </section> <!-- stringcomp -->

      <section title="Syntax-Based Equivalence">
	<figure><preamble>Implementations may use logic based on the
	definitions provided by this specification to reduce the probability
	of false negatives. This processing is moderately higher in cost than
	character-for-character string comparison. For example, an application
	using this approach could reasonably consider the following two IRIs
	equivalent:</preamble>

	<artwork>
	  example://a/b/c/%7Bfoo%7D/ros&#xE9;
	  eXAMPLE://a/./b/../b/%63/%7bfoo%7d/ros%C3%A9
	</artwork></figure>

	<t>Web user agents, such as browsers, typically apply this
	type of IRI equivalence when determining whether a cached
	response is available. Syntax-based equivalence includes
	such techniques as case equivalence, Unicode character
	normalization, percent-encoding equivalence, and removal of
	dot-segments.</t>

	<section title="Case Equivalence">

	  <t>For all IRIs, the hexadecimal digits within a
	  percent-encoding triplet (e.g., "%3a" versus "%3A") are
	  case-insensitive and therefore should be considered
	  equivalent to forms which use uppercase letters for the
	  digits A-F.</t>

	  <t>When an IRI uses components of the generic syntax, the
	  component syntax equivalence rules always apply; namely,
	  that the scheme and US-ASCII-only host are case-insensitive
	  and therefore should be treated as equivalent to their
	  lowercase forms. For example, the URI
	  "HTTP://www.EXAMPLE.com/" is equivalent to
	  "http://www.example.com/". Case equivalence for non-ASCII
	  characters in IRI components that are IDNs is discussed in
	  <xref target="schemecomp"></xref>.  The other generic syntax
	  components are assumed to be case sensitive unless
	  specifically defined otherwise by the scheme.</t>
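
	  <figure><preamble>The two rules above can be sketched in
	  Python as follows. The names PARTS, lower_ascii, and
	  normalize_case are hypothetical; the regular expression is
	  the component-splitting expression from Appendix B of RFC
	  3986. Only the letters A-Z are lowercased, so non-ASCII
	  characters in the host are left untouched.</preamble>
<artwork><![CDATA[
```python
import re

# Component-splitting expression from RFC 3986, Appendix B.
PARTS = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def lower_ascii(text):
    """Lowercase A-Z only, leaving other characters alone."""
    return re.sub("[A-Z]+", lambda m: m.group().lower(), text)

def normalize_case(iri):
    """Lowercase scheme and ASCII host; uppercase %XX digits."""
    scheme, auth, path, query, frag = PARTS.match(iri).group(
        1, 3, 5, 6, 8)
    if auth:
        # Userinfo is case-sensitive, so only the part after
        # the last "@" (host and port) is lowercased.
        userinfo, at, host = auth[2:].rpartition("@")
        auth = "//" + userinfo + at + lower_ascii(host)
    out = (lower_ascii(scheme or "") + (auth or "") + path
           + (query or "") + (frag or ""))
    return re.sub("%[0-9A-Fa-f]{2}",
                  lambda m: m.group().upper(), out)

print(normalize_case("HTTP://www.EXAMPLE.com/"))
print(normalize_case("http://User@Example.COM/a%3a?Q#F"))
```
]]></artwork>
	  <postamble>Path, query, and fragment are otherwise left as
	  they are, reflecting the assumption that these components
	  are case-sensitive unless the scheme says
	  otherwise.</postamble></figure>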

	  <t>Creating schemes that allow case-insensitive syntax
	  components containing non-ASCII characters should be
	  avoided. Case equivalence of non-ASCII characters can be
	  culturally dependent and is always a complex operation. The
	  only exception concerns non-ASCII host names for which the
	  character normalization includes a mapping step derived from
	  case folding.</t>

	</section> <!-- casenorm -->

	<section title="Unicode Character Normalization" anchor="normalization">

	  <t>The Unicode Standard <xref target="UNIV6"></xref> defines
	  various equivalences between sequences of characters for
	  various purposes. Unicode Standard Annex #15 <xref
	  target="UTR15"></xref> defines various Normalization Forms
	  for these equivalences, in particular Normalization Form C
	  (NFC, Canonical Decomposition, followed by Canonical
	  Composition) and Normalization Form KC (NFKC, Compatibility
	  Decomposition, followed by Canonical Composition).</t>

	  <t>IRIs already in Unicode MUST NOT be normalized before
	  parsing or interpreting. In many non-Unicode character
	  encodings, some text cannot be represented directly. For
	  example, the word "Vietnam" is natively written
	  "Vi&#x1EC7;t Nam" (containing a LATIN SMALL LETTER E
	  WITH CIRCUMFLEX AND DOT BELOW) in NFC, but a direct
	  transcoding from the windows-1258 character encoding leads
	  to "Vi&#xEA;&#x323;t Nam" (containing a LATIN SMALL
	  LETTER E WITH CIRCUMFLEX followed by a COMBINING DOT
	  BELOW). Direct transcoding of other 8-bit encodings of
	  Vietnamese may lead to other representations.</t>

	  <t>Equivalence of IRIs MUST rely on the assumption that IRIs
	  are appropriately pre-character-normalized rather than apply
	  character normalization when comparing two IRIs. The
	  exceptions are conversion from a non-digital form, and
	  conversion from a non-UCS-based character encoding to a
	  UCS-based character encoding. In these cases, NFC or a
	  normalizing transcoder using NFC MUST be used for
	  interoperability. To avoid false negatives and problems with
	  transcoding, IRIs SHOULD be created by using NFC. Using NFKC
	  may avoid even more problems; for example, by choosing
	  half-width Latin letters instead of full-width ones, and
	  full-width instead of half-width Katakana.</t>


	  <t>As an example,
	  "http://www.example.org/r&#xE9;sum&#xE9;.html" (in
	  XML Notation) is in NFC. On the other hand,
	  "http://www.example.org/re&#x301;sume&#x301;.html"
	  is not in NFC.</t>

	  <t>The former uses precombined e-acute characters, and the
	  latter uses "e" characters followed by combining acute
	  accents. Both usages are defined as canonically equivalent
	  in <xref target="UNIV6"></xref>.</t>
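
	  <figure><preamble>The difference between the two forms can
	  be checked with the Python standard library, as sketched
	  below. The helper name is_nfc is hypothetical. Note that
	  the normalization call is shown for inspection and for use
	  when an IRI is created or transcoded; it is not meant as a
	  step to be applied during comparison.</preamble>
<artwork><![CDATA[
```python
import unicodedata

precomposed = "http://www.example.org/r\u00e9sum\u00e9.html"
decomposed = "http://www.example.org/re\u0301sume\u0301.html"

def is_nfc(iri):
    """True if the string is already in Normalization Form C."""
    return unicodedata.normalize("NFC", iri) == iri

print(is_nfc(precomposed), is_nfc(decomposed))

# Canonically equivalent, yet different code point sequences.
print(precomposed == decomposed)

# Normalizing the decomposed form to NFC yields the first form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```
]]></artwork></figure>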

	  <t><list style="hanging">

	    <t hangText="Note:">
	      Because it is unknown how a particular sequence of
	      characters is being treated with respect to character
	      normalization, it would be inappropriate to allow third
	      parties to normalize an IRI arbitrarily. This does not
	      contradict the recommendation that when a resource is
	      created, its IRI should be as character normalized as
	      possible (i.e., NFC or even NFKC). This is similar to
	      the uppercase/lowercase problems.  Some parts of a URI
	      are case insensitive (for example, the domain name). For
	      others, it is unclear whether they are case sensitive,
	      case insensitive, or something in between (e.g., case
	      sensitive, but with a multiple choice selection if the
	      wrong case is used, instead of a direct negative
	      result).  The best recipe is that the creator use a
	      reasonable capitalization and, when transferring the
	      URI, capitalization never be changed.</t></list></t>

	  <t>Various IRI schemes may allow the usage of
	  Internationalized Domain Names (IDN) <xref
	  target="RFC5890"/> either in the ireg-name part or
	  elsewhere. Character Normalization also applies to IDNs, as
	  discussed in <xref target="schemecomp"/>.</t>
	</section> <!-- charnorm -->

	<section title="Percent-Encoding Equivalence">

	  <t>The percent-encoding mechanism (Section 2.1 of <xref
	  target="RFC3986"></xref>) is a frequent source of variance
	  among otherwise identical IRIs. In addition to the case
	  equivalence issue noted above, some IRI producers
	  percent-encode octets that do not require percent-encoding,
	  resulting in IRIs that are equivalent to their nonencoded
	  counterparts. These IRIs should be compared by first decoding
	  any percent-encoded octet sequence that corresponds to an
	  unreserved character, as described in section 2.3 of <xref
	  target="RFC3986"></xref>.</t>

	  <t>For actual resolution, differences in percent-encoding
	  (except for the percent-encoding of reserved characters)
	  SHOULD always result in the same resource.  For example,
	  "http://example.org/~user", "http://example.org/%7euser",
	  and "http://example.org/%7Euser" SHOULD resolve to the same
	  resource.</t>

	  <t>If this kind of equivalence is to be tested, the
	  percent-encoding of both IRIs to be compared first needs to
	  be aligned; for example, by converting both IRIs to URIs,
	  eliminating escape differences in the resulting URIs, and
	  making sure that the case of the hexadecimal characters in
	  the percent-encoding is always the same (preferably upper
	  case). If the IRI is to be passed to another application or
	  used further in some other way, its original form MUST be
	  preserved.  The conversion described here should be
	  performed only for local comparison.</t>
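
	  <figure><preamble>One way to align percent-encoding for a
	  local comparison is sketched below in Python. The names
	  to_uri and comparison_key are hypothetical. The to_uri
	  step is a simplification that percent-encodes every
	  non-ASCII character as UTF-8, including any in the host; a
	  complete IRI-to-URI mapping may treat host names
	  differently.</preamble>
<artwork><![CDATA[
```python
import re

UNRESERVED = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz0123456789-._~")

def to_uri(iri):
    """Percent-encode non-ASCII characters as UTF-8."""
    return "".join(
        c if ord(c) < 128
        else "".join("%{:02X}".format(b)
                     for b in c.encode("utf-8"))
        for c in iri)

def _align(match):
    char = chr(int(match.group(1), 16))
    if char in UNRESERVED:
        return char                 # decode unreserved characters
    return match.group(0).upper()   # otherwise only fix hex case

def comparison_key(iri):
    """Build a local key; the caller keeps the original IRI."""
    return re.sub("%([0-9A-Fa-f]{2})", _align, to_uri(iri))

for iri in ("http://example.org/~user",
            "http://example.org/%7euser",
            "http://example.org/%7Euser"):
    print(comparison_key(iri))
```
]]></artwork>
	  <postamble>The key is used only for the comparison itself.
	  The IRI that is stored or passed on remains the original
	  string.</postamble></figure>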

	</section> <!-- pctnorm -->

	<section title="Path Segment Equivalence">

	  <t>The complete path segments "." and ".." are intended only
	  for use within relative references (Section 4.1 of <xref
	  target="RFC3986"></xref>) and are removed as part of the
	  reference resolution process (Section 5.2 of <xref
	  target="RFC3986"></xref>).  However, some implementations
	  may incorrectly assume that reference resolution is not
	  necessary when the reference is already an IRI, and thus
	  fail to remove dot-segments when they occur in non-relative
	  paths.  IRI comparison SHOULD remove dot-segments by
	  applying the remove_dot_segments algorithm to the path, as
	  described in Section 5.2.4 of <xref
	  target="RFC3986"></xref>.</t>
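
	  <figure><preamble>For illustration, the remove_dot_segments
	  algorithm of Section 5.2.4 of RFC 3986 can be transcribed
	  into Python roughly as follows. This is a sketch that
	  operates on the path component only; splitting the IRI into
	  components is assumed to have happened already.</preamble>
<artwork><![CDATA[
```python
def remove_dot_segments(path):
    """Sketch of remove_dot_segments (RFC 3986, 5.2.4)."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../x" becomes "/x"
            if output:
                output.pop()         # drop the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment, with any leading "/",
            # from the input to the output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)

print(remove_dot_segments("/a/b/c/./../../g"))
print(remove_dot_segments("/./b/../b/%63/%7bfoo%7d"))
```
]]></artwork></figure>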

	</section> <!-- pathnorm -->
      </section> <!-- ladder -->

      <section title="Scheme-Based Comparison" anchor="schemecomp">

	<t>The syntax and semantics of IRIs vary from scheme to
	scheme, as described by the defining specification for each
	scheme. Implementations may use scheme-specific rules, at
	further processing cost, to reduce the probability of false
	negatives. For example, because the "http" scheme makes use of
	an authority component, has a default port of "80", and
	defines an empty path to be equivalent to "/", the following
	four IRIs are equivalent:</t>

	<figure><artwork>
	  http://example.com
	  http://example.com/
	  http://example.com:/
	  http://example.com:80/
	</artwork></figure>

	<t>In general, an IRI that uses the generic syntax for
	authority with an empty path should be treated as equivalent
	to the same IRI with a path of "/". Likewise, an explicit
	":port", for which the port is empty or the default for the
	scheme, is equivalent to one where the port and its ":"
	delimiter are elided.</t>
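
	<figure><preamble>The empty-path and port rules can be
	sketched in Python as shown below. The names scheme_key and
	DEFAULT_PORTS are hypothetical, and the table of default
	ports lists only "http" for illustration. The regular
	expression is again the one from Appendix B of RFC
	3986.</preamble>
<artwork><![CDATA[
```python
import re

PARTS = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
DEFAULT_PORTS = {"http": "80"}   # illustrative; extend per scheme

def scheme_key(iri):
    """Comparison key applying the empty-path and port rules."""
    scheme, auth, path, query, frag = PARTS.match(iri).group(
        2, 4, 5, 6, 8)
    if scheme is None or auth is None:
        return iri               # rules need scheme and authority
    # Split a trailing ":port" (digits, possibly empty) from the
    # rest of the authority.
    host, port = re.fullmatch(r"(.*?)(?::(\d*))?", auth).groups()
    if port and port != DEFAULT_PORTS.get(scheme.lower()):
        host += ":" + port       # keep a non-default port
    return (scheme + "://" + host + (path or "/")
            + (query or "") + (frag or ""))

for iri in ("http://example.com", "http://example.com/",
            "http://example.com:/", "http://example.com:80/"):
    print(scheme_key(iri))
```
]]></artwork>
	<postamble>A present but empty query or fragment is kept in
	the key, so an IRI such as "http://example.com/?" does not
	collapse into the four forms above.</postamble></figure>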

	<t>Another case where equivalence varies by scheme is in the
	handling of an empty authority component or empty host
	subcomponent. For many scheme specifications, an empty authority or
	host is considered an error; for others, it is considered equivalent
	to "localhost" or the end-user's host.</t>

	<t>A missing component and a component that is present but
	empty SHOULD NOT be treated as equivalent
	unless explicitly defined as such by the scheme definition.
	For example, the IRI "http://example.com/?" cannot be assumed
	to be equivalent to any of the examples above; an empty query
	component is NOT equivalent to a missing one. Likewise, the
	presence or absence of delimiters within a userinfo
	subcomponent is usually significant to its interpretation.
	The fragment component is not subject to any scheme-based
	equivalence; thus, two IRIs that differ only by the suffix
	"#" are considered different regardless of the scheme.</t>
	
	<t>Some IRI schemes allow the usage of Internationalized Domain
	Names (IDN) <xref target='RFC5890'></xref> either in their ireg-name
	part or elsewhere. When in use in IRIs, those names SHOULD
	conform to the definition of U-Label in <xref
	target='RFC5890'></xref>. An IRI containing an invalid IDN cannot
	successfully be resolved. For legibility purposes, they
	SHOULD NOT be converted into ASCII Compatible Encoding (ACE).</t>

	<t>Scheme-based comparison may also consider IDN components
	and their conversions to Punycode as equivalent. As an
	example, "http://r&#xE9;sum&#xE9;.example.org" may be
	considered equivalent to
	"http://xn--rsum-bpad.example.org".</t>

	<t>Other scheme-specific equivalence rules are possible.</t>
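
	<figure><preamble>The IDN equivalence mentioned above can be
	illustrated with Python's built-in "idna" codec, as sketched
	below. The helper name host_key is hypothetical. Note the
	assumption involved: that codec implements the older IDNA
	rules of RFC 3490 rather than those of RFC 5890, so it is
	used here only as a readily available stand-in.</preamble>
<artwork><![CDATA[
```python
def host_key(host):
    """Map a host name to an ASCII form for comparison only."""
    # The built-in codec converts each non-ASCII label to its
    # "xn--" form and leaves ASCII labels as they are.
    return host.encode("idna").decode("ascii").lower()

unicode_host = "r\u00e9sum\u00e9.example.org"
ace_host = "xn--rsum-bpad.example.org"

print(host_key(unicode_host))
print(host_key(unicode_host) == host_key(ace_host))
```
]]></artwork>
	<postamble>The ASCII form is used as a comparison key only;
	as stated above, the IRI itself is better kept in its
	Unicode form for legibility.</postamble></figure>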

      </section> <!-- schemenorm -->

      <section title="Protocol-Based Comparison">

	<t>Substantial effort to reduce the incidence of false
	negatives is often cost-effective for web
	spiders. Consequently, they implement even more aggressive
	techniques in IRI comparison. For example, if they observe
	that an IRI such as</t>

	<figure><artwork>
	  http://example.com/data
	</artwork></figure>

	<t>redirects to an IRI differing only in the trailing slash</t>

	<figure><artwork>
	  http://example.com/data/
	</artwork></figure>

	<t>they will likely regard the two as equivalent in the
	future.  This kind of technique is only appropriate when
	equivalence is clearly indicated by both the result of
	accessing the resources and the common conventions of their
	scheme's dereference algorithm (in this case, use of
	redirection by HTTP origin servers to avoid problems with
	relative references).</t>
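
	<figure><preamble>A spider might record such observations in
	a small alias table, as in the following Python sketch. The
	class AliasTable and its methods are hypothetical, and the
	redirect observations are supplied by the caller; no network
	access is performed by this code.</preamble>
<artwork><![CDATA[
```python
class AliasTable:
    """Remember IRIs observed to redirect to another IRI."""

    def __init__(self):
        self._target = {}

    def record_redirect(self, source, target):
        self._target[source] = target

    def canonical(self, iri):
        """Follow recorded redirects to the final known IRI."""
        seen = set()
        # The "seen" set guards against redirect loops.
        while iri in self._target and iri not in seen:
            seen.add(iri)
            iri = self._target[iri]
        return iri

    def equivalent(self, a, b):
        return self.canonical(a) == self.canonical(b)

aliases = AliasTable()
aliases.record_redirect("http://example.com/data",
                        "http://example.com/data/")
print(aliases.equivalent("http://example.com/data",
                         "http://example.com/data/"))
```
]]></artwork>
	<postamble>Whether a recorded redirect justifies treating two
	IRIs as equivalent remains subject to the conditions stated
	above.</postamble></figure>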

      </section> <!-- protonorm -->
    </section> <!-- equivalence -->

    <section title="Security Considerations" anchor="security">
	
      <t>The primary security difficulty comes from applications
      choosing the wrong equivalence relationship, or two different
      parties disagreeing on equivalence. This is especially a problem
      when IRIs are used in security protocols.</t>

      <t>Besides the large character repertoire of Unicode, reasons
      for confusion include different forms of normalization and
      different normalization expectations, use of percent-encoding
      with various legacy encodings, and bidirectionality issues. See
      also <xref target='UTR36'/>.</t>

    </section><!-- security -->

    <section title="Acknowledgements">

      <t>This document was originally derived from <xref target="RFC3986"/>
      and <xref target="RFC3987"/>, based on text contributed by Tim
      Bray.</t>
    </section>

  </middle>

  <back>
    <references title="Normative References">

      <reference anchor="RFC3987bis" 
		 target="http://tools.ietf.org/id/draft-ietf-iri-3987bis">
	
	<front>
	  <title>Internationalized Resource Identifiers (IRIs)</title>
          <author initials="M." surname="Duerst"/>
	  <author initials="L." surname="Masinter" fullname="Larry Masinter"/>
          <author initials="M." surname="Suignard"/>
	  <date year="2012"/>
	</front>
      </reference>


      &rfc2119;
      &rfc3490;
      &rfc3491;
      &rfc3629;
      &rfc3986;
      &rfc5890;

      <reference anchor="UNIV6">
	<front>
	  <title>The Unicode Standard, Version 6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, ISBN 978-1-936213-01-6)</title>
	  <author><organization>The Unicode Consortium</organization></author>
	  <date year="2010" month="October"/>
	</front>
      </reference>


      <reference anchor="UTR15" target="http://www.unicode.org/unicode/reports/tr15/tr15-23.html">
	<front>
	  <title>Unicode Normalization Forms</title>
	  <author initials="M." surname="Davis" fullname="Mark Davis"><organization/></author>
	  <author initials="M.J." surname="Duerst" fullname="Martin Duerst"><organization/></author>
	  <date year="2008" month="March"/>
	</front>
	<seriesInfo name="Unicode Standard Annex" value="#15"/>
      </reference>

    </references>

    <references title="Informative References">

      <reference anchor="HTML4" target="http://www.w3.org/TR/html401/appendix/notes.html#h-B.2">
	<front>
	  <title>HTML 4.01 Specification</title>
	  <author initials="D." surname="Raggett" fullname="Dave Raggett"><organization/></author>
	  <author initials="A." surname="Le Hors" fullname="Arnaud Le Hors"><organization/></author>
	  <author initials="I." surname="Jacobs" fullname="Ian Jacobs"><organization/></author>
	  <date year="1999" month="December" day="24"/>
	</front>
	<seriesInfo name="World Wide Web Consortium" value="Recommendation"/>
      </reference>

      &rfc2045;
      &rfc3987;
      &rfc2616;
      

      <reference anchor="UTR36" target="http://unicode.org/reports/tr36/">
	<front>
	  <title>Unicode Security Considerations</title>
	  <author initials="M." surname="Davis" fullname="Mark Davis"><organization/></author>
	  <author initials="M." surname="Suignard" fullname="Michel Suignard"><organization/></author>
	  <date year="2010" month="August" day="4"/>
	</front>
	<seriesInfo name="Unicode Technical Report" value="#36"/>
      </reference>

      <reference anchor="XML1" target="http://www.w3.org/TR/REC-xml">
	<front>
	  <title>Extensible Markup Language (XML) 1.0 (Fourth Edition)</title>
	  <author initials="T." surname="Bray" fullname="Tim Bray"><organization/></author>
	  <author initials="J." surname="Paoli" fullname="Jean Paoli"><organization/></author>
	  <author initials="C.M." surname="Sperberg-McQueen" fullname="C. M. Sperberg-McQueen">
	  <organization/></author>
	  <author initials="E." surname="Maler" fullname="Eve Maler"><organization/></author>
	  <author initials="F." surname="Yergeau" fullname="Francois Yergeau"><organization/></author>
	  <date day="16" month="August" year="2006"/>
	</front>
	<seriesInfo name="World Wide Web Consortium" value="Recommendation"/>
      </reference>

      <reference anchor="XMLNamespace" target="http://www.w3.org/TR/REC-xml-names">
	<front>
	  <title>Namespaces in XML (Second Edition)</title>
	  <author initials="T." surname="Bray" fullname="Tim Bray"><organization/></author>
	  <author initials="D." surname="Hollander" fullname="Dave Hollander"><organization/></author>
	  <author initials="A." surname="Layman" fullname="Andrew Layman"><organization/></author>
	  <author initials="R." surname="Tobin" fullname="Richard Tobin"><organization></organization></author>
	  <date day="16" month="August" year="2006"/>
	</front>
	<seriesInfo name="World Wide Web Consortium" value="Recommendation"/>
      </reference>

    </references>

  </back>
</rfc>
