One document matched: draft-ietf-iri-bidi-guidelines-00.xml
<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc3490 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3490.xml">
<!ENTITY rfc3491 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3491.xml">
<!ENTITY rfc3987 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3987.xml">
]>
<?rfc strict='yes'?>
<?xml-stylesheet type='text/css' href='rfc2629.css' ?>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc symrefs='yes'?>
<?rfc sortrefs='yes'?>
<?rfc iprnotified="no" ?>
<?rfc toc='yes'?>
<?rfc compact='yes'?>
<?rfc subcompact='no'?>
<rfc ipr="pre5378Trust200902" docName="draft-ietf-iri-bidi-guidelines-00" category="bcp" xml:lang="en">
<front>
<title abbrev="Bidi IRI Guidelines">Guidelines for Internationalized Resource Identifiers with Bi-directional Characters (Bidi IRIs)</title>
<author initials="M.J." surname="Duerst" fullname='Martin Duerst'>
<!-- (Note: Please write "Duerst" with u-umlaut wherever
possible, for example as "Dürst" in XML and HTML) -->
<organization abbrev="Aoyama Gakuin University">Aoyama Gakuin University</organization>
<address>
<postal>
<street>5-10-1 Fuchinobe</street>
<city>Sagamihara</city>
<region>Kanagawa</region>
<code>229-8558</code>
<country>Japan</country>
</postal>
<phone>+81 42 759 6329</phone>
<facsimile>+81 42 759 6495</facsimile>
<email>duerst@it.aoyama.ac.jp</email>
<uri>http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/<!-- (Note: This is the percent-encoded form of an IRI)--></uri>
</address>
</author>
<author initials="L." surname="Masinter" fullname="Larry Masinter">
<organization>Adobe</organization>
<address>
<postal>
<street>345 Park Ave</street>
<city>San Jose</city>
<region>CA</region>
<code>95110</code>
<country>U.S.A.</country>
</postal>
<phone>+1-408-536-3024</phone>
<email>masinter@adobe.com</email>
<uri>http://larry.masinter.net</uri>
</address>
</author>
<date year="2011" month="August" day="14" />
<area>Applications</area>
<workgroup>Internationalized Resource Identifiers (iri)</workgroup>
<keyword>IRI</keyword>
<keyword>Internationalized Resource Identifier</keyword>
<keyword>BIDI</keyword>
<keyword>URI</keyword>
<keyword>URL</keyword>
<keyword>IDN</keyword>
<abstract>
<t>This specification gives guidelines for selection, use, presentation of
International Resource Identifiers (IRI) which include characters with
in inherent right-to-left (rtl) writing direction. </t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>Some UCS characters, such as those used in the Arabic and Hebrew
scripts, have an inherent right-to-left (rtl) writing direction. IRIs
containing these characters (called bidirectional IRIs or Bidi IRIs)
require additional attention because of the non-trivial relation
between logical representation (used for digital representation and
for reading/spelling) and visual representation (used for
display/printing).</t>
<t>Because of the complex interaction between the logical representation,
the visual representation, and the syntax of a Bidi IRI, a balance is
needed between various requirements.
The main requirements are<list style="hanging">
<t hangText="1.">user-predictable conversion between visual and
logical representation;</t>
<t hangText="2.">the ability to include a wide range of characters
in various parts of the IRI; and</t>
<t hangText="3.">minor or no changes or restrictions for
implementations.</t>
</list></t>
<section title="Notation">
<t>In this document, Bidi Notation is used for bidirectional examples: Lower case
letters stand for Latin letters or other letters that are written left
to right, whereas upper case letters represent Arabic or Hebrew
letters that are written right to left.</t>
<t> In this document, the key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" are to be interpreted as described in <xref
target="RFC2119"/>.</t>
</section> <!-- Notation -->
</section> <!-- Introduction -->
<section title="Logical Storage and Visual Presentation" anchor="visual">
<t>When stored or transmitted in digital representation, bidirectional
IRIs MUST be in full logical order and MUST conform to the IRI syntax
rules (which includes the rules relevant to their scheme). This
ensures that bidirectional IRIs can be processed in the same way as
other IRIs.</t> <t>Bidirectional IRIs MUST be rendered by using the
Unicode Bidirectional Algorithm <xref target="UNIV6"/>, <xref
target="UNI9"/>. Bidirectional IRIs MUST be rendered in the same way
as they would be if they were in a left-to-right embedding; i.e., as
if they were preceded by U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and
followed by U+202C, POP DIRECTIONAL FORMATTING (PDF). Setting the
embedding direction can also be done in a higher-level protocol (e.g.,
the dir='ltr' attribute in HTML).</t>
<t>There is no requirement to use the above embedding if the display
is still the same without the embedding. For example, a bidirectional
IRI in a text with left-to-right base directionality (such as used for
English or Cyrillic) that is preceded and followed by whitespace and
strong left-to-right characters does not need an embedding. Also, a
bidirectional relative IRI reference that only contains strong
right-to-left characters and weak characters and that starts and ends
with a strong right-to-left character and appears in a text with
right-to-left base directionality (such as used for Arabic or Hebrew)
and is preceded and followed by whitespace and strong characters does
not need an embedding.</t>
<t>In some other cases, using U+200E, LEFT-TO-RIGHT MARK (LRM), may be
sufficient to force the correct display behavior. However, the
details of the Unicode Bidirectional algorithm are not always easy to
understand. Implementers are strongly advised to err on the side of
caution and to use embedding in all cases where they are not
completely sure that the display behavior is unaffected without the
embedding.</t>
<t>The Unicode Bidirectional Algorithm (<xref target="UNI9"/>, section
4.3) permits higher-level protocols to influence bidirectional
rendering. Such changes by higher-level protocols MUST NOT be used if
they change the rendering of IRIs.</t>
<t>The bidirectional formatting characters that may be used before or
after the IRI to ensure correct display are not themselves part of the
IRI. IRIs MUST NOT contain bidirectional formatting characters (LRM,
RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual rendering of
the IRI but do not appear themselves. It would therefore not be
possible to input an IRI with such characters correctly.</t>
</section> <!-- visual -->
<section title="Bidi IRI Structure" anchor="bidi-structure">
<t>The Unicode Bidirectional Algorithm is designed mainly for running
text. To make sure that it does not affect the rendering of
bidirectional IRIs too much, some restrictions on bidirectional IRIs
are necessary. These restrictions are given in terms of delimiters
(structural characters, mostly punctuation such as "@", ".", ":",
and<vspace/>"/") and components (usually consisting mostly of letters
and digits).</t>
<t>The following syntax rules from the ABNF of <xref target="RFC3987bis"/>
correspond to
components for the purpose of Bidi behavior: iuserinfo, ireg-name,
isegment, isegment-nz, isegment-nz-nc, ireg-name, iquery, and
ifragment.</t>
<t>Specifications that define the syntax of any of the above
components MAY divide them further and define smaller parts to be
components according to this document. As an example, the restrictions
of <xref target="RFC3490"/> on bidirectional domain names correspond
to treating each label of a domain name as a component for schemes
with ireg-name as a domain name. Even where the components are not
defined formally, it may be helpful to think about some syntax in
terms of components and to apply the relevant restrictions. For
example, for the usual name/value syntax in query parts, it is
convenient to treat each name and each value as a component. As
another example, the extensions in a resource name can be treated as
separate components.</t>
<t>For each component, the following restrictions apply:</t>
<t>
<list style="hanging">
<t hangText="1.">A component SHOULD NOT use both right-to-left and
left-to-right characters.</t>
<t hangText="2.">A component using right-to-left characters SHOULD
start and end with right-to-left characters.</t>
</list></t>
<t>The above restrictions are given as "SHOULD"s, rather than as
"MUST"s. For IRIs that are never presented visually, they are not
relevant. However, for IRIs in general, they are very important to
ensure consistent conversion between visual presentation and logical
representation, in both directions.</t>
<t><list style="hanging">
<t hangText="Note:">In some components, the above restrictions may
actually be strictly enforced. For example, <xref
target="RFC3490"></xref> requires that these restrictions apply to
the labels of a host name for those schemes where ireg-name is a
host name. In some other components (for example, path components)
following these restrictions may not be too difficult. For other
components, such as parts of the query part, it may be very
difficult to enforce the restrictions because the values of query
parameters may be arbitrary character sequences.</t>
</list></t>
<t>If the above restrictions cannot be satisfied otherwise, the
affected component can always be mapped to URI notation using the
general percent-encoding of IRI components, as described
in <xref target="RFC3987bis"/>. Please note that the whole component
has to be mapped (see also Example 9 below).</t>
</section> <!-- bidi-structure -->
<section title="Input of Bidi IRIs" anchor="bidiInput">
<t>Bidi input methods MUST generate Bidi IRIs in logical order while
rendering them according to <xref target="visual"/>. During input,
rendering SHOULD be updated after every new character is input to
avoid end-user confusion.</t>
</section> <!-- bidiInput -->
<section title="Examples">
<t>This section gives examples of bidirectional IRIs, in Bidi
Notation. It shows legal IRIs with the relationship between logical
and visual representation and explains how certain phenomena in this
relationship may look strange to somebody not familiar with
bidirectional behavior, but familiar to users of Arabic and Hebrew. It
also shows what happens if the restrictions given in <xref
target="bidi-structure"/> are not followed. The examples below can be
seen at <xref target="BidiEx"/>, in Arabic, Hebrew, and Bidi Notation
variants.</t>
<t>To read the bidi text in the examples, read the visual
representation from left to right until you encounter a block of rtl
text. Read the rtl block (including slashes and other special
characters) from right to left, then continue at the next unread ltr
character.</t>
<t>Example 1: A single component with rtl characters is inverted:
<vspace/>Logical representation:
"http://ab.CDEFGH.ij/kl/mn/op.html"<vspace/>Visual representation:
"http://ab.HGFEDC.ij/kl/mn/op.html"<vspace/> Components can be read
one by one, and each component can be read in its natural
direction.</t>
<t>Example 2: More than one consecutive component with rtl characters
is inverted as a whole: <vspace/>Logical representation:
"http://ab.CDE.FGH/ij/kl/mn/op.html"<vspace/>Visual representation:
"http://ab.HGF.EDC/ij/kl/mn/op.html"<vspace/> A sequence of rtl
components is read rtl, in the same way as a sequence of rtl words is
read rtl in a bidi text.</t>
<t>Example 3: All components of an IRI (except for the scheme) are
rtl. All rtl components are inverted overall: <vspace/>Logical
representation:
"http://AB.CD.EF/GH/IJ/KL?MN=OP;QR=ST#UV"<vspace/>Visual
representation: "http://VU#TS=RQ;PO=NM?LK/JI/HG/FE.DC.BA"<vspace/> The
whole IRI (except the scheme) is read rtl. Delimiters between rtl
components stay between the respective components; delimiters between
ltr and rtl components don't move.</t>
<t>Example 4: Each of several sequences of rtl components is inverted
on its own: <vspace/>Logical representation:
"http://AB.CD.ef/gh/IJ/KL.html"<vspace/>Visual representation:
"http://DC.BA.ef/gh/LK/JI.html"<vspace/> Each sequence of rtl
components is read rtl, in the same way as each sequence of rtl words
in an ltr text is read rtl.</t>
<t>Example 5: Example 2, applied to components of different kinds:
<vspace/>Logical representation: "http://ab.cd.EF/GH/ij/kl.html"
<vspace/>Visual representation:
"http://ab.cd.HG/FE/ij/kl.html"<vspace/> The inversion of the domain
name label and the path component may be unexpected, but it is
consistent with other bidi behavior. For reassurance that the domain
component really is "ab.cd.EF", it may be helpful to read aloud the
visual representation following the bidi algorithm. After
"http://ab.cd." one reads the RTL block "E-F-slash-G-H", which
corresponds to the logical representation.
</t>
<t>Example 6: Same as Example 5, with more rtl components:
<vspace/>Logical representation:
"http://ab.CD.EF/GH/IJ/kl.html"<vspace/>Visual representation:
"http://ab.JI/HG/FE.DC/kl.html"<vspace/> The inversion of the domain
name labels and the path components may be easier to identify because
the delimiters also move.</t>
<t>Example 7: A single rtl component includes digits: <vspace/>Logical
representation: "http://ab.CDE123FGH.ij/kl/mn/op.html"<vspace/>Visual
representation: "http://ab.HGF123EDC.ij/kl/mn/op.html"<vspace/>
Numbers are written ltr in all cases but are treated as an additional
embedding inside a run of rtl characters. This is completely
consistent with usual bidirectional text.</t>
<t>Example 8 (not allowed): Numbers are at the start or end of an rtl
component:<vspace/>Logical representation:
"http://ab.cd.ef/GH1/2IJ/KL.html"<vspace/>Visual representation:
"http://ab.cd.ef/LK/JI1/2HG.html"<vspace/> The sequence "1/2" is
interpreted by the bidi algorithm as a fraction, fragmenting the
components and leading to confusion. There are other characters that
are interpreted in a special way close to numbers; in particular, "+",
"-", "#", "$", "%", ",", ".", and ":".</t>
<t>Example 9 (not allowed): The numbers in the previous example are
percent-encoded: <vspace/>Logical representation:
"http://ab.cd.ef/GH%31/%32IJ/KL.html",<vspace/>Visual representation:
"http://ab.cd.ef/LK/JI%32/%31HG.html"</t>
<t>Example 10 (allowed but not recommended): <vspace/>Logical
representation: "http://ab.CDEFGH.123/kl/mn/op.html"<vspace/>Visual
representation: "http://ab.123.HGFEDC/kl/mn/op.html"<vspace/>
Components consisting of only numbers are allowed (it would be rather
difficult to prohibit them), but these may interact with adjacent RTL
components in ways that are not easy to predict.</t>
<t>Example 11 (allowed but not recommended): <vspace/>Logical
representation: "http://ab.CDEFGH.123ij/kl/mn/op.html"<vspace/>Visual
representation: "http://ab.123.HGFEDCij/kl/mn/op.html"<vspace/>
Components consisting of numbers and left-to-right characters are
allowed, but these may interact with adjacent RTL components in ways
that are not easy to predict.</t>
</section><!-- examples -->
<section title="IANA Considerations" anchor="iana">
<t>This document makes no changes to IANA registries.</t>
</section> <!-- IANA -->
<section title="Security Considerations" anchor="security">
<t>Confusion can occur with bidirectional IRIs, if the restrictions
in <xref target="bidi-structure"/> are not followed. The same visual
representation may be interpreted as different logical representations,
and vice versa. It is also very important that a correct Unicode bidirectional
implementation be used.</t>
</section><!-- security -->
<section title="Acknowledgements">
<t>This document was derived from <xref target="RFC3987"/> and
<xref target="RFC3987bis"/> and the acknowledgments of those
documents apply.</t>
</section><!-- acknowledgements -->
</middle>
<back>
<references title="Normative References">
<reference anchor="RFC3987bis"
target="http://tools.ietf.org/id/draft-ietf-iri-3987bis">
<front>
<title>Internationalized Resource Identifiers (IRIs)</title>
<author initials="M." surname="Duerst"/>
<author initials="L." surname="Masinter" fullname="Larry Masinter"/>
<author initials="M." surname="Suignard"/>
<date year="2011" month="August" day="14"/>
</front>
</reference>
<reference anchor="ASCII">
<front>
<title>Coded Character Set -- 7-bit American Standard Code for Information
Interchange</title>
<author>
<organization>American National Standards Institute</organization>
</author>
<date year="1986"/>
</front>
<seriesInfo name="ANSI" value="X3.4"/>
</reference>
<reference anchor="ISO10646">
<front>
<title>ISO/IEC 10646:2003: Information Technology -
Universal Multiple-Octet Coded Character Set (UCS)</title>
<author>
<organization>International Organization for Standardization</organization>
</author>
<date month="December" year="2003"/>
</front>
<seriesInfo name="ISO" value="Standard 10646"/>
</reference>
&rfc2119;
&rfc3490;
&rfc3491;
<reference anchor="UNIV6">
<front>
<title>The Unicode Standard, Version 6.0.0 (Mountain View, CA, The Unicode Consortium, 2011, ISBN 978-1-936213-01-6)</title>
<author><organization>The Unicode Consortium</organization></author>
<date year="2010" month="October"/>
</front>
</reference>
<reference anchor="UNI9" target="http://www.unicode.org/reports/tr9/tr9-13.html">
<front>
<title>The Bidirectional Algorithm</title>
<author initials="M." surname="Davis" fullname="Mark Davis"><organization/></author>
<date year="2004" month="March"/>
</front>
<seriesInfo name="Unicode Standard Annex" value="#9"/>
</reference>
</references>
<references title="Informative References">
<reference anchor="BidiEx" target="http://www.w3.org/International/iri-edit/BidiExamples">
<front>
<title>Examples of bidirectional IRIs</title>
<author><organization/></author>
<date year="" month=""/>
</front>
</reference>
&rfc3987;
</references>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-24 01:52:20 |