INTERNET-DRAFT                                          M. Mealling
Expires six months from June 1998            Network Solutions, Inc.
Intended category: Experimental
draft-mealling-human-friendly-identifier-arch-00.txt
An Architecture for Supporting Human Friendly Identifiers
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working documents
of the Internet Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and
may be updated, replaced, or obsoleted by other documents at any time. It
is inappropriate to use Internet-Drafts as reference material or to cite
them other than as work in progress.
To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).
Abstract
This document describes an architecture that satisfies the
requirements for Human Friendly Identifiers as specified in [HFI-REQ].
Specifically it describes the URI scheme "go" as an HFI encoding
mechanism, a protocol for the resolution of HFIs, and a scalable and
open infrastructure for resolving those HFIs.
1. The Architecture
This architecture borrows heavily from DNS both in terms of local
servers holding data while the root holds only referrals and in
terms of its operational organization reflecting the current
direction of DNS root management toward registrars and registries.
There are five distinct parts of the architecture:
The Root -- Due to the flatness of the HFI space, this service
will be heavily loaded. Thus, the data served from the root
should be small. Like DNS, it should only contain referrals
to locally maintained servers. This can also be thought
of as a registry in the parlance of the current gTLD
debate.
Registrars -- There are two classes of these: qualified and
unqualified. Qualified registrars offer a guaranteed
level of service as applied to the data that is presented
by the service. This distinction between qualified and
unqualified data is presented to the client by ranking
responses so that hits from qualified
registrars are ranked higher than those from unqualified
registrars. Unqualified registrars can be any entity. This
allows anyone to write entries to the root. Qualified vs.
Unqualified is discussed in more detail below.
Content Servers -- Since referrals are the only entities kept
in the root, the actual data returned during resolution
is retrieved from a separate server. This server can
be maintained by a registrar or by the entity that
requests the HFI.
Local Server -- Much like DNS, these servers act
as caches and contain data for use only by the
local entity. They use the same protocol as the root
and act on a chained or referral basis depending on
their configuration.
Clients -- The entire reason for most systems, the clients
are the parts that actually send queries and process
results.
+-----------------------------------------------+
| Root (Registry) |
| (HFI, referral, contexts) |
+-----------------------------------------------+
/|\/|\ R| /|\
+---------------+ | | e| |R
| Registrar |---Qualified Entry--+ | f| |o
+---------------+ | e| |o
| r| |t
+---------------+ | r| |Q
| End user |---Unqualified Entry---+ a| |u
+---------------+ l| |e
+-----------+ |r
+----------------+ | |y
| Content Server | | +-------------------------+
+----------------+ | | Enterprise Level Server |
| /|\ | +-------------------------+
C| R| | /|\
o| e| | |
n| q| Referral | +-------------------------+
t| u| +----------------+ | Department Level Server |
e| e| | +------------+------------+
n| s| | /|\ |Possible |
t| t| \|/ | |Local |
| +---------------+ Resolution Request | |Content |
+--->| Client |-------------------------+ +------------+
+---------------+
Figure 1.
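The ranking rule described for Registrars above can be sketched as
follows. This is a minimal illustration in Python, not part of the
architecture: the Hit record, its "qualified" flag, and the example
host names are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one referral returned by the root."""
    hfi: str
    host: str
    qualified: bool  # assumed flag: True if written by a qualified registrar


def rank_hits(hits):
    """Return hits with qualified entries ahead of unqualified ones.

    sorted() is stable, so entries keep their original relative order
    within each class.
    """
    return sorted(hits, key=lambda hit: not hit.qualified)


hits = [
    Hit("go:Cartman", "gamer.example.net", qualified=False),
    Hit("go:Cartman", "hfi.example.com", qualified=True),
    Hit("go:Cartman", "fan.example.org", qualified=False),
]
ranked = rank_hits(hits)
```

Both kinds of entry remain in the result; only their order changes.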
Data Model
The data model used by the architecture is fairly simple. The root
only contains the actual identifier string, zero or more
discriminating contexts, and enough information to refer the client
to the host that contains the required data. Contexts are values
specified by the registrant that discriminate the particular HFI from
other HFIs with the same value. Potential contexts include geographic
region, topic area/industry segment, popularity, or unique identifier.
It has not been determined which contexts are required, if any.
The metadata that is returned to the client resides on Content Servers.
The referral to the client contains a host/port tuple that refers to
a Content Server. The data maintained there is encoded in an RDF [1]
object that adheres to the RDF Schema specified in Appendix A. Since
RDF allows multiple schemas, the local Content Server maintainer has the
ability to include community specific information within the returned
object. The client is only required to understand the schema in
Appendix A.
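One way to picture a root entry under this data model is sketched
below in Python. The class and field names are assumptions made for
illustration; the draft does not define a storage format, and the
set of contexts is still undetermined.

```python
from dataclasses import dataclass, field


@dataclass
class Referral:
    """Where the client should go for the actual metadata."""
    host: str
    port: int


@dataclass
class RootEntry:
    """Hypothetical shape of one record held by the root."""
    hfi: str                      # the identifier string itself
    referral: Referral            # host/port of the Content Server
    contexts: dict = field(default_factory=dict)  # zero or more contexts


entry = RootEntry(
    hfi="Network Solutions",
    referral=Referral("services.netsol.com", 8080),
    contexts={"location": "us", "industry": "38"},
)
```

Note that the entry carries no metadata of its own; everything beyond
the identifier, contexts, and referral lives on the Content Server.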
Match Semantics
The first match is done on the HFI itself. The user's query can specify
simple syntactic matches at this point. Since the HFI is in Unicode
there may be language specific matches that are possible. Unicode
specific match semantics are a topic of much discussion.
Once one or more syntactic matches are made, the user supplied contexts
are matched against the result set. Due to the expected size and load on
the root, contexts should be thought of as simple scalar values.
For example, if geographic area is specified as a context then the
values should be normalized outside of the root. This allows
the root to do very simple and fast comparisons on normalized codes.
The root should not be required to support a GIS back-end in order to
understand geographic location.
Syntactic matches are matches based on the exact Unicode values
of the HFI strings. These include exact and substring where
appropriate. It is probably NOT possible to support soundex style
matches across such a large, multi-lingual dataset.
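The two-stage match described in this section might look like the
following Python sketch. It assumes contexts are compared by simple
equality on already-normalized scalar strings, and that entries are
plain dictionaries; both are illustrative choices, not requirements
of the draft. The sample entries are invented for the example.

```python
def syntactic_match(query_type, wanted, candidate):
    """Compare HFI strings by their exact Unicode code points."""
    if query_type == "exact":
        return candidate == wanted
    if query_type == "substring":
        return wanted in candidate
    raise ValueError("unsupported query type: " + query_type)


def context_match(wanted, stored):
    """Every context the user supplied must equal the stored value."""
    return all(stored.get(name) == value for name, value in wanted.items())


def resolve(entries, query_type, wanted, contexts):
    """Stage 1: match on the HFI. Stage 2: filter by contexts."""
    matched = [e for e in entries
               if syntactic_match(query_type, wanted, e["hfi"])]
    return [e for e in matched if context_match(contexts, e["contexts"])]


# Hypothetical root contents.
entries = [
    {"hfi": "Nike", "contexts": {"industry": "28"}},
    {"hfi": "Network Solutions", "contexts": {"industry": "38"}},
]
```

No normalization or case folding is attempted here, which mirrors the
open question about Unicode specific match semantics.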
2. The "go" URI scheme
In order for an HFI to be used within the existing Internet and
WorldWideWeb infrastructure it must adhere to the syntax and
semantics of Uniform Resource Identifiers [RFC2396]. The HFI
requirement that it be short suggests a URI scheme that is
small but recognizable. Thus the scheme "go" is specified as
the default method of specifying an HFI.
The "go" scheme contains a single element which is the HFI
itself. Since the HFI is required to be internationalized the scheme
will need to be able to handle any language or character set.
This requirement suggests that UTF-8 encoded Unicode is appropriate.
When displayed to the user an HFI should not be shown in its
URI encoded form unless no other form is available. Instead an
HFI should be shown according to the localization rules of the
user.
As with URNs (and most URIs for that matter), the "go" scheme
is considered independent of its resolution method. While the
protocol for that resolution is specified in this document, the
reader should take care to realize that a "go" URI can and will
be resolved by other protocols.
Example:
Displayed Form              Encoded Form
-------------------------   ---------------------------------
go:Nike                     go:Nike
go:Network Solutions        go:Network%20Solutions
go:Martin J. Duerst         go:Martin%20J.%20D%C3%BCrst
NOTE: In the last example the limits of this ASCII document do not
allow for the correct representation of Martin's last name.
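The mapping between displayed and encoded forms can be reproduced
with a short Python sketch using the standard urllib.parse module.
The function names are hypothetical, and the choice to percent-encode
every character outside the unreserved set is an assumption; the
draft only states that UTF-8 encoded Unicode is appropriate.

```python
from urllib.parse import quote, unquote


def encode_go(hfi):
    """Build the encoded "go" URI for a displayed HFI string."""
    # quote() encodes the text as UTF-8 before percent-escaping.
    return "go:" + quote(hfi, safe="")


def decode_go(uri):
    """Recover the displayed HFI string from an encoded "go" URI."""
    scheme, sep, rest = uri.partition(":")
    if scheme != "go" or not sep:
        raise ValueError("not a go URI: " + uri)
    return unquote(rest)


encoded = encode_go("Martin J. D\u00fcrst")
decoded = decode_go("go:Network%20Solutions")
```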
3. The HFI resolution protocols
This architecture has several client-server interactions of differing
flavors. The protocol between the qualified registrars and the
registry is almost out of scope since it is an operational issue
that may have its own policy and security issues. The query protocol
between the Client and the Local Servers should be identical to the
query protocol used with the root since there shouldn't be any
architectural difference between the two. The protocol between the
Client and the Content Server can be handled by any existing
retrieval protocol. HTTP immediately comes to mind as a very valid
Content Server protocol.
3.1 Client to Server Query Protocol (CSQP?)
For speed the protocol should be simple and small. For a low barrier
to adoption the protocol should not require a great deal of encoding.
To balance these the protocol will be UTF-8 encoded Unicode. The
interaction is simple: the client connects and issues a query after
which the server responds with 1 or more referrals. Since both the
query and responses are atomic, the protocol can use either TCP or
UDP as its transport. TCP uses a simple text based, line oriented
interaction while UDP uses a simple, TFTP [RFC1350] style, packet
reconstruction.
3.1.1 UDP Interaction
Specification of exact UDP interaction should go here. See
TFTP [RFC1350] for a good example of how it should be done.
3.1.2 TCP Interaction
Specification of exact TCP interaction should go here. This should
be fairly easy since it is simply the UDP version without any
block numbers or acknowledgments.
3.1.3 The Query
/* Author's Temporary Comment: These formats are arbitrary and */
/* thus can (and probably should) be changed. */
The Query is made up of 3 elements: the query type, the HFI
and n contexts. They are specified as follows:
query = query-type " " hfi " " *(crlf context) crlf
query-type = "substring" / "exact" / 1*alphadigit
hfi = <"go" scheme URI>
context = context-name ":" context-value crlf
context-name = 1*alphadigit
context-value = 1*alphadigit
alphadigit = alpha / digit / "_" / "-"
alpha = "a".."z" / "A".."Z"
digit = "0".."9"
lf = <ASCII LF (linefeed)>
cr = <ASCII CR (carriage return)>
crlf = cr lf
Example:
substring go:Nike
location:us-ga-atlanta-lawrenceville
industry:28
This example shows a query for the HFI "Nike" in the city of
Lawrenceville where the entity is in the International
Trademark Class 28 ("Toys and sporting goods").
exact go:Network%20Solutions
location:us
industry:38
This example shows a query for the HFI "Network Solutions" in
the United States where the entity is in the International
Trademark Class 38 ("Communication services").
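A client could serialize such a query roughly as shown in the Python
sketch below. Because the grammar is explicitly provisional, the
sketch follows the layout of the examples (one CRLF-terminated line
for the query type and HFI, then one line per context); that reading,
and the function name, are assumptions.

```python
CRLF = "\r\n"


def build_query(query_type, hfi_uri, contexts=()):
    """Serialize a query as UTF-8 bytes, one CRLF-terminated line each.

    contexts is an iterable of (name, value) pairs, kept in order.
    """
    lines = [query_type + " " + hfi_uri]
    lines.extend(name + ":" + value for name, value in contexts)
    return (CRLF.join(lines) + CRLF).encode("utf-8")


query = build_query(
    "substring",
    "go:Nike",
    [("location", "us-ga-atlanta-lawrenceville"), ("industry", "28")],
)
```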
3.1.4 The Response
/* Author's Temporary Comment: These formats are arbitrary and */
/* thus can (and probably should) be changed. */
A response is a simple list of hits where each hit is a tuple of
the actual HFI that matched, the domain-name of the Content Server,
the port on which to contact that host, and a unique id that is used
by the Content Server to ensure that the correct HFI is requested.
It is in the following format:
response = *hit
hit = hfi domain port unique-id crlf
hfi = <"go" scheme URI>
port = "0".."65535"
unique-id = 1*alphadigit
alphadigit = alpha / digit / "_" / "-"
alpha = "a".."z" / "A".."Z"
digit = "0".."9"
lf = <ASCII LF (linefeed)>
cr = <ASCII CR (carriage return)>
crlf = cr lf
Examples:
go:Network%20Solutions services.netsol.com 8080 01BDF839.D979BBA0@netsol.com
This example shows the HFI that matched ("Network Solutions"), the
host to be contacted (services.netsol.com), the port (8080) and the
unique-id (01BDF839.D979BBA0@netsol.com). The unique ID is to serve
as the identifier that is retrieved from the content server. This is
for cases where a Content Server maintains multiple objects that share
the same HFI.
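Parsing a response on the client side might be sketched in Python as
follows. The grammar does not show field separators, so the sketch
assumes single spaces between fields, as in the example hit; the
names used are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    hfi: str        # the "go" URI that matched
    domain: str     # domain-name of the Content Server
    port: int       # port on which to contact that host
    unique_id: str  # identifier to request from the Content Server


def parse_response(data):
    """Split a UTF-8 response into hits, one per CRLF-terminated line."""
    hits = []
    for line in data.decode("utf-8").split("\r\n"):
        if not line:
            continue  # skip the empty piece after the final CRLF
        hfi, domain, port, unique_id = line.split(" ")
        port_number = int(port)
        if not 0 <= port_number <= 65535:
            raise ValueError("port out of range: " + port)
        hits.append(Hit(hfi, domain, port_number, unique_id))
    return hits


raw = (b"go:Network%20Solutions services.netsol.com 8080 "
       b"01BDF839.D979BBA0@netsol.com\r\n")
hits = parse_response(raw)
```

An empty response yields an empty list, which corresponds to the
zero-hit case allowed by the "*hit" rule.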
3.2 The Content Retrieval Protocol
The protocol for retrieving the actual RDF object is HTTP. The
host is contacted on the given port and the path is requested.
The requested path is "/hfi/<uid>" where <uid> is the unique-id
found in the referral. The response from the server should be
a text/xml or application/xml object that contains an RDF object
following the specification in Appendix A.
Example:
The user requests the HFI for "go:Network%20Solutions" and is
presented with the hit from the above example. The client then
connects to services.netsol.com on port 8080 and, using HTTP,
requests the resource "/hfi/01BDF839.D979BBA0%40netsol.com". The
response should be either an application/xml or text/xml
resource that contains the RDF object.
All standard HTTP functions are valid.
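Turning a referral into the URL to fetch can be sketched in Python
as below. The function name is hypothetical, and the sketch assumes
plain "http" with the unique-id percent-encoded as a single path
segment, so that "@" (hex 40) becomes "%40".

```python
from urllib.parse import quote


def content_url(domain, port, unique_id):
    """Build the HTTP URL for the RDF object named by a referral."""
    # safe="" escapes every reserved character, including "@" and "/".
    return f"http://{domain}:{port}/hfi/{quote(unique_id, safe='')}"


url = content_url("services.netsol.com", 8080,
                  "01BDF839.D979BBA0@netsol.com")
```

The resulting URL can be handed to any HTTP client; the response is
expected to be a text/xml or application/xml resource.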
4. Qualified vs Unqualified
The reasoning behind allowing non-registrars to write unqualified
entries to the root is to allow for the two communities that are
being targeted with HFIs: the business community and the end user.
Businesses desire HFIs that are of a higher quality and that have
a bit of uniqueness to them. In their case, trademark is extremely
important. The end user is simply looking for a cool identifier
for use by friends and online contacts. Uniqueness and trademark
status are unimportant whereas coolness and vanity are of utmost
importance. In order for the system to be used by both, there is
the need for the two types of entries to be disambiguated.
For example, the South Park cartoon character Cartman is an important
trademark for Comedy Central. At the same time, South Park's popularity
has caused many online game players to use Cartman as a nickname to
identify their online character. Both can use the identifier
go:Cartman without there being any confusion as to which one is
Comedy Central's official Cartman HFI. One additional feature is
that, since the root contains both, Comedy Central has a fairly
easy method for checking on infringers and, if so desired, could
discover unqualified entries that it wished to pursue infringement
litigation against.
7. Author Contact Information
Michael Mealling
Network Solutions
505 Huntmar Park Drive
Herndon, VA 22070
voice: (703) 742-0400
fax: (703) 742-9552
email: michaelm@rwhois.net
Appendix A -- XML DTD for Content
This is just an example. I'm sure it will end up being a bit more elaborate
than this.
<!-- HFI DTD -->
<!ELEMENT hfi-content (hfi | identifiers | address | contacts)>
<!ELEMENT hfi (#PCDATA)>
<!ELEMENT identifiers (homepage | urn)>
<!ELEMENT homepage (#PCDATA)>
<!ELEMENT urn (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT contacts (technical | administrative)* >
<!ELEMENT technical (#PCDATA)>
<!ELEMENT administrative (#PCDATA)>
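A client reading content shaped like this DTD might do something
like the Python sketch below. The sample document and function name
are invented for illustration, and xml.etree.ElementTree does not
validate against a DTD, so the sketch only checks the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: hfi-content holding one identifiers element.
SAMPLE = (
    "<hfi-content>"
    "<identifiers><homepage>http://www.example.com/</homepage></identifiers>"
    "</hfi-content>"
)


def read_identifiers(xml_text):
    """Return the homepage/urn values found in an hfi-content document."""
    root = ET.fromstring(xml_text)
    if root.tag != "hfi-content":
        raise ValueError("unexpected root element: " + root.tag)
    return {
        element.tag: (element.text or "")
        for element in root.iter()
        if element.tag in ("homepage", "urn")
    }


identifiers = read_identifiers(SAMPLE)
```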