INTERNET-DRAFT Donald E. Eastlake 3rd
Motorola
Expires March 2001 September 2000
Mapping Between Content-Types and URIs
------- ------- ------- ----- --- ----
<draft-eastlake-cturi-00.txt>
Donald E. Eastlake 3rd
Status of This Document
Distribution of this document is unlimited. Comments should be sent
to the author.
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026. Internet-Drafts are
working documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
Multipurpose Internet Mail Extension (MIME) Content-Type headers and
Uniform Resource Identifiers (URIs) are both used, in different
contexts, to label entities. A mapping is specified such that the
union of their meaning can be expressed in either syntax.
D. Eastlake 3rd [Page 1]
INTERNET-DRAFT Mapping Between Content-Types & URIs
Table of Contents
Status of This Document....................................1
Abstract...................................................1
Table of Contents..........................................2
1. Introduction............................................3
1.1 Definitions and Conventions............................4
1.2 Overview of Remaining Sections.........................4
2. Simple Mapping..........................................5
2.1 Simple Mapping of Content-Type to URI..................5
2.1.1 The Basic Case.......................................5
2.1.2 More Complete Rules..................................6
2.2 Simple Mapping of URI to Content-Type, The Basic Case..6
2.3 Content-Type Mapping Special Case for Basic Closure....7
2.4 URI Mapping Special Case for Basic Closure.............8
3. Controlled Mapping......................................9
4. Troublesome Characters.................................10
5. IANA Considerations and Potential Conflicts............10
6. Security Considerations................................11
References................................................12
Author's Address..........................................13
Expiration and File Name..................................13
1. Introduction
Both MIME types and URIs have come to be used for type labeling and
similar information.
The IETF Multipurpose Internet Mail Extensions (MIME) message body
standards have developed into a general tagging and bagging
mechanism. This mechanism has spread from SMTP mail to USENET, HTTP,
and other protocols. In MIME, the type of an object is given in a
"Content-Type" header line [RFC 2045, 2046, 2048]. Such a line
consists of a MIME type and optionally additional parameters. A MIME
type consists of a MIME top level type, a slash, and a MIME subtype.
The original Uniform Resource Locator (URL [RFC 1738]), used to point
to World Wide Web (WWW) resources, has meanwhile grown into the more
general Uniform Resource Identifier (URI [RFC 2396]). Increasingly
URIs are used as general labels for algorithms, XML namespaces, web
based protocol data types, etc. (In some of these label uses, URIs
are considered opaque while in other cases they are assumed to
reference something which explicates their meaning.)
In most protocol syntax cases where there are provisions for a "type
label", the label is restricted to the syntax of a URI or the syntax
of a Content-Type. In many such cases, it will sometimes be useful
to be able to express labels of the "other" syntax. That is, it may
be useful in a URI syntax slot to also be able to express a MIME type
or Content-Type and, conversely, it may be useful in a Content-Type
syntax slot to also be able to express a URI. This document specifies
how.
Note that a URI or Content-Type could get converted back and forth
multiple times between these two syntaxes. To stop this from
resulting in ever longer and more complex tags, a check is specified
so that if a conversion is of a previously converted syntax, the
previous conversion is reversed, insofar as practical.
To improve the repeatability of the results from single or multiple
steps of syntax conversion, capitalization and punctuation
recommendations are made where tokens are case insensitive or
variable punctuation is allowed.
Finally, in cases where the default conversion does not provide for
sufficient control, optional elements are defined for inclusion in
URIs and Content-Types that provide substantial control over the
mapping output.
1.1 Definitions and Conventions
Concerning URIs, please note the following:
(1) In this document, the term URI is used to include URI
Reference. That is, it includes the case where an octothorpe
("#") followed by a fragment identifier is suffixed to a pure
URI.
(2) Only absolute URIs are mappable. Relative URIs, with just a
hierarchical part, are not included in URI as used in this
document. They must first be converted to absolute URIs as
described in [RFC 2396].
(3) For presentation purposes, URIs are shown inside angle
brackets ("<...>") but these angle brackets are not actually a
part of the URI.
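As an illustration of point (2), the following Python sketch makes a
URI absolute before any mapping is attempted. The function name
ensure_absolute and the base URI used with it are assumptions made
for this example and are not defined by this document. The
resolution performed by Python's urljoin may differ in corner cases
from the procedure in [RFC 2396].

```python
from urllib.parse import urljoin, urlsplit


def ensure_absolute(uri: str, base: str) -> str:
    """Return uri unchanged if it has a scheme, else resolve it against base.

    Hypothetical helper: only absolute URIs are mappable, so relative
    references are resolved first.
    """
    if urlsplit(uri).scheme:
        return uri  # already absolute
    return urljoin(base, uri)
```

A caller would pass the base URI of the context in which the relative
reference was found, for example
ensure_absolute("tag42", "http://example.com/dir/page").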
Concerning Content-Types, please note the following:
Content-Type values are shown preceded by "Content-Type: " and,
when long, they are line folded as per [RFC 822]. This prefix
and line folding are for presentation and are not actually a part
of the Content-Type.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC 2119].
1.2 Overview of Remaining Sections
Sections 2 and 3 below explain the specified mapping in more or less
plain English. The material is organized to start with the
simplest and most common rules and then add exceptions for special
cases and additional user control.
Section 4 lists characters that must be URI ("%") encoded when
mapping from a URI to a Content-Type.
Section 5 covers IANA Considerations and potential conflicts.
Section 6 gives Security Considerations.
2. Simple Mapping
This section describes simple mappings such that any MIME Type or
Content-Type can be mapped into a URI and any URI can be mapped into
a Content-Type. Other than checks for multiple conversions, the
mapping is simple. It can produce only a special scheme URI for the
mapping of a Content-Type and only a special sub-type tree in the
"application" top level type for the mapping of a URI. Section 3
below describes additional features optionally allowing much greater
control over the result of the mapping.
2.1 Simple Mapping of Content-Type to URI
Section 2.1.1 below describes the most basic case of converting a
simple MIME type to a URI. Section 2.1.2 extends this to converting
a general Content-Type to a URI. Section 2.3 adds the check
necessary to recognize where the MIME type being converted is of the
form indicating it was previously converted from a URI using basic
mapping and is being converted back.
2.1.1 The Basic Case
In the simplest case of a Content-Type consisting of just a MIME
type, create a URI with scheme "ContentType" and a scheme dependent
part consisting of the MIME type. For example
Content-Type: image/JPEG
simply converts to
<ContentType:image/jpeg>
White space is not allowed in URIs so it must be removed. Scheme
names (the part before the first ":" in a URI) are case insensitive
but for readability and repeatability, the capitalization
"ContentType" SHOULD be used. Similarly, MIME top level types and
subtypes (the fields before and after the "/" in a MIME type field,
respectively) are case insensitive but SHOULD be all lower cased when
mapped to the URI form.
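The basic case can be sketched in Python as follows. This is only an
illustration under the rules just stated; the function name
mime_type_to_uri is an assumption of this example, and parameters
and the closure check of section 2.3 are deliberately left out.

```python
def mime_type_to_uri(mime_type: str) -> str:
    """Map a bare MIME type to a "ContentType:" URI (basic case only)."""
    # White space is not allowed in URIs, so all of it is removed.
    compact = "".join(mime_type.split())
    top_level, slash, subtype = compact.partition("/")
    if not (top_level and slash and subtype):
        raise ValueError(f"not a MIME type: {mime_type!r}")
    # Type and subtype are case insensitive; lower case is recommended.
    return f"ContentType:{top_level.lower()}/{subtype.lower()}"
```

With this sketch, mime_type_to_uri("image/JPEG") yields
"ContentType:image/jpeg", matching the example above.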
Note: There is no "//" after the "ContentType:" scheme as used
herein. Such a "//" would imply a specific structuring of the
scheme dependent part appearing in the URI after the
"ContentType:" as defined in [RFC 2396]. Since that full
structuring is not used, "//" is not used. The meaning of URIs
starting with "ContentType://" is reserved for future definition.
Note: "Content-Type", with hyphen, is syntactically allowed as a
scheme name. However, [RFC 2717] reserves embedded hyphens in
scheme names to indicate the prefix of an alternate tree of
scheme names, so ContentType is used.
2.1.2 More Complete Rules
A Content-Type header frequently includes more than just the
mandatory MIME type. It can also have type dependent parameters,
including private parameters, such as
Content-Type: text/plain; charset="us-ascii";
x-mac-type="54455854"; x-mac-creator="4D4F5353"
Content-Type: image/tiff; application=faxbw
Content-Type parameters are mapped into a "query portion" suffix of
the URI in much the same way that HTML form fields [HTML] are. That
is, they are concatenated to the MIME type after a "?" and, if there
is more than one parameter, separated by "&". Thus the above
Content-Types would be mapped into the following:
<ContentType:text/plain?charset="us-ascii"&x-mac-type="54455854"&
x-mac-creator="4D4F5353">
<ContentType:image/tiff?application="faxbw">
Parameter values in the mapped URI MUST always be enclosed in double
quotes ('"'). If the Content-Type has a trailing ";" but no
parameters, then "?" SHOULD NOT be added to the URI.
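A minimal Python sketch of these rules follows. It assumes that
parameters are separated by ";" as in [RFC 2045], ignores backslash
escapes inside quoted values, performs no "%" encoding, and omits the
"uri." check of section 2.3. The names split_params and
content_type_to_uri are inventions of this example.

```python
def split_params(content_type: str):
    """Split a Content-Type value into (mime_type, [(name, value), ...])."""
    parts, current, in_quotes = [], [], False
    for ch in content_type:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    params = []
    for part in parts[1:]:
        if not part.strip():
            continue  # e.g. a trailing ";" with no parameter after it
        name, _, value = part.partition("=")
        params.append((name.strip(), value.strip().strip('"')))
    return parts[0].strip(), params


def content_type_to_uri(content_type: str) -> str:
    """Map a Content-Type value, with parameters, to a "ContentType:" URI."""
    mime_type, params = split_params(content_type)
    uri = "ContentType:" + "".join(mime_type.split()).lower()
    if params:
        # Values are always emitted inside double quotes.
        uri += "?" + "&".join(f'{name}="{value}"' for name, value in params)
    return uri
```

A value such as 'image/tiff; application=faxbw' maps to
'ContentType:image/tiff?application="faxbw"', and a value with only a
trailing ";" produces no "?".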
2.2 Simple Mapping of URI to Content-Type, The Basic Case
This section describes the basic case of mapping a URI to a Content-
Type. Section 2.4 adds the check to see if the URI appears to be the
result of a previous conversion from a Content-Type and if so undoes
that conversion insofar as practical.
In the basic case, a URI maps to a Content-Type with a top level MIME
type of "application" and a MIME sub-type in the "uri." tree. In
addition, any "query" parameters in the URI are mapped to Content-
Type parameters and if the URI ends with a fragment identifier, it is
mapped to the special Content-Type parameter "URI-Fragment". Any
special characters in the URI that might be troublesome (see section
4) are encoded by replacing them with a "%" followed by two hex
digits for the character code.
Note: Current URI syntax permits scheme dependent parts in which "?"
does not indicate a query section; however, no such syntaxes have
been publicly defined.
Some examples of the basic case follow:
<http://example.com/tag42>
<mailto:U@example.net?subject="misc"&body="line1%0D%0Aline2">
<xyz://abc.test/def?h=ijk#lmn>
convert to
Content-Type: application/uri.http%3A%2F%2Fexample.com%2Ftag42
Content-Type: application/uri.mailto%3AU%40example.net;
subject="misc"; body="line1%250D%250Aline2"
Content-Type: application/uri.xyz%3A%2F%2Fabc.test%2Fdef;
h="ijk"; URI-Fragment="lmn"
Content-Type parameter values extracted from the query portion of a
URI MUST be surrounded with double quotes ('"'). When URI encoding,
if the hex value has any letters (a-f) in it, they SHOULD be upper
cased.
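The basic URI to Content-Type mapping might be sketched in Python as
shown here. Several points are assumptions of this example rather
than requirements of this document: the names escape and
uri_to_content_type, the use of ";" between parameters as in
[RFC 2045], and the choice to apply the same "%" encoding to
parameter values as to the URI body. The "ContentType:" check of
section 2.4 is not included.

```python
# Troublesome characters as enumerated in section 4: codes 0 through 32,
# code 127, and the listed punctuation (including "%", excluding '"').
TROUBLESOME = (
    {chr(code) for code in range(33)} | {chr(127)} | set('()<>@,;:\\/[]?%=')
)


def escape(text: str) -> str:
    """%-encode troublesome characters, using upper-case hex digits."""
    return "".join(
        f"%{ord(ch):02X}" if ch in TROUBLESOME else ch for ch in text
    )


def uri_to_content_type(uri: str) -> str:
    """Map an absolute URI to an "application/uri." Content-Type value."""
    body, _, fragment = uri.partition("#")
    body, _, query = body.partition("?")
    params = []
    for pair in filter(None, query.split("&")):
        name, _, value = pair.partition("=")
        params.append((name, value.strip('"')))
    if fragment:
        params.append(("URI-Fragment", fragment))
    result = "application/uri." + escape(body)
    for name, value in params:
        # Parameter values are always surrounded with double quotes.
        result += f'; {name}="{escape(value)}"'
    return result
```

For instance, uri_to_content_type("http://example.com/tag42") gives
"application/uri.http%3A%2F%2Fexample.com%2Ftag42".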
[Is splitting off the Fragment worth it? The "#" and fragment
identifier could just be included in the constructed "uri." subtype.
In fact the query stuff could also, eliminating the need for
Content-Type parameters... but I don't think query parameters or
fragment identifiers in a URI constitute the same sort of type
information in most cases and would be more accessible to most
software as Content-Type parameters.]
2.3 Content-Type Mapping Special Case for Basic Closure
A URI may have been converted to a Content-Type and then get
converted back. To stop this from resulting in an ever more complex
syntax, a check MUST be made to see if the MIME subtype of a
Content-Type being converted is in the "uri." subtype tree (see
section 2.2 above). If so, the URI is computed from the subtype by
stripping the "uri." prefix and performing one level of undoing URI
encoding. (Note: The top level MIME type is ignored in this case.)
In addition, Content-Type parameters, if any, are added as a "query
portion" and a "URI-Fragment" parameter is added as a fragment.
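The check just described can be sketched in Python as follows. The
names PARAM and uri_subtype_to_uri are assumptions of this example,
as are the use of ";" between parameters (per [RFC 2045]) and the
decision to undo one level of "%" encoding in parameter values as
well as in the subtype.

```python
import re
from urllib.parse import unquote

# One ";name=value" parameter; the value is a quoted string or a bare token.
PARAM = re.compile(r';\s*([^=;\s]+)\s*=\s*("[^"]*"|[^;\s]*)')


def uri_subtype_to_uri(content_type: str):
    """Return the recovered URI, or None if the subtype is not in "uri."."""
    mime_type, sep, rest = content_type.partition(";")
    # The top level type is ignored; only the subtype is examined.
    _, _, subtype = mime_type.strip().partition("/")
    if not subtype.lower().startswith("uri."):
        return None  # not a converted URI; the normal mapping applies
    uri = unquote(subtype[len("uri."):])  # undo one level of % encoding
    query, fragment = [], ""
    for name, value in PARAM.findall(sep + rest):
        value = unquote(value.strip('"'))
        if name.lower() == "uri-fragment":
            fragment = value
        else:
            query.append(f'{name}="{value}"')
    if query:
        uri += "?" + "&".join(query)
    if fragment:
        uri += "#" + fragment
    return uri
```

A result of None signals that the Content-Type was not produced from
a URI, in which case the mapping of section 2.1 would be used instead.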
For example:
Content-Type: application/uri.mailto%3Auser%40host.example
Content-Type: application/uri.http%3A%2F%2Fx.test; foo="123";
bar="abcd"
Content-Type:
application/uri.http%3A%2F%2Fa%3Ab%40c.text%2Fx%2Fy;
URI-Fragment="z"
convert to
<mailto:user@host.example>
<http://x.test?foo="123"&bar="abcd">
<http://a:b@c.text/x/y#z>
Note: If a Content-Type or MIME Type is being written by a user and
they know that there is a URI which is a more natural expression
of the labeling desired, they can simply use an
"application/uri." MIME Type to start with.
2.4 URI Mapping Special Case for Basic Closure
It is desirable that an arbitrary Content-Type be recovered
semantically intact when mapped to a URI and then that URI is mapped
back to a Content-Type. To achieve this, the following special case
is added to the simple case described in section 2.2 above.
If the URI scheme is "ContentType:", then the Content-Type is
computed from the remaining part of the URI (the "scheme specific
part"), by replacing the first question mark ("?") and all query
section ampersands ("&") with semi-colon space ("; "), and then
undoing one level of URI encoding, i.e., replacing percent sign ("%")
followed by two hex digits with the character having that hex value.
For example
<ContentType:model/vnd.example.longish.subtype.name>
<ContentType:text/plain?charset="US-ASCII"&x-obscure="value">
map to
Content-Type: model/vnd.example.longish.subtype.name
Content-Type: text/plain; charset="US-ASCII"; x-obscure="value"
Note: A URI produced by simple mapping from a normal Content-Type
will never have a fragment suffix.
Note: If a URI is being written by a user and they know that there is
a Content-Type which is a more natural expression of the labeling
desired, they can simply use a "ContentType:" scheme to start
with.
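The special case of this section can be sketched in Python as
follows. The function name contenttype_uri_to_content_type is an
assumption of this example, and any fragment on the URI is not
treated specially since simple mapping never produces one.

```python
from urllib.parse import unquote


def contenttype_uri_to_content_type(uri: str):
    """Recover a Content-Type value from a "ContentType:" URI, else None."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "contenttype":  # scheme names are case insensitive
        return None
    mime_type, _, query = rest.partition("?")
    value = mime_type
    if query:
        # The first "?" and every query "&" become semi-colon space.
        value += "; " + query.replace("&", "; ")
    # Undo one level of URI encoding (%XX becomes the character).
    return unquote(value)
```

A result of None indicates that the URI is not of scheme
"ContentType" and that the basic mapping of section 2.2 applies.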
3. Controlled Mapping
[Is this controlled mapping stuff below too complex? Would it be
better to just have sections 2 and 3 above and drop controlled
conversion?]
As an additional feature, there may be cases where a URI is designed
knowing that it might be converted to a Content-Type and it is
desired to control the MIME type so that it would have a more
appropriate top level than "application" or a more appropriate
subtype than one in the "uri." tree. To accomplish this, a special
URI query part parameter "MIME-Type" is defined. If a URI is not of
scheme ContentType and this special parameter is found, then the MIME
type is set to the parameter value and the URI body (all of the URI
except "query" parameters and any fragment identifier) is preserved in
a "URI-Body" Content-Type parameter.
Similarly, there may be cases where a Content-Type is designed
knowing that it might be converted to a URI and it is desired to
control the URI scheme and non-query scheme dependent parts so that
it is not necessary to have a scheme of "ContentType:" or scheme
dependent part calculated as indicated in section 2.1. To accomplish
this, a special Content-Type parameter "URI-Body" is defined. If a
Content-Type does not have a MIME subtype in the "uri." tree and this
parameter is present, it controls the non-query portion of the URI
mapped to and the original MIME type is preserved in a URI query
parameter called "MIME-Type".
For example
Content-Type: application/xml; URI-Body="http://xml.example"
would map to
<http://xml.example?MIME-Type="application/xml">
and
<mailto:joe@blow.test?MIME-Type="message/rfc822"#123>
would map to
Content-Type: message/rfc822; URI-Body="mailto:joe@blow.test";
URI-Fragment="123"
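Both directions of controlled mapping can be sketched in Python as
follows. This is an interpretation, not a definitive implementation:
the function names are inventions of this example, parameters are
assumed to be separated by ";" as in [RFC 2045], the URI body is
carried in URI-Body without "%" encoding as in the examples of this
section, and the placement of any remaining parameters is a choice
made here since parameter order is not significant.

```python
import re

# One ";name=value" parameter; the value is a quoted string or a bare token.
PARAM = re.compile(r';\s*([^=;\s]+)\s*=\s*("[^"]*"|[^;\s]*)')


def controlled_content_type_to_uri(content_type: str):
    """Apply the URI-Body rule; return None when it does not apply."""
    mime_type, sep, rest = content_type.partition(";")
    mime_type = "".join(mime_type.split()).lower()
    params = [(n, v.strip('"')) for n, v in PARAM.findall(sep + rest)]
    bodies = [v for n, v in params if n.lower() == "uri-body"]
    if mime_type.partition("/")[2].startswith("uri.") or not bodies:
        return None
    # The original MIME type is preserved in the MIME-Type query parameter.
    query, fragment = [f'MIME-Type="{mime_type}"'], ""
    for name, value in params:
        if name.lower() == "uri-fragment":
            fragment = value
        elif name.lower() != "uri-body":
            query.append(f'{name}="{value}"')
    return bodies[0] + "?" + "&".join(query) + ("#" + fragment if fragment else "")


def controlled_uri_to_content_type(uri: str):
    """Apply the MIME-Type rule; return None when it does not apply."""
    body, _, fragment = uri.partition("#")
    body, _, query = body.partition("?")
    if body.partition(":")[0].lower() == "contenttype":
        return None
    pairs = [p.partition("=") for p in query.split("&") if p]
    params = [(n, v.strip('"')) for n, _, v in pairs]
    types = [v for n, v in params if n.lower() == "mime-type"]
    if not types:
        return None
    # The URI body is preserved in the URI-Body parameter.
    out = [types[0], f'URI-Body="{body}"']
    out += [f'{n}="{v}"' for n, v in params if n.lower() != "mime-type"]
    if fragment:
        out.append(f'URI-Fragment="{fragment}"')
    return "; ".join(out)
```

In both functions a result of None means that the controlling
parameter is absent or excluded, so the simple mapping of section 2
would be used.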
4. Troublesome Characters
Troublesome characters are defined as those not permitted in a token
in [RFC 2045] except double quote but including percent sign. That
is, any character code from 0 through 32 inclusive or character code
127 or any of "(", ")", "<", ">", "@", ",", ";", ":", "\", "/", "[",
"]", "?", "%", or "=" are troublesome characters.
5. IANA Considerations and Potential Conflicts
This document allocates and specifies the following:
(1) The "ContentType" URI scheme.
(2) The "uri." MIME subtype tree. Since this subtree is totally
delegated to the URI specification, there are no independent
publication or review requirements for it. Any valid URI can be
used after the "uri." in any MIME top level type, after
troublesome characters (see section 4) in the URI are % escaped.
(3) In the context of automatic URI to Content-Type conversion,
a meaning is specified for the "MIME-Type" URI query section
parameter.
(4) In the context of automatic Content-Type to URI conversion, a
meaning is specified for the "URI-Body" and "URI-Fragment"
Content-Type parameters.
Because this document authoritatively specifies the "ContentType" URI
scheme and the "uri." MIME subtype tree, no conflict can arise due to
other uses of them.
However, there is no precedent for the specification of Content-Type
parameters valid across all MIME types, such as URI-Body and
URI-Fragment, and in fact [RFC 2046] denies their possibility. Nor is
there any precedent for the specification of a universal URI query
parameter such as MIME-Type. The probability that any different use
is currently being made or will in the future have to be made of
these seems low enough that it can be ignored. It is possible that
some processing systems are sensitive to the presence of parameters
they do not understand and will indicate errors when presented with
controlled mapping URIs or Content-Types. However, Content-Type
parameters and URI query parameters are usually handled on receipt by
such mechanisms as storing the name-value pair in an associative
array or as "environment variables" and ignoring extra parameters.
In fact, Content-Type processors are required by [RFC 2046] to ignore
any parameters they do not understand and to ignore parameter order.
6. Security Considerations
In some sense, the security considerations for MIME and content types
[RFC 2046], URIs [RFC 2396], and for every individual MIME type and
URI scheme can apply. In addition, the deployment of mapping aware
software may enable the introduction into or transmission through
MIME or content type contexts of URI semantics, including possibly
dangerous action schemes such as "mailto", and the introduction into
or transmission through URI contexts of MIME and content type
semantics, including possibly dangerous executable data types or the
like. Finally, implementation of controlled mapping may enable a
malicious user, by adding one of the special parameters specified
herein, to cause a surprising change in the semantics of a URI or
Content-Type produced by the mapping from an apparently innocuous
Content-Type or URI.
References
[HTML] - Dave Raggett, Arnaud Le Hors, Ian Jacobs, "HTML 4.01
Specification", <http://www.w3.org/TR/html4>, December 1999.
[RFC 822] - D. Crocker, "Standard for the format of ARPA Internet
text messages", Aug-13-1982.
[RFC 1738] - T. Berners-Lee, L. Masinter, M. McCahill, "Uniform
Resource Locators (URL)", December 1994.
[RFC 2045] - N. Freed & N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies",
November 1996.
[RFC 2046] - N. Freed & N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", November 1996.
[RFC 2048] - N. Freed, J. Klensin & J. Postel, "Multipurpose Internet
Mail Extensions (MIME) Part Four: Registration Procedures", November
1996.
[RFC 2119] - S. Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997.
[RFC 2396] - T. Berners-Lee, R. Fielding, L. Masinter, "Uniform
Resource Identifiers (URI): Generic Syntax", August 1998.
[RFC 2717] - R. Petke, I. King, "Registration Procedures for URL
Scheme Names", November 1999.
[RFC 2718] - L. Masinter, H. Alvestrand, D. Zigmond, R. Petke,
"Guidelines for new URL Schemes", November 1999.
Author's Address
Donald E. Eastlake 3rd
Motorola
140 Forest Avenue
Hudson, MA 01749 USA
Telephone: +1 508-261-5434 (w)
+1 978-562-2827 (h)
FAX: +1 508-261-4777 (w)
EMail: Donald.Eastlake@motorola.com
Expiration and File Name
This draft expires March 2001.
Its file name is draft-eastlake-cturi-00.txt.