The Text/HTML content type and the Content-Location MIME header

or

Sending HTML documents via MIME e-mail



Status of this Memo


This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups.  Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
``work in progress.''

To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa),
nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

This memo provides information for the Internet community. This
memo does not specify an Internet standard of any kind, since
this document is mainly a compilation of information taken from
other RFC-s. Distribution of this memo is unlimited.


Abstract

This memo specifies how to send HTML-formatted documents in Internet
mail. The memo particularly addresses the issue of handling of
hyperlinks in HTML documents referring to other body parts in the same
message. In order to do this, the memo introduces one new MIME content-
header with the name "Content-Location".





Palme                                                          [Page 1]
draft-palme-text-html-01.txt                              January 1996


Differences from Previous Version

The postscript (.ps) version of this draft shows the differences
between version 00 and 01 through underscoring and strikethru markup.

This document has been revised based on the discussions in the ietf-
types and mhtml mailing lists and in the BOF at the Dallas IETF meeting
in December 1995.

Use of the Content-Base header has been introduced. The "linking"
parameter has been removed and replaced with use of the Content-Base
header.

Use of the Content-Disposition header has been replaced with use of the
"Content-Base: FILE" and "Content-Location" headers.

Information on the new mailing list for further discussions of this
ietf draft has been added.

Syntax for embedding URI-s in MIME headers has been added, copied from
[URLBODY].

Security considerations for implementations using proxy servers have
been added.

A temporary annex on implementation has been added. This annex might be
removed in the final version of this standard.

Table of Contents

1.  Introduction
2.  Terminology
3.  The Content-Location MIME Content Header
4.  Parameters for the Content-Type: Text/HTML
5.  Use of Relative URL-s in Text/HTML Contents
6.  Use of the Content-Type: Multipart/related
7.  Use of the Content-type: Multipart/alternative
8.  Combination of the Content-Types: Multipart/related and
    Multipart/alternative.
9.  Format of Links to Other Body Parts
     9.1  General Location-Method: Identical URI-s in Content-
     Location headers
     9.2  Filename-Method: Use of virtual File Names
     9.3  CID-method: Use of CID URL-s
     9.4  Recommended Choice of Method:
10. Indication of Method Used
11. Content-Disposition header
12. Sending forms in e-mail
13. Encoding Considerations
14. Security Considerations
15. Acknowledgements
16. References
17. Author's Address


Further Discussion

Further discussion on this memo should be sent to the mailing list
mhtml@segate.sunet.se.

To subscribe to this list, send a message to
    listserv@segate.sunet.se
which contains the text
sub mhtml <your name (not your e-mail address)>

Archives of this list are available by anonymous ftp from
   ftp://segate.sunet.se
The archives are also available by e-mail. Send a message to
listserv@segate.sunet.se with the text "index mhtml" to get a
list of the archive files, and then a new message "get <file
name>" to retrieve the archive files.
or
get mhtml digest


1.  Introduction

The HTML format is a very common format for documents in the Internet,
and there is an obvious need to be able to send documents in this
format in e-mail [SMTP, RFC822]. The "text/html; version=2.0" media
type is defined in [HTML2]. This memo gives additional specifications
and advice on how to use the text/html media type as a Content-Type in
MIME [MIME1] e-mail messages.


2.  Terminology

Most of the terms used in this memo are defined in other RFC-s.
For example, URL is defined in [URL], and URI, absolute URI, and relative
URI are defined in [HTML2].


3.  The Content-Location MIME Content Header

An additional MIME heading field is defined with the name "Content-
Location". This header field can occur in any MIME message heading or
content heading. Its value can be an absolute or relative URI.

A relative URI in the Content-Location header is only allowed if there
is also a Content-Base header (as defined in [RELURL]) specifying the
base for the relative URI.

This header is used to indicate that the data sent under this heading
is also retrievable, in identical format, through normal use of this
URI. Thus, the information sent in the message can be seen as a cached
version of the original data. This header is only permitted if the data
is actually retrievable through use of this URI.

In practice, at present only those URI-s which are URL-s are used, but
it is anticipated that other forms of URI-s will be used in the
future. This heading is similar to the Location header as
defined in [HTTP].

The syntax for the new heading field is, using the syntax definition
tools from [RFC822]:

     content-location ::= "Content-Location:" URI-parameter

where URI is at present (November 1995) restricted to the syntax for
URL-s as defined in [URL]. This syntax will be widened when the
definition of the URI syntax becomes more stable. The URI must be encoded
in a format which allows for splitting of long URI-s into more than one
line. This is done using the following syntax, copied from [URLBODY]:

  URL-parameter := <"> URL-word *(*LWSP-char URL-word) <">

  URL-word := token
                ; Must not exceed 40 characters in length

The syntax of an actual URL string is given in [URL]. URL
strings can be of any length and can contain arbitrary
character content. This presents problems when URLs are
embedded in MIME body part headers that are wrapped according
to RFC 822 rules. For this reason they are transformed into a
URL-parameter for inclusion in a message/external-body
content-type specification as follows:

 (1)   A check is made to make sure that all occurrences of
       SPACE, CTLs, double quotes, backslashes, and 8-bit
       characters in the URL string are already encoded using
       the URL encoding scheme specified in RFC 1738. Any
       unencoded occurrences of these characters must be
       encoded. Note that the result of this operation is
       nothing more than a different representation of the
       original URL.

 (2)   The resulting URL string is broken up into substrings
       of 40 characters or less.

 (3)   Each substring is placed in a URL-parameter string as a
       URL-word, separated by one or more spaces. Note that
       the enclosing quotes are always required since all URLs
       contain one or more colons, and colons are tspecial
       characters [RFC 1521].

Extraction of the URL string from the URL-parameter is even
simpler: The enclosing quotes and any linear whitespace are
removed and the remaining material is the URL string.
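
The three encoding steps and the extraction rule above can be sketched
as follows. This is only an illustration, not part of the syntax
copied from [URLBODY]: the function names are hypothetical, and the
choice of UTF-8 when escaping non-ASCII characters is an assumption
that the text above does not make.

```python
import re

MAX_WORD = 40  # longest URL-word permitted by the grammar above


def _escape(match):
    """Percent-encode one character (non-ASCII as UTF-8, an assumption)."""
    return "".join("%{:02X}".format(b) for b in match.group(0).encode("utf-8"))


def to_url_parameter(url):
    """Steps (1)-(3): escape unsafe characters, split into words, quote."""
    # Step 1: SPACE, CTLs, double quotes, backslashes, non-ASCII characters.
    safe = re.sub(r'[\x00-\x20\x7f"\\]|[^\x00-\x7f]', _escape, url)
    # Step 2: substrings of at most 40 characters.
    words = [safe[i:i + MAX_WORD] for i in range(0, len(safe), MAX_WORD)]
    # Step 3: URL-words separated by a space, enclosed in double quotes.
    return '"' + " ".join(words) + '"'


def from_url_parameter(value):
    """Remove the enclosing quotes and all linear whitespace."""
    return re.sub(r"[ \t\r\n]+", "", value.strip().strip('"'))
```

Because every space in the original URL is escaped in step (1), any
whitespace found in the parameter can only have come from splitting or
header folding, which is why extraction can simply discard it.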

Note: This header is similar to the Location header defined in [HTTP].


4.  Parameters for the Content-Type: Text/HTML

The optional "version" parameter for the Content-Type: Text/HTML
indicates the version of HTML used, with "2.0" as default value.


5.  Use of Relative URL-s in Text/HTML Contents

Relative URL-s should never be used in Content-Type: Text/HTML body
parts except in one of the following three cases (listed in order of
priority; if more than one of them is present, the first-listed applies):

(a) There is a BASE element in the HTML document which resolves the
    relative URL into a non-relative URL.

(b) There is a Content-Base header (as defined in [RELURL]), giving
    the base to be used.

(c) There is a Content-Location of the Text/HTML which can then serve
    as the base.
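
The priority order of cases (a) to (c) can be sketched as follows. The
function and parameter names are hypothetical, and the standard
library function urljoin is used here as an approximation of the
resolution rules in [RELURL]; it is not claimed to be identical to
them in every corner case.

```python
from urllib.parse import urljoin


def resolve(relative_url, html_base=None, content_base=None,
            content_location=None):
    """Resolve a relative URL against the first base that is present."""
    # Priority order: (a) BASE element, (b) Content-Base, (c) Content-Location.
    for base in (html_base, content_base, content_location):
        if base:
            return urljoin(base, relative_url)
    raise ValueError("no base available for a relative URL")
```

For example, resolve("/images/logo.gif",
content_base="http://www.dsv.su.se") uses case (b), since no BASE
element is given.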

6.  Use of the Content-Type: Multipart/related

A message can contain one or more Text/HTML body parts and also contain,
as separate body parts, data to which hyperlinks (as defined in
[HTML2]) in the Text/HTML body part refer.

Such embedded linked parts must, together with the Text/HTML body part,
be enclosed within a Multipart/Related body part as defined in [REL].

The root (as defined in [REL]) should then be of the Content-Type:
Text/HTML.

Such an embedded linked part can itself be a Multipart/related body
part including its own linked objects.


7.  Use of the Content-type: Multipart/alternative

If the message is sent to recipients, not all of whom may have mailers
capable of handling the Text/HTML content-type, then the Content-Type:
Multipart/Alternative [MIME1] can be used, for example with Content-
Type: Text/plain as the first choice, and Content-Type: Text/HTML as
the second choice.


8.  Combination of the Content-Types: Multipart/related and
Multipart/alternative.

Both the Content-type: Multipart/related, as defined in chapter 6 above
and the Content-Type: Multipart/alternative, as defined in chapter 7
above can be combined in the same message. It is then recommended to
put the Multipart/alternative inside the Multipart/related. Note that
if this is done, a start parameter to the Content-Type: Multipart/
related is necessary, as shown by the example below.

Example:

   Content-Type: Multipart/related; boundary="boundary-example-1";
                 type=Text/HTML; start=content-id-example@example.host

      --boundary-example-1
      Content-Type: Multipart/alternative; boundary="boundary-example-2"

         --boundary-example-2
         Content-Type: Text/plain

         ... plain text version of the document for recipients
         whose mailers cannot handle Text/HTML ...

         --boundary-example-2
         Content-Type: Text/HTML
         Content-ID: content-id-example@example.host

         ... text of the HTML document ...

         --boundary-example-2--
      --boundary-example-1
      Content-Type: Image/GIF

      ... a body part, to which the HTML document has a link  ...
      --boundary-example-1--
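
A message with the nesting shown above could be generated roughly as
follows with the Python standard library. This is a sketch only: the
function name is hypothetical, the boundaries are chosen by the
library rather than taken from the example, and the Content-ID is
written without angle brackets merely to mirror the example.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

HTML_CID = "content-id-example@example.host"  # illustrative id from the example


def build_message(plain, html, gif_bytes):
    """Nest a multipart/alternative inside a multipart/related."""
    # The start parameter names the HTML part, which is not the first part.
    related = MIMEMultipart("related", type="text/html", start=HTML_CID)

    alternative = MIMEMultipart("alternative")
    alternative.attach(MIMEText(plain, "plain"))
    html_part = MIMEText(html, "html")
    html_part["Content-ID"] = HTML_CID
    alternative.attach(html_part)

    related.attach(alternative)
    related.attach(MIMEImage(gif_bytes, "gif"))  # the linked body part
    return related
```

Calling build_message(...).as_string() yields the complete message
text, with the library quoting parameter values where needed.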


9.  Format of Links to Other Body Parts

A Text/HTML body part may contain hyperlinks to documents which
are included as other body parts in the same message and within the
same multipart/related content. Three ways to do this are specified in
this memo:


9.1  General Location-Method: Identical URI-s in Content-Location
headers

With this method, all URI-s in the Text/HTML document SHOULD be
absolute URI-s as defined in [HTML2] or relative URI-s relative to a
surrounding Content-Base header. It SHOULD be possible to use these
URI-s to retrieve the referred document using the protocol defined for
retrieval of this particular URL scheme in [URL] (subject to access
control).

For each distinct URI in the Text/HTML document, which refers to data
which is sent in the same MIME message, there SHOULD be a separate body
part within the multipart/related part of the message containing this
data. Each such body part SHOULD contain a Content-Location heading
field, and the string in this field SHOULD be identical to the URI as
used in the Text/HTML document.

Note: "Identical" here means that the URI strings are identical
character for character, not merely that the URI-s are equivalent.

The receiving mailer can then resolve the hyperlink either by using the
URI in the normal way, or by using the data in the body part whose
Content-Location contains the same URI.

Example with absolute URI-s:

   Content-Type: Multipart/related; boundary="boundary-example-1";
                 type=Text/HTML

      --boundary-example-1
      Content-Type: Text/HTML

      ... text of the HTML document, which might contain a hyperlink
      to the other body part, for example through a statement such as:
      <IMG SRC="http://www.dsv.su.se/images/logo.gif">

      --boundary-example-1
      Content-Type: Image/GIF
      Content-Location: "http://www.dsv.su.se/images/logo.gif"

      --boundary-example-1--

Example with relative URI-s:

   Content-Base: http://www.dsv.su.se
   Content-Type: Multipart/related; boundary="boundary-example-1";
                 type=Text/HTML

      --boundary-example-1
      Content-Type: Text/HTML

      ... text of the HTML document, which might contain a hyperlink
      to the other body part, for example through a statement such as:
      <IMG SRC="/images/logo.gif">

      --boundary-example-1
      Content-Type: Image/GIF
      Content-Location: "/images/logo.gif"

      --boundary-example-1--
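
A receiving mailer could resolve hyperlinks against the enclosed body
parts roughly as sketched below. The function names are hypothetical,
and the sketch assumes that the whole message is available as a
string; it only illustrates the exact string comparison required by
this method.

```python
import email


def parts_by_location(message_text):
    """Map each Content-Location string to the decoded data of its part."""
    table = {}
    for part in email.message_from_string(message_text).walk():
        location = part.get("Content-Location")
        if location is not None:
            # Extract the URI string: drop the quotes and any whitespace.
            key = "".join(location.strip().strip('"').split())
            table[key] = part.get_payload(decode=True)
    return table


def lookup(table, uri):
    """Exact string comparison; equivalent but different URI-s do not match."""
    return table.get(uri)
```

If lookup returns None, the mailer can fall back to retrieving the URI
in the normal way.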


9.2  Filename-Method: Use of virtual File Names

This method is a special case of the Location-Method described in
section 9.1, but differs in that it may be used even if the
enclosed parts cannot be retrieved from anywhere other than the body
parts included in the message.

With this method, the hyperlink URIs to other body parts in the same
message in the Text/HTML document SHOULD have a very simple format.
This simple format is relative URL-s of the form

   relative-url ::= 1ALPHA 0*7ALPHADIGIT [ "." 1*3ALPHADIGIT ]

   ALPHADIGIT ::= ALPHA / DIGIT

i.e. 1-8 characters, optionally followed by a period and 1-3 extension
characters, using only ASCII letters and digits and beginning with a
letter.

The choice of this simple format is to match permitted file name
formats in most operating systems in wide use today.
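
A check of a name against this simple format can be sketched with a
regular expression. The names VIRTUAL_NAME and is_virtual_file_name
are hypothetical and used only for illustration.

```python
import re

# One letter, up to seven more letters or digits, then optionally a
# period followed by one to three letters or digits.
VIRTUAL_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}(\.[A-Za-z0-9]{1,3})?")


def is_virtual_file_name(name):
    """Return True if the whole name matches the relative-url format."""
    return VIRTUAL_NAME.fullmatch(name) is not None
```

With this sketch "logo.gif" is accepted, while "1logo.gif" (begins
with a digit) and "logo.jpeg" (extension too long) are rejected.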

For each distinct URI in the Text/HTML document, which refers to data
which is sent in the same MIME message, there should be a separate body
part, within the same multipart/related content in the message,
containing this data. Each such body part SHOULD contain a Content-
Location header. The string in this Content-Location header should be
identical to the relative URI as used in the Text/HTML document.

Note: This method does not require that the body parts are actually
stored in files in the recipient computer. The receiving mailer may
choose to implement this method by storing the individual body parts in
files with the virtual file name, or may choose other implementation
methods.

Example:

   Content-Base: "FILE:"
   Content-Type: Multipart/related; boundary="boundary-example-1";
                 type=Text/HTML

      --boundary-example-1
      Content-Type: Text/HTML

      ... text of the HTML document, which might contain a hyperlink
      to the other body part, for example through a statement such as:
      <IMG SRC="logo.gif">

      --boundary-example-1
      Content-Type: Image/GIF
      Content-Location: "logo.gif"

      --boundary-example-1--


9.3  CID-method: Use of CID URL-s

With this method, the hyperlink URIs to other body parts in the same
message in the Text/HTML document SHOULD be CID (Content-ID) URL-s as
defined in [URL] and [MIDCID].

For each distinct URI in the Text/HTML document, which refers to data
which is sent in the same MIME message, there should be a separate body
part in the message containing this data. Each such body part SHOULD
have a Content-ID header [MIME1]. The value of this Content-ID header
should be identical to the CID as used in the Text/HTML document.

Example:

   Content-Type: Multipart/related; boundary="boundary-example-1";
                 type=Text/HTML

      --boundary-example-1
      Content-Type: Text/HTML

      ... text of the HTML document, which might contain a hyperlink
      to the other body part, for example through a statement such as:
      <IMG SRC="cid:sign-eng*jpalme@dsv.su.se">

      --boundary-example-1
      Content-Type: Image/GIF
      Content-ID: sign-eng*jpalme@dsv.su.se

      --boundary-example-1--

Note: Content-ID-s should be globally unique. It is not permitted to
make them unique only within this message or within this
multipart/related.
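
Matching CID URL-s against Content-ID headers can be sketched as
follows. The names are hypothetical, the regular expression is a
simplification rather than a full HTML parser, and the sketch accepts
Content-ID values both with and without surrounding angle brackets,
which is an assumption.

```python
import email
import re
from urllib.parse import unquote

# Simplified pattern: a cid: URL ends at a quote, whitespace or ">".
CID_REF = re.compile(r'cid:([^"\'\s>]+)', re.IGNORECASE)


def cid_references(html):
    """Return the Content-ID values referred to by cid: URL-s in the text."""
    return [unquote(value) for value in CID_REF.findall(html)]


def find_part(message_text, content_id):
    """Return the body part whose Content-ID equals content_id, or None."""
    for part in email.message_from_string(message_text).walk():
        value = part.get("Content-ID", "").strip()
        if value.strip("<>") == content_id:
            return part
    return None
```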


9.4  Recommended Choice of Method:

A Text/HTML content may always, in addition to using the methods
described in this chapter of this memo, contain URI-s only resolvable
using the method defined for this particular URI scheme, and not
referring to any data in separate body parts of the same message.

Method            Body part           Recommendation
                  identification
                  method
------            --------------      --------------

Virtual file      File name in        Recommended as the primary choice,
name method       Content-Location    to be used whenever possible.
                  header

General           Content-Location    Recommended if existing HTML
Content-Location  header              documents are to be sent
method                                unchanged, but only if the
                                      referred-to document(s) are
                                      publicly available and retrievable
                                      using the scheme used in the URI.

CID method        Content-ID header   For experimental use between
                                      consenting partners.


10. Indication of Method Used

Which of the methods above is used is indicated by the value of the
surrounding Content-Base header:

Method                             Indicated by:
------                             ------------

Virtual file name method           Content-Base: FILE:
                                   as defined in [URL]

CID method                         Content-Base: CID
                                   as defined in [CID]

General Content-Location method    Any other Content-Base or no
                                   Content-Base specified
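
The table above can be read as a simple classification of the
Content-Base value, sketched below. The function name is hypothetical.
Accepting the values both with and without a trailing colon, and in
any letter case, is an assumption made because the table writes
"FILE:" with a colon and "CID" without one.

```python
def linking_method(content_base=None):
    """Classify the linking method from the surrounding Content-Base value."""
    value = (content_base or "").strip().strip('"').upper()
    if value in ("FILE", "FILE:"):
        return "virtual file name"
    if value in ("CID", "CID:"):
        return "CID"
    # Any other Content-Base, or none at all.
    return "general Content-Location"
```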


(??) Should LOCAL-FILE, as defined in [MIME2] be used instead of FILE
as defined in ? Or should something new, such as "LOCAL" or "VIRTUAL
FILE" be used to clarify that no real file storage is necessary?


11. Content-Disposition header

Information in the Content-Disposition header (as defined in [CONDISP])
on individual body parts within a multipart/related is ignored.
Receiving mailers which are not capable of handling the
multipart/related content type, and which thus by default handle it
as if it were multipart/mixed, can however make use of information in
the Content-Disposition header.


12. Sending forms in e-mail

When an e-mail message contains an HTML form, the default for
ACTION (as defined in [HTML2] section 8.1.1) should be to reply by
e-mail to the From: or Reply-To: address of the message containing the
form, and not, as specified in [HTML2], the base URI of the document.


13. Encoding Considerations

There are two recommended ways to encode 8-bit characters in Text/HTML
contents:

(1) Let the charset of the content part be iso-8859-1, and encode
    the content with the quoted-printable encoding method.

(2) Let the charset of the content part be us-ascii, and encode
    non-us-ascii characters in the text using the Data character
    encoding defined in [HTML2].

Both these encoding methods are permitted, and they can also be mixed
in the same document. Recipients must be capable of handling both
encoding alternatives. However, it is recommended that encoding method
(2) above is used when sending Text/HTML messages.

If only method (2) is used, the charset parameter should be "us-ascii".

If method (1), or a mixture of method (1) and method (2) is used, the
charset parameter should be "iso-8859-1".
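
The two encoding methods can be sketched as follows. The function
names are hypothetical, and the sketch assumes that the "Data
character encoding" of method (2) refers to numeric character
references such as "&#197;"; it is an illustration, not a complete
encoder.

```python
import quopri


def encode_method_1(html):
    """Method (1): iso-8859-1 charset with quoted-printable encoding."""
    # Raises UnicodeEncodeError for characters outside iso-8859-1.
    return quopri.encodestring(html.encode("iso-8859-1"))


def encode_method_2(html):
    """Method (2): us-ascii charset with numeric character references."""
    return html.encode("ascii", "xmlcharrefreplace")
```

Method (2) leaves the body as pure us-ascii, so no content transfer
encoding is needed for it to pass through 7-bit mail transport.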


14. Security Considerations

There is a potential security risk if the Content-Location: heads a
body part whose data is not identical to that retrievable using the URI
in the Content-Location. To reduce this risk, it might be unsuitable to
cache the data in such a way that the cached data can be used for
retrieval of this URL from other documents than those included in the
same message as the Content-Location header.

One way of implementing messages with linked body parts is to handle
the linked body parts in a combined mail and WWW proxy server. The mail
client is only given the start body part, which it turns over to a web
browser. This web browser requests the linked parts in the normal way,
but these requests are intercepted by the proxy server. If this method
is used, and if the combined server is used by more than one user, then
methods must be employed to ensure that body parts of a message to one
person are not retrievable by another person. Use of passwords (also
known as tickets or magic cookies) is one way of achieving this.


15. Acknowledgements

Harald Tveit Alvestrand, Richard Baker, Al Gilman, Roy Fielding, Keith
Moore, Ed Levinson, Mark K. Joseph, Daniel LaLiberte, Valdis
Kletnieks, Larry Masinter and several other people have helped me with
preparing this memo. I alone take responsibility for any errors which
may still be in the memo.


16. References

Temporary note: This list contains some references to Internet drafts.
It is anticipated that these Internet drafts will become RFC-s before
this memo. The references will then in this memo be changed to refer to
the corresponding RFC instead.

Ref.         Author, title
---------    ---------------------------------------------------------

[CID]        E. Levinson: "Message/External-Body Content-ID Access
             Type", RFC 1873, December 1995.

[CONDISP]    R. Troost, S. Dorner: "Communicating Presentation
             Information in Internet Messages: The Content-Disposition
             Header", RFC 1806, June 1995.

[HOSTS]      R. Braden (editor): "Requirements for Internet Hosts --
             Application and Support", STD-3, RFC 1123, October 1989.

[HTTP]       T. Berners-Lee, R. Fielding, H. Frystyk: "Hypertext
             Transfer Protocol -- HTTP/1.0", <draft-ietf-http-v10-spec-
             04.txt>, April 1996.

[MIME1]      N. Borenstein & N. Freed: "MIME (Multipurpose Internet
             Mail Extensions) Part One: Mechanisms for Specifying and
             Describing the Format of Internet Message Bodies", RFC
             1521, Sept 1993.

[MIME2]      N. Borenstein & N. Freed: "Multipurpose Internet Mail
             Extensions (MIME) Part Two: Media Types", <draft-ietf-
             822ext-mime-imt-02.txt>, December 1995.

[NEWS]       M.R. Horton, R. Adams: "Standard for interchange of
             USENET messages", RFC 1036, December 1987.

[REL]        Harald Tveit Alvestrand, Edward Levinson: "The MIME
             Multipart/Related Content-type", <draft-levinson-
             multipart-related-00.txt>, January 1995.

[RELURL]     R. Fielding: "Relative Uniform Resource Locators", RFC
             1808, June 1995.

[RFC822]     D. Crocker: "Standard for the format of ARPA Internet
             text messages." STD 11, RFC 822, August 1982.

[SMTP]       J. Postel: "Simple Mail Transfer Protocol", STD 10, RFC
             821, August 1982.

[URL]        T. Berners-Lee, L. Masinter, M. McCahill: "Uniform
             Resource Locators (URL)", RFC 1738, December 1994.

[URLBODY]    N. Freed and Keith Moore: "Definition of the URL MIME
             External-Body Access-Type", <draft-ietf-mailext-acc-url-
             01.txt>, November 1995.

[HTML2]      T. Berners-Lee, D. Connolly: "Hypertext Markup Language -
             2.0", RFC 1866, November 1995.



17. Author's Address

Jacob Palme                          Phone: +46-8-16 16 67
Stockholm University and KTH         Fax: +46-8-783 08 29
Electrum 230                         E-mail: jpalme@dsv.su.se
S-164 40 Kista, Sweden


Annex A: Implementation methods
-------------------------------

This annex is not part of the standards and is only included for
informational purposes. This annex might be removed before making this
memo into an IETF standard.

This standard has been intentionally written to be implementable both
in cases where the web browser and e-mail program are combined, and when
they are separate programs. Implementation is of course no problem if
the web browser is combined with the e-mail client.

    +---------+                           +--------+
    | Web     |                           | Mail   |
    | browser |                           | client |
    +-------+-+                           +-+------+
            |                               |
         +--+-------------------------------+--+
         | +----------+  +--+  +--+            |
         | | Start    |  |  |  |  | Related    |
         | | HTML     |  |  |  |  | body part  |
         | | document |  |  |  |  | parts      |
         | +----------+  +--+  +--+            |
         +-------------------------------------+

If the web browser is separate from the e-mail client, the e-mail
client might turn over the HTML body part to the web browser and ask it
to display it. One way of doing this is to store the HTML body part in
a file, and ask the web browser to display this file. If
multipart/related is used, this can be implemented by storing all the
body parts within the multipart/related in an otherwise empty
folder/directory. With the virtual file name method described in
section 9.2 above, this does not require any rewriting of the HTML text
and is thus easy to implement. That is why the virtual file name is
recommended as the primary method above.
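
Storing the body parts in an otherwise empty directory could look
roughly like the sketch below. The function name and the default name
"start.htm" for a part without a Content-Location header are
hypothetical, and a real mailer would also need to validate the names
against the format in section 9.2 and handle several unnamed parts.

```python
import email
import os
import tempfile


def unpack_related(message_text, start_name="start.htm"):
    """Store each leaf body part under its virtual file name."""
    directory = tempfile.mkdtemp()  # new, otherwise empty directory
    for part in email.message_from_string(message_text).walk():
        if part.is_multipart():
            continue
        location = (part.get("Content-Location") or "").strip().strip('"')
        # basename() discards any path components in an untrusted header.
        name = os.path.basename(location) or start_name
        with open(os.path.join(directory, name), "wb") as handle:
            handle.write(part.get_payload(decode=True) or b"")
    return directory
```

The web browser can then be asked to display the start file in the
returned directory, and relative links such as "logo.gif" resolve to
the stored files without any rewriting of the HTML text.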

    +---------+                          +--------+
    | Web     |                          | Mail   |
    | browser |                          | client |
    +-------+-+                          +-+------+
            |                              |
         +--+------------------------------+-+
         | +--------+  +--+  +--+            |
         | | Trans- |  |  |  |  | Related    |
         | | lation |  |  |  |  | body part  |
         | | table  |  |  |  |  | parts      |
         | +--------+  +--+  +--+            |
         +-----------------------------------+

With the general Content-Location method, the web browser must in some
way be instructed to retrieve the body parts from the received message.
This can be done by a translation table, if the web browser has an API
which allows for such a table.

    +--------+       +-----------+       +--------+
    | Proxy  |       | Data base |       | Mail   |
    | web    |-------| of cached |-------| server |
    | server |       | objects   |       |        |
    +----+---+       +-----------+       +----+---+
         |                                    |
    +----+----+                          +----+---+
    | Web     |                          | Mail   |
    | browser |                          | client |
    +-------+-+                          +-+------+
            |                              |
         +--+------------------------------+-+
         |         Start HTML object         |
         +-----------------------------------+

Other methods are to rewrite the HTML text before turning it over to
the web browser, and to use a proxy web server, to which the web
browser requests are sent, and which will then use the cached body
parts instead of normal web retrieval from the network.










