One document matched: draft-ietf-sieve-regex-01.xml


<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">

<rfc ipr="trust200902" docName="draft-ietf-sieve-regex-01.txt">

<?rfc toc="no" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc comments="yes" ?>

<front>
<title abbrev="Sieve Regex Extension">
Sieve Email Filtering: Regular Expression Extension
</title>

<author initials="K." surname="Murchison" fullname="Kenneth Murchison">
<organization>Carnegie Mellon University</organization>
<address>
<postal>
<street>5000 Forbes Avenue</street>
<street>Cyert Hall 285</street>
<city>Pittsburgh</city> <region>PA</region>
<code>15213</code> <country>US</country>
</postal>
<phone>+1 412 268 2638</phone>
<email>murch@andrew.cmu.edu</email>
</address>
</author>

<author initials="N." surname="Freed" fullname="Ned Freed">
<organization>Oracle Corporation</organization>
<address>
<postal>
<street>800 Royal Oaks</street>
<city>Monrovia</city> <region>CA</region>  <code>91016-6347</code>
<country>USA</country>
</postal>
<phone>+1 909 457 4293</phone>
<email>ned.freed@mrochek.com</email>
</address>
</author>

<date year="2010" />
<area>Applications</area>
<workgroup>Sieve Working Group</workgroup>

<keyword>RFC</keyword>
<keyword>Request for Comments</keyword>
<keyword>Sieve</keyword>
<keyword>I-D</keyword>
<keyword>Internet-Draft</keyword>
<keyword>Regex</keyword>

<abstract><t>
This document describes the "regex" extension to the Sieve email filtering
language. In some cases, it is desirable to have a string matching mechanism
which is more powerful than a simple exact match, a substring match or
a glob-style wildcard match.  The regular expression matching
mechanism defined in this draft provides users with much more powerful
string matching capabilities.
</t></abstract>

<note title="Change History (to be removed prior to publication as an RFC)">

<t>
Changes from draft-murchison-sieve-regex-08:
<list style="symbols">
<t>Updated to XML source.</t>
<t>Documented interaction with variables.</t>
</list></t>

<t>
Changes from draft-ietf-sieve-regex-00:
<list style="symbols">
<t>Various cleanup and updates.</t>
<t>Added trial text specifying comparator interactions.</t>
</list></t>

</note>

<note title="Open Issues (to be removed prior to publication as an RFC)">

<t><list style="symbols">

<t>The major open issue with this draft is what to do, if anything, about
localization/internationalization.  Are <xref target="IEEE.1003-2.1992"/>
collating sequences and character equivalents sufficient?  Should we
reference the Unicode technical specification?  Should we punt and publish
the document as experimental?
</t>

<t>
Is the current approach to comparator integration the right one to use?
</t>

<t>Should we allow shorthands such as \\b (word boundary) and \\w (word
character)? 
</t>

<t>Should we allow backreferences (useful for matching double words,
etc.)?
</t>

</list></t>
</note>

</front>

<middle>

<section title="Introduction" anchor="intro">

<t>
Sieve <xref target='RFC5228' /> is a language for filtering email
messages at or around the time of final delivery.  It is designed to be
implementable on either a mail client or mail server.
</t>

<t>
The Sieve base specification defines so-called match types for tests:
is, contains, and matches. An "is" test requires an exact match, a "contains"
test provides a substring match, and "matches" provides glob-style
wildcards. This document describes an extension to the Sieve language
that provides a new match type for regular expression comparisons.
</t>

</section> <!-- intro -->

<section title="Conventions used in this document" anchor="conventions">

<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref target="RFC2119"/>.
</t>

<t>
The terms used to describe the various components of the Sieve
language are taken from Section 1.1 of <xref target='RFC5228' />.
</t>

</section> <!-- conventions -->

<section title="Capability Identifier" anchor="capa">

<t>
The capability string associated with the extension defined in this
document is "regex".
</t>

</section> <!-- capa -->

<section title="Regex Match Type" anchor="matchtype">

<t>
When the regex extension is available,
commands that support matching may take the optional tagged argument 
":regex" to specify that a regular expression match should be
performed.  The ":regex" match type is subject to the same rules and
restrictions as the standard match types defined in
<xref target="RFC5228"/>.
</t>

<figure>
<preamble>The "MATCH-TYPE" syntax element defined
in <xref target="RFC5228"/> is augmented here as follows:
</preamble>

<artwork>
MATCH-TYPE  =/  ":regex"
</artwork>
</figure>

</section>

<section title="Interaction with Sieve comparators">

<t>
In order to provide for matches between character sets and case
insensitivity, Sieve uses the comparators defined in the Internet
Application Protocol Collation Registry <xref target="RFC5228"/>.
The comparator used by a given test is specified by the :comparator
argument.
</t>

<t>
The interaction between collators and the match types defined in the
Sieve base specification is straightforward. Howeer, the nature of regular
expressions does not lend itself to this usage for the :regex match type.
</t>

<t>
A component of the definition of many collators is a normalization
operation. For example, the "i;octet" comparator employs an identity
normalization; whereas the "i;ascii-casema" normalizes all lower case ASCII
characters to upper case.
</t>

<t>
The :regex match type only uses the normalization component of the associated
comparator. This normalization operation is applied to the key-list argument to
the test; the result of that normalization becomes the target of the regular
expression comparison. The comparator has no effect on the regular expression
pattern or the underlying comparison operation.
</t>

<t>
It is an error to specify a comparator that has no associated normalization
operation in conjunction with a :regex match type.
</t>

</section>

<section title="Regular expression comparisions">

<t>
Implementations MUST support extended regular expressions (EREs) as
defined by <xref target="IEEE.1003-2.1992"/>.  Any regular expression
not defined by <xref target="IEEE.1003-2.1992"/>, as well as <xref
target="IEEE.1003-2.1992"/> basic regular expressions, word boundaries
and backreferences are not supported by this
extension. Implementations SHOULD reject regular expressions that are
unsupported by this specification as a syntax error.
</t>

<t>
The following tables provide a brief summary of the regular expressions
that MUST be supported.  This table is presented here only
as a guideline.  <xref target="IEEE.1003-2.1992"/> should be used as
the definitive reference.
</t>

<texttable title="Items to match a single character" anchor="regex_single">
<ttcol align="center">Expression</ttcol>
<ttcol align="left">Pattern</ttcol>
<c>.</c>
<c>Match any single character except newline.</c>
<c>[ ]</c>
<c>Bracket expression.  Match any one of the enclosed characters.  A
hypen (-) indicates a range of consecutive characters.</c>
<c>[^ ]</c>
<c>Negated bracket expression.  Match any one character NOT in the
enclosed list.  A hypen (-) indicates a range of consecutive
characters.</c>
<c>\\</c>
<c>Escape the following special character (match the literal
character).  Undefined for other characters.  NOTE: Unlike <xref
target="IEEE.1003-2.1992"/>, 
a double-backslash is required as per section 2.4.2 of <xref
target="RFC5228"/>.</c>
</texttable>

<texttable title="Items to be used within a bracket expression
(localization)" anchor="regex_bracket">
<ttcol align="center">Expression</ttcol>
<ttcol align="left">Pattern</ttcol>
<c>[: :]</c>
<c>Character class (alnum, alpha, blank, cntrl, digit, graph, lower,
print, punct, space, upper, xdigit).</c>
<c>[= =]</c>
<c>Character equivalents.</c>
<c>[. .]</c>
<c>Collating sequence.</c>
</texttable>

<texttable title="Quantifiers - Items to count the preceding regular
expression" anchor="regex_quantifer">
<ttcol align="center">Expression</ttcol>
<ttcol align="left">Pattern</ttcol>
<c>?</c>
<c>Match zero or one instances.</c>
<c>*</c>
<c>Match zero or more instances.</c>
<c>+</c>
<c>Match one or more instances.</c>
<c>{n,m}</c>
<c>Match any number of instances between n and m (inclusive).  {n}
matches exactly n instances.  {n,} matches n or more instances.</c>
</texttable>

<texttable title="Anchoring - Items to match positions" anchor="regex_anchor">
<ttcol align="center">Expression</ttcol>
<ttcol align="left">Pattern</ttcol>
<c>^</c>
<c>Match the beginning of the line or string.</c>
<c>$</c>
<c>Match the end of the line or string.</c>
</texttable>

<texttable title="Other constructs" anchor="regex_other">
<ttcol align="center">Expression</ttcol>
<ttcol align="left">Pattern</ttcol>
<c>|</c>
<c>Alternation.  Match either of the separated regular
expressions.</c>
<c>( )</c>
<c>Group the enclosed regular expression(s).</c>
</texttable>

</section> <!-- matchtype -->

<section title="Interaction with Sieve Variables" anchor="interaction">

<t>This extension is compatible with, and may be used in conjunction
with the Sieve Variables extension <xref target="RFC5229"/>.</t>

<section title="Match variables" anchor="matchvar">

<t>
A sieve interpreter which supports both "regex" and "variables",
MUST set "match variables" (as defined by <xref target="RFC5229"/>
section 3.2) whenever the ":regex" match type is used.  The list of match
variables will contain the strings corresponding to the group operators in
the regular expression.  The groups are ordered by the position of the
opening parenthesis, from left to right.  Note that in regular
expressions, expansions match as much as possible (greedy
matching).
</t>

<figure>
<preamble>Example:
</preamble>

<artwork>
require ["fileinto", "regex", "variables"];

if header :regex "List-ID" "<(.*)@" {
    fileinto "lists.${1}"; stop;
}

# Imagine the header
# Subject: [acme-users] [fwd] version 1.0 is out
if header :regex "Subject" "^[(.*)] (.*)$" {
    # ${1} will hold "acme-users] [fwd"
    stop;
}
</artwork>
</figure>

</section> <!-- matchvar -->

<section title="Set modifier :quoteregex" anchor="setmod">

<t>A sieve interpreter which supports both "regex" and "variables",
 MUST support the optional tagged argument ":quoteregex" for use with
 the "set" action.  The ":quoteregex" modifier is subject to the same
 rules and restrictions as the standard modifiers defined in <xref
 target="RFC5229"/> section 4.</t>

<figure>
<preamble>For convenience, the "MODIFIER" syntax element defined
in <xref target="RFC5229"/> is augmented here as follows:
</preamble>

<artwork>
MODIFIER  =/  ":quoteregex"
</artwork>
</figure>

<t>This modifier adds the necessary quoting to ensure that the
expanded text will only match a literal occurrence if used as a
parameter to :regex.  Every character with special meaning (".", "*",
"?", etc.) is prefixed with "\" in the expansion.  This modifier has a
precedence value of 20 when used with other modifiers.</t>

</section> <!-- setmod -->

</section> <!-- interaction -->

<section title="Examples" anchor="examples">

<figure>
<preamble>Example:
</preamble>

<artwork>
require "regex";

# Try to catch unsolicited email.
if anyof (
  # if a message is not to me (with optional +detail),
  not address :regex ["to", "cc", "bcc"]
    "me(\\\\+.*)?@company\\\\.com",

  # or the subject is all uppercase (no lowercase)
  header :regex :comparator "i;octet" "subject"
    "^[^[:lower:]]+$" ) {

  discard;    # junk it
}
</artwork>
</figure>

</section>

<section title="IANA Considerations" anchor="iana">

<t>The following template specifies the IANA registration of the
   "regex" Sieve extension specified in this document:
</t>

<figure>
<artwork>
To: iana@iana.org
Subject: Registration of new Sieve extension


Capability name: regex
Capability keyword: regex
Capability arguments: N/A
Standards Track/IESG-approved experimental RFC number: this RFC
Person and email address to contact for further information:
    Kenneth Murchison
    E-Mail: murch@andrew.cmu.edu

This information should be added to the list of Sieve extensions
given on http://www.iana.org/assignments/sieve-extensions.
</artwork>
</figure>

</section> <!-- iana -->

<section title="Security Considerations">

<t>
General Sieve security considerations are discussed in <xref target="RFC5228"/>.
All of the issues described there also apply to regular expression matching.
</t>

<t>
It is easy to construct problematic regular expressions that are computationally
infeasible to evaluate. Execution of a Sieve that employs a potentially problematic
regular expression, such as "(.*)*", may cause problems ranging from
degradation of performance to and outright denial of service. Moreover,
determining the computationl complexity associated
with evaluating a given regular expression is in general an intractable problem.
</t>

<t>
For this reason, all implementations MUST take appropriate steps to limit the
impact of runaway regular expression evaluation. Implementations MAY restrict
the regular expressions users are allowed to specify. Implementations that do
not impose such restrictions SHOULD provide a means to abort evaluation of
tests using the :regex match type if the operation is taking too long.
</t>

</section>

</middle>

<back>

<references title="Normative References">
<?rfc include="reference.IEEE.1003-2.1992" ?>
<?rfc include="reference.RFC.2119" ?>
<?rfc include="reference.RFC.5228" ?>
<?rfc include="reference.RFC.5229" ?>
</references>

<section title="Acknowledgments">

<t>Most of the text documenting the interaction with Sieve variables
was taken from an early draft of Kjetil Homme's Sieve variables
specification.</t>

<t>Thanks to Tim Showalter, Alexey Melnikov, Tony Hansen, Phil Pennock,
and Jutta Degener for their help with this document.</t>
</section>

</back>
</rfc>

PAFTECH AB 2003-20262026-04-24 17:03:28