One document matched: draft-roach-sipping-clf-syntax-01.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd"[
<!ENTITY draft-gurbani-sipping-clf PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-gurbani-sipping-clf-01.xml'>
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes"?>
<?rfc compact="yes" ?>
<?rfc sortrefs="no" ?>
<?rfc symrefs="yes" ?>
<rfc ipr="trust200902">
<front>
<title>Binary Syntax for SIP Common Log Format</title>
<author fullname="Adam Roach" initials="A. B." surname="Roach">
<organization>Tekelec</organization>
<address>
<postal>
<street>17210 Campbell Rd.</street>
<street>Suite 250</street>
<city>Dallas</city>
<region>TX</region>
<code>75252</code>
<country>US</country>
</postal>
<email>adam@nostrum.com</email>
</address>
</author>
<date month="May" year="2009" />
<area>Real Time Applications and Infrastructure</area>
<abstract>
<t>
This document proposes a binary syntax for the SIP
common log format (CLF). It does not cover semantic issues,
and is meant to be evaluated in the context of the
other efforts discussing SIP CLF.
</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>
The Common Log File (CLF) format for the Session Initiation
Protocol (SIP) <xref target="I-D.gurbani-sipping-clf"/>
proposes a syntax for logging SIP messages received and
sent by SIP clients, servers, and proxies. The syntax
proposed by that document has been inspired by the common
HTTP log format. However, experience with that format has
shown that dealing with large quantities of log data can
be very processor intensive, as doing so necessary requires
reading and parsing every byte in the log file(s) of interest.
</t>
<t>
This document counter-proposes a format that is no more
difficult to generate by logging entities, while being
radically faster to process. In particular, the format
is optimized for both rapidly scanning through log
records, as well as quickly locating commonly-accessed
data fields. Both operations can be performed in constant time
(as compared with O(n) time associated with the current format,
where n is the length of the log record).
</t>
<t>
Further, the format proposed by this document retains the
ability to be read by humans and processed using traditional
Unix text processing tools, such as sed, awk, perl, cut, and grep.
</t>
</section>
<section title="Format">
<t>
Each data record is encoded according to the following
format. Note that indications of "hexadecimal encoded"
indicate that the value is to be written out in human-readable
base-16 numbers using the ASCII characters 0x30 through 0x39
and 0x41 through 0x46 ('0' through '9' and 'A' through 'F').
Similarly, indications of "decimal encoded" indicate that the
value is to be written out in
human readable base-10 number using the ASCII characters
0x30 through 0x39 ('0' through '9'). In both encodings, numbers
always take up the number of bytes indicated, and are padded on
the left with ASCII '0' characters to fill the entire space.
</t>
<t>
<figure>
<artwork><![CDATA[
0 7 8 15 16 23 24 31
+--------+--------+--------+--------+
|Version | Flags Field | 0 - 3
+--------+--------+--------+--------+
| 0x2C | Record Length | 4 - 7
+--------+--------+--------+--------+
| Record Length (cont) | 0x2C | 8 - 11
+--------+--------+--------+--------+
| Server Txn Pointer (Hex) | 12 - 15
+--------+--------+--------+--------+
| Server Txn Length (Hex) | 16 - 19
+--------+--------+--------+--------+
| Client Txn Pointer (Hex) | 20 - 23
+--------+--------+--------+--------+
| Client Txn Length (Hex) | 24 - 27
+--------+--------+--------+--------+
| Method Pointer (Hex) | 28 - 31
+--------+--------+--------+--------+
| Method Length (Hex) | 32 - 35
+--------+--------+--------+--------+
| To Value Pointer (Hex) | 36 - 39
+--------+--------+--------+--------+
| To Value Length (Hex) | 40 - 43
+--------+--------+--------+--------+
| To Tag Pointer (Hex) | 44 - 47
+--------+--------+--------+--------+
| To Tag Length (Hex) | 48 - 51
+--------+--------+--------+--------+
| From Value Pointer (Hex) | 52 - 55
+--------+--------+--------+--------+
| From Value Length (Hex) | 56 - 59
+--------+--------+--------+--------+
| From Tag Pointer (Hex) | 60 - 63
+--------+--------+--------+--------+
| From Tag Length (Hex) | 64 - 67
+--------+--------+--------+--------+
| Call-Id Pointer (Hex) | 68 - 71
+--------+--------+--------+--------+
| Call-Id Length (Hex) | 72 - 75
+--------+--------+--------+--------+
| TLV Start Pointer (Hex) | 76 - 79
+--------+--------+--------+--------+
| 0x0A | | 80 - 83
+--------+ +
| Date/Time | 84 - 87
+ +--------+
| | 0x2E | 88 - 91
+--------+--------+--------+--------+
| Fractional Seconds | 92 - 95
+ +--------+--------+
| | 0x09 | | 96 - 99
+--------+--------+--------+ +
| | 100 - 103
+ CSeq +
| | 104 - 107
+ +--------+--------+--------+
| | 0x09 | Response | 108 - 111
+--------+--------+--------+--------+
|Response| 0x09 | | 112 - 115
+--------+--------+ +
| |
| |
| Mandatory Fields |
| |
| |
+--------+--------+--------+--------+ \
| 0x09 | Tag (Hex) | \
+--------+--------+--------+--------+ \
|Tag,cont| 0x2C | Length (Hex) | \ Repeated as
+--------+--------+--------+--------+ > many times
| Length (cont) | 0x2C | | / as necessary
+--------+--------+--------+ + /
| Value | /
+--------+--------+--------+--------+ /
| 0x0A |
+--------+
]]></artwork>
</figure>
</t>
<t>
First, an 80-byte header indicates meta-data about
the record. Note that the field lengths encoded in the
header do not include the ASCII tab characters used to
separate fields from each other.
</t>
<t>
<list style="hanging">
<t hangText="Version (1 byte):"> 0x41 for this document
<vspace blankLines="1"/></t>
<t hangText="Flags Field (3 bytes):">
<list style="hanging">
<t hangText="byte 1 - "> Request/Response flag
(R = request, r = response)</t>
<t hangText="byte 2 - "> Retransmission flag
(o = original transmission; d = duplicate transmission;
s = server is stateless [i.e., retransmissions are
not detected])</t>
<t hangText="byte 3 - "> Sent/Received flag
(r = message received, s = message sent)</t>
</list>
<vspace blankLines="1"/></t>
<t hangText="Record Length (6 bytes):"> Hexadecimal-Encoded Total
length of this log record, including "Flags" and "Record Length" fields,
and terminating line-feed<vspace blankLines="1"/></t>
</list>
Bytes 12 through 72 contain hexadecimal-encoded
pointer/length pairs that point
to the values of variable-length mandatory fields. The
"Pointer" fields indicate absolute byte values within the
record, and must be >= 103. They point to the start
of the corresponding value within the "Mandatory Fields"
area. The "Length" fields indicate the length of the
corresponding value.
The final
pointer, "TLV Start Pointer," points to the ASCII Tab
(0x09) character for the first entry in the Tag/Length/Value
area; if no such entries are present, this value is set to
zero.
</t>
<t>
Note that the "Length" fields do
not include the tab delimiters between fields.
Further note that there are no delimiters between
these pointer/length values -- they are packed together
as a single, 68-character hexadecimal encoded string.
</t>
<t>
Following the pointer/length pairs, several fixed-length
fields are encoded. As before, all fields are completely
filled, pre-pending values with '0' characters as necessary.
<vspace blankLines="1"/>
<list style="hanging">
<t hangText="Date/Time (10 bytes):"> Seconds since midnight, January 1st,
1970, GMT; decimal encoded<vspace blankLines="1"/></t>
<t hangText="Fraction Seconds (6 bytes):"> Microseconds since
the time in Date/Time field; decimal encoded<vspace blankLines="1"/></t>
<t hangText="CSeq Number (10 bytes):"> CSeq number from the SIP message;
decimal encoded<vspace blankLines="1"/></t>
<t hangText="Response Code (3 bytes):"> Set to the value of the response
code for responses. Set to 0 for requests. Decimal encoded.
<vspace blankLines="1"/></t>
<t hangText="Mandatory Field Data:"> Contains actual values for the
mandatory fields. This data must appear in the order listed,
and each field must be present. Fields are separated by a
single ASCII Tab character (0x09). Any tab characters present
in the data to be written will be replaced by an ASCII space
character (0x20) prior to being logged.
<vspace blankLines="1"/></t>
<t><list style="hanging">
<t hangText="Server Txn:"> The transaction identifier
associated with the server transaction. Implementations MAY reuse
the server transaction identifier (the topmost branch-id of the
incoming request, with or without the magic cookie), or they MAY
generate a unique identification string for a server transaction
(this identifier needs to be locally unique to the server only.)
This identifier is used to correlate ACKs and CANCELs to an INVITE
transaction; it is also used to aid in forking.
<vspace blankLines="1"/></t>
<t hangText="Client Txn:"> This field is used to
associate client transactions with a server transaction for
forking proxies or B2BUAs.<vspace blankLines="1"/></t>
<t hangText="Method:"> In requests, the method from the start line.
In responses, the method found in the CSeq header field.
<vspace blankLines="1"/></t>
<t hangText="To Value:"> Value of the To header field, possibly
with the tag parameter removed. (Whether to remove the tag parameter is
left up to the logging entity).<vspace blankLines="1"/></t>
<t hangText="To Tag:"> Value of the To header field tag parameter. If
no To header field tag parameter is present, the pointer field
is ignored; the length field is set to 0; and the field in the
mandatory section is encoded as a single ASCII dash (0x2D).
<vspace blankLines="1"/></t>
<t hangText="From Value:"> Value of the From header field, possibly
with the tag parameter removed. (Whether to remove the tag parameter is
left up to the logging entity)<vspace blankLines="1"/></t>
<t hangText="From Tag:"> Value of the From header field tag parameter.
<vspace blankLines="1"/></t>
<t hangText="Call-Id:"> The value of the Call-ID header field
<vspace blankLines="1"/></t>
</list></t>
</list></t>
<t>
After the "Mandatory Fields" section,
Tag/Length/Value groups appear zero or more times. The location
within the log record is indicated by the "TLV Start Ptr" field.
They are used to log information that is not mandatory for all
messages (although specific TLVs are mandatory in request logs).
<vspace blankLines="1"/>
</t>
<t><list style="hanging">
<t hangText="Tag Field (4 bytes):"> indicates the type of value coded
by this TLV; hexadecimal encoded.
Currently defined tags are: <vspace blankLines="1"/>
<list style="hanging">
<t hangText="0x0000 -"> Contact value (can be repeated)<vspace/>
Contains entire value of Contact header field<vspace blankLines="1"/></t>
<t hangText="0x0001 -"> Request URI (mandatory in request)<vspace/>
Contains Request URI in start line<vspace blankLines="1"/></t>
<t hangText="0x0002 -"> Remote Host (mandatory in request)<vspace/>
The DNS name of IP address from which the
message was received (if "sent/received flag" is 0)
of the IP address to which the message is being
send (if "sent/received flag" is 1)<vspace blankLines="1"/></t>
<t hangText="0x0003 -"> Authenticated User<vspace/>
Contains the user name by which the user has
been authenticated<vspace blankLines="1"/></t>
<t hangText="0x0004 -"> Complete SIP Message (optional, should be omitted by default)<vspace/>
Contains complete SIP message. Can be repeated
multiple times to accommodate SIP messages that
exceed 65535 bytes in length.<vspace blankLines="1"/></t>
</list>
<vspace blankLines="1"/></t>
<t hangText="Length Field (2 bytes):"> indicates the length of the value
coded in this TLV, hexadecimal encoded. This length does NOT include the TLV
header.<vspace blankLines="1"/></t>
<t hangText="Value Field (0 to 65535 bytes):"> contains the actual
value of this TLV. As with the mandatory fields, ASCII Tab characters
(0x09) are replaced with ASCII space characters (0x20).
<vspace blankLines="1"/></t>
</list>
<vspace blankLines="1"/>
</t>
</section>
<section title="Example Record">
<t>
The following demonstrates approximately how a single log record
appears in a logging file.
Due to internet-draft conventions, this log entry
has been split into ten lines, instead of the two lines
that actually appear in a log file; and the tab characters
have been padded out using spaces to simulate their appearance
in a text terminal.
<figure>
<artwork><![CDATA[
ARor,000181,
0072000D00800000008100060088002000A9000600B0002500D6000A00E100270108
1241708241.308241 0000000187 000 7yuz67jhyi9-9 -
INVITE Bob <sip:bob@biloxi.example.com> 314159
Alice <sip:alice@atlanta.example.com> 9fxced76sl
3848276298220188511@atlanta.example.com
0000,0034,<sip:alice@client.atlanta.example.com;transport=tcp>
0001,001A,sip:bob@biloxi.example.com
0002,000C,192.168.9.12
]]></artwork>
</figure>
A uuencoded version of this log entry (without the
changes required to format it for an internet-draft)
follows.
<figure>
<artwork><![CDATA[
begin 644 sip-clf.txt
M05)O<BPP,#`Q.#$L,#`W,C`P,$0P,#@P,#`P,#`P.#$P,#`V,#`X.#`P,C`P
M,$$Y,#`P-C`P0C`P,#(U,#!$-C`P,$$P,$4Q,#`R-S`Q,#@*,3(T,3<P.#(T
M,2XS,#@R-#$),#`P,#`P,#$X-PDP,#`)-WEU>C8W:FAY:3DM.0DM"4E.5DE4
M10E";V(@/'-I<#IB;V)`8FEL;WAI+F5X86UP;&4N8V]M/@DS,30Q-3D)06QI
M8V4@/'-I<#IA;&EC94!A=&QA;G1A+F5X86UP;&4N8V]M/@DY9GAC960W-G-L
M"3,X-#@R-S8R.3@R,C`Q.#@U,3%`871L86YT82YE>&%M<&QE+F-O;0DP,#`P
M+#`P,S0L/'-I<#IA;&EC94!C;&EE;G0N871L86YT82YE>&%M<&QE+F-O;3MT
M<F%N<W!O<G0]=&-P/@DP,#`Q+#`P,4$L<VEP.F)O8D!B:6QO>&DN97AA;7!L
=92YC;VT),#`P,BPP,#!#+#$Y,BXQ-C@N.2XQ,@H`
`
end
]]></artwork>
</figure>
</t>
</section>
<section title="Text Tool Considerations">
<t>
This format has been designed to allow text tools to easily
process logs without needing to understand the indexing
format. Index lines may be rapidly discarded by checking
the first character of the line: index lines will always
start with an alphabetical character, while field lines
will start with a numerical character.
</t>
<t>
Within a field line, script tools can quickly split fields
at the tab characters. The first 11 fields are positional,
and the meaning of any subsequent fields can be determined
by checking the first four characters of the field.
Alternately, these non-positional fields can be located
using a regular expression. For example, the "Request URI"
in a request can be found by searching for the perl regex
/\t0001,....,([^\t]*)/.
</t>
<t>
Note also that requests can be distinguished from responses
by checking the third positional field -- for requests,
it will always be set to "000"; any other value indicates
a response.
</t>
</section>
</middle>
<back>
<references title="Normative References">
&draft-gurbani-sipping-clf;
</references>
<section title="Acknowledgements">
<t>
Cullen put me up to this.
</t>
<t>
Tom Taylor suggested the technique
of combining the length field structure from the binary
format with the human-readable ASCII format to allow both
rapid processing by advanced tools, and easy processing by
simpler, text-centric tools. Dean Willis suggested the use
of tab delimiters as a means to avoid the need to escape values
within a field. Vijay Gurbani provided significant feedback,
and wrote the original proof-of-concept program which was
adapted to produce the examples in this document.
</t>
</section>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-23 14:16:54 |