IPFIX Working Group B. Trammell
Internet-Draft CERT/NetSA
Expires: December 25, 2006 E. Boschi
Hitachi Europe
L. Mark
T. Zseby
Fraunhofer FOKUS
June 23, 2006
An IPFIX-Based File Format
draft-trammell-ipfix-file-01.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 25, 2006.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
This document describes a file format for the storage of flow data
based upon the IPFIX message format. It proposes a set of
requirements for flat-file, binary flow data file formats, evaluates
flow storage systems presently in use for their conformance to these
Trammell, et al. Expires December 25, 2006 [Page 1]
Internet-Draft IPFIX Files June 2006
requirements, then applies the IPFIX message format to these
requirements to build a new file format. This IPFIX file format is
designed especially to be useful to the implementors of IPFIX
Collecting Processes.
1. Introduction
The IPFIX message format makes an ideal basis for a standard flow
file format for archival storage purposes and document-based workflow
support. As it was designed for the efficient and flexible
representation of a variety of flow and flow-like data, it is more
extensible than ad-hoc file formats derived from simple data model
serialization, and more efficient than record-structured textual
formats such as XML.
This document explores the motivation for building a flow file format
atop the IPFIX message format. It then proposes a set of
requirements for this file format, and describes either how the IPFIX
message format meets each requirement, how a file format based upon
it could meet the requirement, or how the message format must be
extended to meet the requirement. The document also examines
existing flow storage file formats for their conformance to these
requirements.
The purpose of this revision of the document is to foster discussion
on the motivation and requirements sections in advance of proposing
the design of a file format; consequently, the sections on the file
format itself and examples of IPFIX files are currently placeholders
without content. It is our aim to use this document and discussions
concerning it in the IPFIX working group as a basis for future work
on this effort.
2. Terminology
Terms used in this document that are defined in the Terminology
section of the IPFIX Protocol [1] document are to be interpreted as
defined there.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [3].
3. Motivation
We have identified two major use cases for file-based storage of IP
flow data. The first is long-term, persistent storage of flow data
for archival purposes. Filesystems often make sense as a persistent
storage backend due to their ubiquity, simplicity, and flexibility.
There are a wide variety of operations available on files (e.g.,
external compression and encryption, atomic backup) that are made
more difficult with a more integrated persistent storage system such
as a relational database management system (RDBMS). As flow data is
often not very semantically complicated, and is managed in very high
volume, the simplicity of a file-based persistent storage backend can
outweigh the advantages of these other storage systems.
The second use case is in document-based workflows. Users of many
information processing systems are accustomed to dealing with
documents which encapsulate all the information about a work item or
collection of work items; even in situations in which document-based
workflows may have significant disadvantages (e.g., revision control
in a multi-editor environment), many user communities still prefer
documents as the "atom" of work due to their simplicity. As an
example relevant to flow data, the most common unit of work in the
network forensics and research communities is the packet trace file,
and utilities such as Ethereal explicitly treat these packet traces
as documents. It seems likely that as flow data analysis tools are
developed, many will choose to support a document-based workflow; a
standard format for this document would be of great use to the
analysis community. Document-based workflows are especially well
supported by file-based formats.
The simplest way to create a new file format is to serialize
some internal data model to disk, with either textual or binary
representation of data elements, and some framing strategy for
delimiting fields and records. "Ad-hoc" file formats such as this
have several important disadvantages. Chief among them is that they
impose the semantics of the data model from which they are derived on
the file format; as such, they are difficult to extend, describe, and
standardize.
The emergence over the past decade of XML as a new "universal"
framing format for flat as well as hierarchical data addresses these
concerns; however, XML is not necessarily ideal for a storage format
for flow data. First, flow data, being inherently simple and record-
oriented, does not benefit from the more advanced semantics available
with XML. There is not much to be gained by describing each record
individually when the records all have the same format, or one of a
small set of formats. Second, XML processing introduces potentially
significant overhead. While an XML stream should in theory be
approximately as compressible as any other stream representation, the
additional cost of compressing and decompressing, and of generating
and parsing, XML data outweighs the benefit in this case.
This leads us to propose the IPFIX message format as the basis for a
new flow data file format. The IPFIX working group, in defining the
IPFIX protocol, has already defined an information model and data
formatting rules for representation of flow data. Especially in the
document-based workflow use case, a file may be viewed as simply
another IPFIX message transport between processes. This format is
especially well suited to representing flow data, as it was designed
specifically for that use case; it is easily extensible unlike ad-hoc
serialization, and compact unlike XML. In addition, IPFIX is an
emerging standard for the export and collection of flow data; using a
common format for storage and analysis at the collection side allows
implementors to use substantially the same information model and data
formatting implementation for transport as well as storage.
4. Requirements
In this section, we outline a proposed set of requirements for any
persistent storage format for flow data. First and foremost, a flow
data file format should support both of the broad use cases addressed
in the Motivation. In addition, the requirements enumerated in the
sections below apply to both use cases. For each, we first identify
the requirement, then explain how the IPFIX message format addresses
it, or briefly outline the changes that must be made in order for an
IPFIX-based file format to meet the requirement.
4.1. Extensibility
Due to the wide variety of flow attributes collected by different
network flow attribute measurement systems, the ideal flow storage
format will not impose a single data model or a specific record type
on the flows it stores. The file format must be extensible; that is,
it must be flexible enough to support multiple record types, and must
be able to support new field types for data within the records in a
graceful way.
IPFIX provides extensibility through the use of Templates to describe
each Data Record, through the use of an IANA Registry to define its
Information Elements, and through the use of enterprise-specific
Information Elements.
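As a rough illustration of why Templates make the format extensible,
the following Python sketch decodes a record using nothing but a list
of (Information Element identifier, field length) pairs. It is
deliberately simplified and is not the IPFIX wire encoding: Message
and Set headers, variable-length fields, and enterprise-specific
elements are all omitted, and the names TEMPLATE and decode_record
are hypothetical.

```python
# Simplified sketch, not the IPFIX wire format: a template is a list of
# (information element ID, field length in octets) pairs.
# IDs 8, 12 and 2 are the IANA-assigned sourceIPv4Address,
# destinationIPv4Address and packetDeltaCount elements.
TEMPLATE = [(8, 4), (12, 4), (2, 8)]


def decode_record(template, data, offset=0):
    """Decode one fixed-length record described by `template`.

    Returns the decoded fields (keyed by element ID, as unsigned
    integers) and the offset of the next record.
    """
    record = {}
    for element_id, length in template:
        raw = data[offset:offset + length]
        if len(raw) != length:
            raise ValueError("record truncated")
        record[element_id] = int.from_bytes(raw, "big")
        offset += length
    return record, offset
```

Under this scheme, supporting a new record type means supplying a
different template rather than changing decode_record.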
4.2. Self Description
Archived data may be read at a time in the future when any external
reference to the meaning of the data may have been lost. The ideal flow
storage format should be self-describing; that is, a process reading
flow data from storage should be able to properly interpret the
stored flows without reference to anything other than standard
sources (e.g., the standards document describing the file format) and
the stored flow data itself.
The IPFIX message format is partially self-describing; that is, IPFIX
Templates containing only IANA-assigned Information Elements can be
completely interpreted according to the IPFIX Information Model
without additional external data. However, to be fully self-
describing, the IPFIX message format would require extension to add
type and semantic information to the definitions of enterprise-
specific Information Elements.
4.3. Data Compression
Regardless of the representation format, flow data describing traffic
on real networks tends to be highly compressible. Compression tends
to improve the scalability of flow collection systems, by reducing
the disk storage and I/O bandwidth requirement for a given workload.
The ideal flow storage format should support applications which wish
to leverage this fact by supporting compression of stored data.
The IPFIX message format has no support for data compression.
However, any flat file is readily compressible using a wide variety
of external data compression tools, formats, and algorithms. If
finer granularity than file-level compression is required, the IPFIX
message format would require an extension to add some notation that a
record set or message is compressed.
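File-level external compression of the kind described above can be
sketched as follows. The choice of gzip, and the function names, are
assumptions made purely for illustration; this document does not
prescribe any particular compression tool or algorithm.

```python
import gzip


def compress_flow_file(raw):
    """Compress the complete contents of a flow file with gzip."""
    return gzip.compress(raw)


def decompress_flow_file(compressed):
    """Recover the original, uncompressed file contents."""
    return gzip.decompress(compressed)
```

Because the whole file is compressed as a unit, a reader must
decompress from the beginning to reach any given record; this is the
granularity limitation noted above.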
4.4. Indexing and Searching
Binary, record-stream-oriented file formats natively support only one
form of searching: sequential scan in file order. By choosing the
order of records in a file carefully (e.g., by time), a file can be
"indexed" by a single key. Adding additional indexes to the file can
speed searches considerably. The ideal flow storage format will
support a method for noting that the records in a file are sorted by
a certain key or set of keys, and for providing index information for
keys on which the file is not sorted.
There is presently no support for indexing or sort order notation in
the IPFIX message format. If internal indexing is required, it would
need to be added to an IPFIX-based file format by extension.
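To show what a reader gains from knowing the sort order, the sketch
below performs a binary search over a buffer of fixed-length records
sorted by start time. The record layout (a 32-bit start time followed
by a 32-bit octet count) and the function name are hypothetical; the
search is only possible here because every record has the same size
and the sort key is known in advance.

```python
import struct

# Hypothetical fixed-length record: 32-bit start time, 32-bit octet count.
RECORD = struct.Struct("!II")


def first_record_at_or_after(buf, start_time):
    """Return the index of the first record with time >= start_time.

    Assumes `buf` holds whole records sorted by ascending start time.
    Returns the record count if no such record exists.
    """
    lo, hi = 0, len(buf) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        time, _ = RECORD.unpack_from(buf, mid * RECORD.size)
        if time < start_time:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

A file with variable-length records, or one sorted on a different key,
would instead need a separate index to avoid a sequential scan.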
4.5. Data Integrity and Error Correction
When storing flow data for archival purposes, it is important to
ensure that hardware or software faults do not introduce errors into
the data over time. The ideal flow storage format will support the
detection and correction of encoding-level errors in the data.
Note that this requirement is almost certainly best handled at a
layer below that addressed by this document. Error correction is a
topic well addressed by filesystem developers and the storage
industry in general, and by specifying a flow storage format based
upon files, we can leverage these features to meet this requirement.
The IPFIX message format does not support data integrity assurance or
error correction; it is assumed that this requirement will be met
externally.
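One simple external mechanism is a digest stored alongside the file
and checked when the file is read back. The sketch below uses SHA-256
as an assumed, illustrative choice, and the function names are
hypothetical. Note that a digest only detects corruption; correcting
it still depends on the storage layer or on a second copy of the data.

```python
import hashlib


def file_digest(contents):
    """Return the SHA-256 digest of a file's contents as hex text."""
    return hashlib.sha256(contents).hexdigest()


def is_intact(contents, expected_digest):
    """Report whether the contents still match a stored digest."""
    return file_digest(contents) == expected_digest
```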
4.6. Creator Authentication and Confidentiality
Archival storage of flow data also requires assurance that no
unauthorized entity can read or modify the stored data. Asymmetric-
key cryptography can be applied to this problem, by signing flow data
with the private key of the creator, and encrypting it with the
public keys of those authorized to read it. The ideal flow storage
format will support the encryption and signing of flow data.
As with error correction, this problem has been addressed well at a
layer below that addressed by this document. Instead of specifying a
particular choice of encryption technology, we can leverage the fact
that existing cryptographic technologies work quite well on data
stored in files to meet this requirement.
Beyond support for the use of TLS for transport over TCP or SCTP,
both of which provide transient authentication and confidentiality,
the IPFIX message format does not support this requirement directly.
It is assumed that this requirement will be met externally.
4.7. Anonymization and Obfuscation
To ensure the privacy of individuals and organizations at the
endpoints of communications represented by flow records, it is often
necessary to obfuscate or anonymize stored and exported flow data.
The ideal flow storage format will provide for a notation that a
given information element on a given record type represents
anonymized, rather than real, data.
The IPFIX message format presently has no support for
anonymization notation. It should be noted that anonymization is one
of the requirements given for IPFIX in RFC 3917 [2]. The decision to
qualify this requirement with 'MAY' and not 'MUST' in the
requirements document, and its subsequent lack of specification in
the current version of the IPFIX protocol, is due to the fact that
anonymization algorithms are still a research issue, and that there
currently exist no standardized methods for anonymization.
It is reasonable to assume, given the stated requirements for the
IPFIX protocol itself, that future extensions to the protocol will
provide for the anonymization of flow records.
5. Survey of Existing Flow and Trace File Formats
5.1. Argus 2
QoSient's Argus (as of version 2.0.6) uses a file format based upon a
stream of type-and-length prefixed records. There are two general
types of records in this stream, management records and flow records.
Management records export flow collection statistics, much like the
recommended scoped data records in the IPFIX protocol. Flow records
contain information about a single flow each, and are further typed
based upon the protocol of the flow (e.g., IP, ICMP, ARP). The Argus
file format natively supports bidirectional flow export, as each flow
record contains both forward and reverse counters.
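The general technique of a type-and-length prefixed record stream can
be sketched as below. The header layout used here (a 1-octet type
followed by a 2-octet payload length) is an assumption chosen for
illustration and is not the actual Argus record header.

```python
import struct

# Hypothetical header: 1-octet record type, 2-octet payload length.
HEADER = struct.Struct("!BH")


def iter_records(stream):
    """Yield (record_type, payload) pairs from a bytes object."""
    offset = 0
    while offset < len(stream):
        if offset + HEADER.size > len(stream):
            raise ValueError("truncated record header")
        record_type, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated record payload")
        yield record_type, payload
        offset += length
```

The length prefix lets a reader skip record types it does not
understand, but interpreting a payload still requires knowledge held
outside the file.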
The Argus tools support a transport protocol that simply encapsulates
a record stream over a TCP connection. Transport is collector-
initiated; that is, a collector establishes a connection to an
exporter in order to read a record stream.
Argus files are not self-describing; that is, only the Argus tools
themselves encapsulate the definition of each of the record types.
The Argus file format is not extensible without changing the Argus
implementation. Argus provides no indexing facility for its file
format, though records are roughly sorted by record generation time.
Compression, error correction, authentication, and confidentiality
are handled externally to the format, and are available as with all
files. There is no special support for data obfuscation in the
format.
5.2. SiLK
The CERT/NetSA SiLK tools (http://silktools.sourceforge.net) use a
set of fixed-length binary record formats. Each file is prefixed
with a header which denotes which record format the file is stored
in. These record formats are differentiated by the presence or
absence of certain fields; in this way, each format identifier is
essentially a short-hand identifier for a template describing the
record. This also implies that only one type of record may be stored
in any given file.
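The idea of a format identifier acting as shorthand for a template can
be sketched as follows. The identifiers, field layouts, and
single-octet header below are invented for illustration and do not
reflect the actual SiLK header or record formats.

```python
import struct

# Hypothetical format identifiers, each naming a fixed-length layout.
FORMATS = {
    1: struct.Struct("!IIHH"),    # addresses and ports only
    2: struct.Struct("!IIHHII"),  # plus packet and octet counters
}


def read_records(contents):
    """Read every record from a file whose first octet is a format ID."""
    layout = FORMATS[contents[0]]
    body = contents[1:]
    if len(body) % layout.size:
        raise ValueError("trailing partial record")
    return [
        layout.unpack_from(body, start)
        for start in range(0, len(body), layout.size)
    ]
```

Because the table of layouts lives in the reading software rather than
in the file, adding a new layout means releasing new software.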
As with Argus, SiLK files are not self-describing and are not
extensible. SiLK provides no indexing facility, though files are
generally stored in flow end time order; and when used for archival
storage, information about sensors and flow times appearing in each
file is stored in the file path name. Compression is handled
internally to the file format, and allows the storage of compressed
data in a file with uncompressed headers, and a guarantee of
compression block boundary alignment with record boundaries. Error
correction, authentication, and confidentiality can be handled
externally. There is no special support for data obfuscation in the
SiLK file format.
6. IPFIX File Format Description
The IPFIX file format description is not yet available, as the
purpose of this document is to elicit feedback and foster discussion
on the motivation and requirements for an IPFIX file format. A
future revision of this document will include a complete description
of the IPFIX file format in terms of the IPFIX message format.
7. Examples
Examples are not yet available as the file format has not yet been
fully described. A future revision of this document will contain
examples.
8. Security Considerations
The IPFIX-based file format itself does not directly introduce
security issues. Rather, it is used to store information which may
for privacy or business reasons be considered sensitive. The file
format must therefore provide appropriate procedures to guarantee the
integrity and confidentiality of the stored information.
The underlying protocol used to exchange the information that will be
stored using the format proposed in this document must as well apply
appropriate procedures to guarantee the integrity and confidentiality
of the exported information. Such issues are addressed in separate
documents, specifically in the IPFIX Protocol [1].
9. IANA Considerations
This document has no actions for IANA.
10. Open Issues and Notes
This draft is presently incomplete. The intent of this revision is
to provide a starting point to discuss requirements for an IPFIX-
based file format, and the applicability of the work proposed herein
to the mission of the IPFIX Working Group. [bht]
The survey of existing file formats is incomplete, and includes only
file formats with which one of the authors has personal experience.
[bht]
There should be a mention of the zero-length message and set hacks
(from -00) somewhere in this draft, to support the use case of an
IPFIX header on a legacy fixed-length binary file. [bht]
11. References
11.1. Normative References
[1] Claise, B., "IPFIX Protocol Specification",
draft-ietf-ipfix-protocol-22 (work in progress), June 2006.
11.2. Informative References
[2] Quittek, J., Zseby, T., Claise, B., and S. Zander, "Requirements
for IP Flow Information Export (IPFIX)", RFC 3917, October 2004.
[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
Authors' Addresses
Brian H. Trammell
CERT Network Situational Awareness
Software Engineering Institute
4500 Fifth Avenue
Pittsburgh, PA 15213
United States
Phone: +1 412 268 9748
Email: bht@cert.org
Elisa Boschi
Hitachi Europe SAS
Immeuble Le Theleme
1503 Route les Dolines
Valbonne 06560
France
Phone: +33 4 89874180
Email: elisa.boschi@hitachi-eu.com
Lutz Mark
Fraunhofer Institute for Open Communication Systems
Kaiserin-Augusta-Allee 31
Berlin 10589
Germany
Phone: +49 30 3463 7306
Email: mark@fokus.fraunhofer.de
Tanja Zseby
Fraunhofer Institute for Open Communication Systems
Kaiserin-Augusta-Allee 31
Berlin 10589
Germany
Phone: +49 30 3463 7153
Email: zseby@fokus.fraunhofer.de
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.