<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "xml/rfc2629.dtd">
<?xml-stylesheet type='text/xsl' href='xml/rfc2629.xslt' ?>

<!-- XML source for the pnfs over objects internet draft document -->

<!-- To generate text with the xml2rfc tool, run
     tclsh8.3 xml2rfc.tcl xml2rfc this_file.xml that_file.txt
     which puts the formatted text into that_file.txt -->

<!-- processing instructions (for a complete list and description,
     see file http://xml.resource.org/authoring/README.html) -->

<!-- try to enforce the ID-nits conventions and DTD validity -->

<?rfc strict="yes" ?>

<!-- items used when reviewing the document -->

<?rfc comments="no" ?>  <!-- controls display of <cref> elements -->
<?rfc inline="no" ?>    <!-- when no, put comments at end in comments section,
                                otherwise, put inline -->
<?rfc editing="no" ?>   <!-- when yes, insert editing marks -->

<!-- create table of contents (set its options).
     Note the table of contents may be omitted
     for very short documents --> 

<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>

<!-- choose the options for the references. Some like
     symbolic tags in the references (and citations)
     and others prefer numbers. --> 

<?rfc symrefs="no"?>
<?rfc sortrefs="yes" ?>

<!-- these two save paper: start new paragraphs from the same page etc. -->

<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>

<!-- end of list of processing instructions -->

<rfc
    category="std"
    ipr="full3978"
    docName="draft-ietf-nfsv4-pnfs-obj-09">

<front>
    <title abbrev="pnfs objects">Object-based pNFS Operations</title>

    <author fullname="Benny Halevy" 
            initials="B." 
            surname="Halevy">
        <organization abbrev="Panasas">Panasas, Inc.</organization>
        <address>
            <postal>
                <street>1501 Reedsdale St. Suite 400</street>
                <city>Pittsburgh</city>
                <region>PA</region>
                <code>15233</code>
                <country>USA</country>
            </postal>
            <phone>+1-412-323-3500</phone>
            <email>bhalevy@panasas.com</email>
            <uri>http://www.panasas.com/</uri>
        </address>
    </author>

    <author fullname="Brent Welch" 
            initials="B." 
            surname="Welch">
        <organization abbrev="Panasas">Panasas, Inc.</organization>
        <address>
            <postal>
                <street>6520 Kaiser Drive</street>
                <city>Fremont</city>
                <region>CA</region>
                <code>95444</code>
                <country>USA</country>
            </postal>
            <phone>+1-650-608-7770</phone>
            <email>welch@panasas.com</email>
            <uri>http://www.panasas.com/</uri>
        </address>
    </author>

    <author fullname="Jim Zelenka" 
            initials="J." 
            surname="Zelenka">
        <organization abbrev="Panasas">Panasas, Inc.</organization>
        <address>
            <postal>
                <street>1501 Reedsdale St. Suite 400</street>
                <city>Pittsburgh</city>
                <region>PA</region>
                <code>15233</code>
                <country>USA</country>
            </postal>
            <phone>+1-412-323-3500</phone>
            <email>jimz@panasas.com</email>
            <uri>http://www.panasas.com/</uri>
        </address>
    </author>

    <date year="2008" month="June" day="19"/>

    <area>Transport</area>
    <workgroup>NFSv4</workgroup>

    <abstract>
      <t>
      This Internet-Draft provides a description of the object-based
      pNFS extension for NFSv4.  This is a companion to the main
      pNFS specification in the NFSv4 Minor Version 1 Internet-Draft,
      which is currently draft-ietf-nfsv4-minorversion1-23.
      </t>
    </abstract>

    <note title="Requirements Language">
<t>The key words "MUST", "MUST NOT",
"REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be
interpreted as described in <xref target="RFC2119">RFC 2119</xref>.
</t>

    </note>

</front>

<middle>

<section title="Introduction">
 <t>
In pNFS, the file server returns typed layout structures that describe where
file data is located.  There are different layouts for different storage systems
and methods of arranging data on storage devices.  This document describes
the layouts used 
with object-based storage devices (OSD) that are accessed
according to the OSD storage protocol standard
(<xref target="osd standard">SNIA T10/1355-D</xref>).
 </t>
 <t>
An "object" is a container for data and attributes, and files are stored in
one or more objects.  The OSD protocol specifies several operations on
objects, including READ, WRITE, FLUSH, GET ATTRIBUTES, SET ATTRIBUTES, CREATE and DELETE.
However, with the object-based layout, the client uses only the READ, WRITE, GET ATTRIBUTES and FLUSH
commands. The other commands are used only by the pNFS server.
 </t>
 <t>
An object-based layout for pNFS includes object identifiers, capabilities
that allow clients to READ or WRITE those objects, and various parameters
that control how file data is striped across the component objects.
The OSD protocol 
has a capability-based security scheme that allows the
pNFS server to control what operations and what objects can be used by clients.
This scheme is described in more detail in the 
<xref target="Security Considerations">Security Considerations section</xref>.
 </t>
</section> <!-- Introduction -->

<section anchor="xdr_desc" title="XDR Description of the Objects-Based Layout Protocol">
 <t>
This document contains the <xref target='XDR'>XDR</xref> description of the
NFSv4.1 objects layout protocol.
The XDR description is embedded in this document in a way that makes it simple
for the reader to extract into a ready-to-compile form.
The reader can feed this document into the following shell script to produce
the machine-readable XDR description of the NFSv4.1 objects layout protocol:
 </t>
 <figure>
  <artwork>
#!/bin/sh
grep '^ *///' $* | sed 's?^ *///??'
  </artwork>
 </figure>
 <t>
That is, if the above script is stored in a file called "extract.sh", and
this document is in a file called "spec.txt", then the reader can run:
 </t>
 <figure>
  <artwork>
sh extract.sh < spec.txt > pnfs_osd_prot.x
  </artwork>
 </figure>
 <t>
The script selects the lines that carry the sentinel sequence "///"
and removes the leading white space and the sentinel from each of them.
 </t>
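 <t>
For readers without a POSIX shell at hand, the same extraction can be
sketched in Python.  This is an illustrative equivalent of the script above,
not part of the protocol; the function name "extract_xdr" is an assumption
made for this example.
 </t>
 <figure>
  <artwork>
```python
import re

# Optional leading spaces followed by the "///" sentinel, mirroring
# the patterns given to grep and sed in the shell script.
SENTINEL = re.compile(r"^ *///")


def extract_xdr(lines):
    """Yield the XDR text of every line that carries the sentinel."""
    for line in lines:
        if SENTINEL.match(line):
            # Strip the leading spaces and the sentinel, keep the rest.
            yield SENTINEL.sub("", line, count=1)
```
  </artwork>
 </figure>
 <t>
Passing the lines of this document to extract_xdr() and writing the result
to "pnfs_osd_prot.x" should give the same file as the shell script.
 </t>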
 <t>
The embedded XDR file header follows.
Subsequent XDR descriptions, marked with the sentinel sequence, are
embedded throughout the document.
 </t>
 <t>
Note that the XDR code contained in this document depends on types from
the NFSv4.1 nfs4_prot.x file (<xref target='NFS41_DOT_X' />).
This includes both NFS types that end with a 4,
such as offset4 and length4,
and more generic types such as uint32_t and uint64_t.
 </t>
 <figure>
  <artwork>
////*
/// * This file was machine generated for
/// * draft-ietf-nfsv4-pnfs-obj-09
/// * Last updated Thu Jun 19 07:35:44 UTC 2008
/// *
/// * Copyright (C) The IETF Trust (2007-2008)
/// * All Rights Reserved.
/// *
/// * Copyright (C) The Internet Society (1998-2006).
/// * All Rights Reserved.
/// */
///
////*
/// * pnfs_osd_prot.x
/// */
///
///%#include <nfs4_prot.x>
///
  </artwork>
 </figure>

<section title="Basic Data Type Definitions">
 <t>
The following sections define basic data types and constants
used by the Object-Based Layout protocol.
 </t>

 <section anchor="pnfs_osd_objid4" title="pnfs_osd_objid4">
  <t>
An object is identified by a number, somewhat like an inode number.
The object storage model has a two level scheme, where the objects
within an object storage device are grouped into partitions.
  </t>
  <figure>
   <artwork>
///struct pnfs_osd_objid4 {
///    deviceid4       oid_device_id;
///    uint64_t        oid_partition_id;
///    uint64_t        oid_object_id;
///};
///
   </artwork>
  </figure>
  <t>
The pnfs_osd_objid4 type is used to identify an object within a partition
on a specified object storage device.
"oid_device_id" selects the
object storage device from the set of available storage devices.
The device is identified with the deviceid4 type, which
is an index into addressing information about that device
returned by the GETDEVICELIST and GETDEVICEINFO operations.
The deviceid4 data type is defined in
<xref target='NFSv4.1'>NFSv4.1 draft</xref>.
Within an OSD, a partition is identified with a 64-bit number, "oid_partition_id".
Within a partition, an object is identified with a 64-bit number, "oid_object_id".
Creation and management of partitions
is outside the scope of this standard, and is a facility provided
by the object storage file system.
  </t>
 </section> <!-- pnfs_osd_objid4 -->

 <section anchor="pnfs_osd_version4" title="pnfs_osd_version4">
  <figure>
   <artwork>
///enum pnfs_osd_version4 {
///    PNFS_OSD_MISSING    = 0,
///    PNFS_OSD_VERSION_1  = 1,
///    PNFS_OSD_VERSION_2  = 2
///};
///
   </artwork>
  </figure>
  <t>
pnfs_osd_version4 is used to indicate the OSD protocol version or whether
an object is missing (i.e., unavailable).  Some of the RAID algorithms
supported by the object-based layout
encode redundant information and can compensate for missing
components, but the data placement algorithm needs to know
what parts are missing.
  </t>

  <t>
At this time the OSD standard is at version 1.0, and
we anticipate a version 2.0 of the standard
(<xref target="OSD2">SNIA T10/1729-D</xref>).
The second generation OSD protocol has additional proposed 
features to support more
robust error recovery, snapshots, and byte-range capabilities.
Therefore, the OSD version is explicitly called out
in the information returned in the layout.
(This information can also be deduced by looking inside
the capability type at the format field, which is the first byte.
The format value is 0x1 for an OSD v1 capability.
However, it seems most robust to call out the version explicitly.)
  </t>
 </section> <!-- pnfs_osd_version4 -->

 <section anchor="pnfs_osd_object_cred4" title="pnfs_osd_object_cred4">
  <figure>
   <artwork>
///enum pnfs_osd_cap_key_sec4 {
///    PNFS_OSD_CAP_KEY_SEC_NONE = 0,
///    PNFS_OSD_CAP_KEY_SEC_SSV  = 1
///};
///
///struct pnfs_osd_object_cred4 {
///    pnfs_osd_objid4         oc_object_id;
///    pnfs_osd_version4       oc_osd_version;
///    pnfs_osd_cap_key_sec4   oc_cap_key_sec;
///    opaque                  oc_capability_key<>;
///    opaque                  oc_capability<>;
///};
///
   </artwork>
  </figure>
  <t>
The pnfs_osd_object_cred4 structure is used to identify each component
comprising the file.
The "oc_object_id" identifies the component object,
the "oc_osd_version" represents
the OSD protocol version, or whether that component is unavailable, and
the "oc_capability" and "oc_capability_key",
along with the "oda_systemid" from the pnfs_osd_deviceaddr4,
provide the OSD security credentials needed
to access that object. The "oc_cap_key_sec" value denotes the method used to secure
the oc_capability_key
(see <xref target="OSD Security Data Types" /> for more details).
  </t>
  <t>
To comply with the OSD security requirements the capability key SHOULD
be transferred securely to prevent eavesdropping (see <xref target="Security Considerations" />).
Therefore, a client SHOULD either issue the LAYOUTGET or GETDEVICEINFO operation via RPCSEC_GSS
with the privacy service or previously establish an SSV for the sessions
via the NFSv4.1 SET_SSV operation.
The pnfs_osd_cap_key_sec4 type is used to identify the method used by the
server to secure the capability key.
   <list style='symbols'>
    <t>
PNFS_OSD_CAP_KEY_SEC_NONE denotes that the oc_capability_key is not encrypted,
in which case the client SHOULD issue the LAYOUTGET or GETDEVICEINFO operations with
RPCSEC_GSS with the privacy service, or the NFSv4.1 transport should
be secured by methods that are external to NFSv4.1, such as the use of
<xref target='RFC4301'>IPsec</xref> for transporting the NFSv4.1 protocol.
    </t>
    <t>
PNFS_OSD_CAP_KEY_SEC_SSV denotes that the oc_capability_key contents are
encrypted using the SSV GSS context and the capability key as
inputs to the GSS_Wrap() function (see <xref target='GSS-API'>GSS-API</xref>) with
the conf_req_flag set to TRUE.  The client MUST use the secret SSV key
as part of the client's GSS context to decrypt the capability key
using the value of the oc_capability_key field as the input_message to the
GSS_Unwrap() function. Note that to prevent eavesdropping of the SSV key
the client SHOULD issue SET_SSV via RPCSEC_GSS with the privacy service.
    </t>
   </list>
  </t>
  <t>
The actual method chosen depends on whether the client established
an SSV key with the server and whether it issued the
operation with the RPCSEC_GSS privacy method.
Naturally, if the client did not establish an SSV key via SET_SSV,
the server MUST use the PNFS_OSD_CAP_KEY_SEC_NONE method.
Otherwise, if the operation was not issued with the RPCSEC_GSS
privacy method the server SHOULD secure the oc_capability_key with the
PNFS_OSD_CAP_KEY_SEC_SSV method.  The server MAY use the
PNFS_OSD_CAP_KEY_SEC_SSV method also when the operation
was issued with the RPCSEC_GSS privacy method.
  </t>
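  <t>
The selection rules in the paragraph above can be summarized as a small
decision function.  The following Python sketch is illustrative only: the
names "CapKeySec", "choose_cap_key_sec", and "prefer_ssv" are assumptions
made for this example, and "prefer_ssv" merely models the server's freedom
(MAY) when both an SSV and the privacy service are in use.
  </t>
  <figure>
   <artwork>
```python
from enum import IntEnum


class CapKeySec(IntEnum):
    # Values copied from the pnfs_osd_cap_key_sec4 enum.
    NONE = 0
    SSV = 1


def choose_cap_key_sec(ssv_established, privacy_used,
                       prefer_ssv=False):
    """Pick the method a server applies to oc_capability_key."""
    if not ssv_established:
        # No SSV key exists: the server MUST use NONE.
        return CapKeySec.NONE
    if not privacy_used:
        # SSV exists but the operation was not sent with privacy:
        # the server SHOULD protect the key with the SSV.
        return CapKeySec.SSV
    # Both are available: the server MAY still use the SSV method.
    return CapKeySec.SSV if prefer_ssv else CapKeySec.NONE
```
   </artwork>
  </figure>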
 </section> <!-- pnfs_osd_object_cred4 -->

 <section anchor="pnfs_osd_raid_algorithm4" title="pnfs_osd_raid_algorithm4">
  <figure>
   <artwork>
///enum pnfs_osd_raid_algorithm4 {
///    PNFS_OSD_RAID_0     = 1,
///    PNFS_OSD_RAID_4     = 2,
///    PNFS_OSD_RAID_5     = 3,
///    PNFS_OSD_RAID_PQ    = 4     /* Reed-Solomon P+Q */
///};
///
   </artwork>
  </figure>
  <t>
pnfs_osd_raid_algorithm4 represents the data redundancy
algorithm used to protect the file's contents.
See <xref target="RAID Algorithms" /> for more details.
  </t>
 </section> <!-- pnfs_osd_raid_algorithm4 -->

</section> <!-- Basic Data Type Definitions -->

</section> <!-- xdr_desc -->

<section title="Object Storage Device Addressing and Discovery">
 <t>
Data operations to an OSD require the client to know the "address"
of each OSD's root object. The root object is synonymous with the SCSI logical unit.
The client specifies SCSI logical units to its SCSI protocol stack using a
representation local to the client. Because these representations
are local, GETDEVICEINFO must return information that can be used
by the client to select the correct local representation.
 </t>
 <t>
In the block world, a set offset (logical block number or track/sector)
contains a disk label. This label identifies the disk uniquely.
In contrast, an OSD has a standard set of attributes on its root object.
For device identification purposes
the OSD System ID (root information attribute number 3) and the 
OSD Name (root information attribute number 9) are used as the label.
These appear in the pnfs_osd_deviceaddr4 type below under the
"oda_systemid" and "oda_osdname" fields.
 </t>
 <t>
   In some situations, SCSI target discovery may need to be driven based
   on information contained in the GETDEVICEINFO response. One example of
   this is iSCSI targets that are not known to the client until a layout
   has been requested.
   The information provided as the "oda_targetid", "oda_targetaddr", and "oda_lun"
   fields in the pnfs_osd_deviceaddr4 type described below
   (see <xref target='pnfs_osd_deviceaddr4' />),
   allows the client to probe a specific device given its network address
   and optionally its iSCSI Name
   (see <xref target='iSCSI'>iSCSI</xref>),
   or when the device network address is omitted,
   to discover the object storage device using the provided
   device name or SCSI device identifier
   (see <xref target='SPC-3'>SPC-3</xref>).
 </t>
 <t>
   The oda_systemid is implicitly used by the client, by using the object credential
   signing key to sign each request with the request integrity check value.
   This method protects the client from unintentionally accessing a device if
   the device address mapping was changed (or revoked).
   The server computes the capability key using its own view of the systemid
   associated with the respective deviceid present in the credential.  If the
   client's view of the deviceid mapping is stale, the client will use the wrong
   systemid (which must be system-wide unique) and the I/O request to the OSD will
   fail to pass the integrity check verification.
 </t>
 <t>
   To recover from this condition the client should report the error and
   return the layout using LAYOUTRETURN, and invalidate all the device address mappings
   associated with this layout.
   The client can then ask for a new layout if it wishes using LAYOUTGET and
   resolve the referenced deviceids using GETDEVICEINFO or GETDEVICELIST.
 </t>
 <t>
 The server MUST provide the oda_systemid and SHOULD also provide the oda_osdname.
 When the OSD name is present the client SHOULD get the root information
 attributes whenever it establishes communication with the OSD and verify
 that the OSD name it got from the OSD matches the one sent by the metadata server.
 To do so, the client uses the oda_root_obj_cred credentials.
 </t>
 <section anchor="pnfs_osd_targetid_type4" title="pnfs_osd_targetid_type4">
  <t>
   The following enum specifies the manner in which a SCSI target can
   be specified. The target can be specified as a SCSI Name, or as a
   SCSI Device Identifier.
  </t>
  <figure>
   <artwork>
///enum pnfs_osd_targetid_type4 {
///    OBJ_TARGET_ANON             = 1,
///    OBJ_TARGET_SCSI_NAME        = 2,
///    OBJ_TARGET_SCSI_DEVICE_ID   = 3
///};
///
   </artwork>
  </figure>
 </section> <!-- pnfs_osd_targetid_type4 -->

 <section anchor="pnfs_osd_deviceaddr4" title="pnfs_osd_deviceaddr4">
  <t>
The specification for an object device address is as follows:
  </t>
  <figure>
   <artwork>
///union pnfs_osd_targetid4 switch (pnfs_osd_targetid_type4 oti_type) {
///    case OBJ_TARGET_SCSI_NAME:
///        string              oti_scsi_name<>;
///
///    case OBJ_TARGET_SCSI_DEVICE_ID:
///        opaque              oti_scsi_device_id<>;
///
///    default:
///        void;
///};
///
///union pnfs_osd_targetaddr4 switch (bool ota_available) {
///    case TRUE:
///        netaddr4            ota_netaddr;
///    case FALSE:
///        void;
///};
///
///struct pnfs_osd_deviceaddr4 {
///    pnfs_osd_targetid4      oda_targetid;
///    pnfs_osd_targetaddr4    oda_targetaddr;
///    uint64_t                oda_lun;
///    opaque                  oda_systemid<>;
///    pnfs_osd_object_cred4   oda_root_obj_cred;
///    opaque                  oda_osdname<>;
///};
///
   </artwork>
  </figure>
  <section title='SCSI Target Identifier'>
   <t>
When "oda_targetid" is specified as OBJ_TARGET_SCSI_NAME,
the "oti_scsi_name" string MUST be formatted as an "iSCSI Name" as specified in
<xref target='iSCSI'>iSCSI</xref> and <xref target='iscsi-naming-format' />.
Note that the specification of the oti_scsi_name string format is outside
the scope of this document.  Parsing the string is based on the string
prefix, e.g. "iqn.", "eui.", or "naa." and more formats MAY be specified
in the future in accordance with iSCSI Names properties.
   </t>
   <t>
Currently, the iSCSI Name provides for naming the target device using
a string formatted as an iSCSI Qualified Name (IQN) or as an
<xref target='EUI'>EUI</xref> string.
Those are typically used to identify iSCSI or
<xref target='SRP'>SRP</xref> devices.
The Network Address Authority (NAA) string format
(see <xref target='iscsi-naming-format' />) provides for naming
the device using globally unique identifiers, as defined in
<xref target='FC-FS-2'>FC-FS</xref>.  These are typically used to identify
Fibre Channel or <xref target='SAS'>SAS</xref> (Serial Attached SCSI) devices.
This applies in particular to devices that are dual-attached, both over
Fibre Channel or SAS and over iSCSI.
   </t>
   <t>
When "oda_targetid" is specified as OBJ_TARGET_SCSI_DEVICE_ID,
the "oti_scsi_device_id" opaque field MUST be formatted as a SCSI Device Identifier
as defined in <xref target='SPC-3'>SPC-3</xref> VPD Page 83h
(Section 7.6.3, "Device Identification VPD Page").
If the Device Identifier is identical to the OSD System ID, as given by
oda_systemid, the server SHOULD provide a zero-length oti_scsi_device_id<>
opaque value.
Note that similarly to the "oti_scsi_name",
the specification of the oti_scsi_device_id opaque contents is outside
the scope of this document and more formats MAY be specified
in the future in accordance with SPC-3.
   </t>
   <t>
The OBJ_TARGET_ANON pnfs_osd_targetid_type4 MAY be used for providing no
target identification.  In this case only the OSD System ID and optionally, the
provided network address, are used to locate the device.
   </t>
  </section> <!-- SCSI Target Identifier -->
  <section title='Device Network Address'>
   <t>
The optional "oda_targetaddr" field MAY be provided by the server as a hint to
accelerate device discovery over, e.g., the iSCSI transport protocol.
The network address is given with the netaddr4 type, which
specifies a TCP/IP based endpoint (as specified in <xref target='NFSv4.1'>NFSv4.1 draft</xref>).
When given, the client SHOULD use it to probe for the SCSI device at the
given network address.  The client MAY still use other discovery mechanisms
such as <xref target='iSNS'>iSNS</xref> to locate the device using the oda_targetid.
In particular, such an external name service SHOULD be used when the devices
may be attached to the network using multiple connections and/or multiple
storage fabrics (e.g., Fibre Channel and iSCSI).
   </t>
  </section> <!-- Device Network Address -->
 </section> <!-- pnfs_osd_deviceaddr4 -->
</section> <!-- Object Storage Device Addressing and Discovery -->
<section title="Object-Based Layout">
 <t>
The layout4 type is defined in the
<xref target="NFSv4.1">NFSv4.1 draft</xref> as follows:
 </t>

 <figure>
  <artwork>
enum layouttype4 {
    LAYOUT4_NFSV4_1_FILES   = 1,
    LAYOUT4_OSD2_OBJECTS    = 2,
    LAYOUT4_BLOCK_VOLUME    = 3
};

struct layout_content4 {
    layouttype4             loc_type;
    opaque                  loc_body<>;
};

struct layout4 {
    offset4                 lo_offset;
    length4                 lo_length;
    layoutiomode4           lo_iomode;
    layout_content4         lo_content;
};

  </artwork>
 </figure>

 <t>
This document defines the structure associated with
the layouttype4 value LAYOUT4_OSD2_OBJECTS.
The <xref target="NFSv4.1">NFSv4.1 draft</xref> specifies the loc_body structure as an XDR type "opaque".
The opaque layout is uninterpreted by the generic pNFS client layers, but obviously
must be interpreted by the object-storage layout driver.  This section defines the
structure of this opaque value, pnfs_osd_layout4.
 </t>

 <section anchor="pnfs_osd_data_map4" title="pnfs_osd_data_map4">
  <figure>
   <artwork>
///struct pnfs_osd_data_map4 {
///    uint32_t                    odm_num_comps;
///    length4                     odm_stripe_unit;
///    uint32_t                    odm_group_width;
///    uint32_t                    odm_group_depth;
///    uint32_t                    odm_mirror_cnt;
///    pnfs_osd_raid_algorithm4    odm_raid_algorithm;
///};
///
   </artwork>
  </figure>
  <t>
The pnfs_osd_data_map4 structure parameterizes the algorithm that maps
a file's contents over the component objects.
Instead of limiting the system to a simple striping
scheme where loss of a single component object results in
data loss, the map parameters support mirroring
and more complicated schemes that protect against loss
of a component object.
  </t>

  <t>
"odm_num_comps" is the number of component objects the file is striped over.
The server MAY grow the file by adding more components to the stripe
while clients hold valid layouts until the file has reached its final
stripe width.  The file length in this case MUST be limited to
the number of bytes in a full stripe.
  </t>

  <t>
The "odm_stripe_unit" is the number of bytes placed on one
component before advancing to the next one in the list
of components.  The number of bytes in a full stripe
is odm_stripe_unit times the number of components.
In some RAID schemes, a stripe includes redundant
information (i.e., parity) that lets the system
recover from loss or damage to a component object.
  </t>

  <t>
The "odm_group_width" and "odm_group_depth" parameters allow
a nested striping pattern (see <xref target="Nested Striping" /> for details).
If there is no nesting,
then odm_group_width and odm_group_depth MUST be zero.
The size of the components array MUST be a multiple
of odm_group_width.
  </t>

  <t>
The "odm_mirror_cnt" is used to replicate a file by replicating its component objects.
If there is no mirroring, then odm_mirror_cnt MUST be 0.
If odm_mirror_cnt is greater than zero, then the size of the component
array MUST be a multiple of (odm_mirror_cnt+1).
  </t>

  <t>
See <xref target="Data Mapping Schemes" /> for more details.
  </t>
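  <t>
The size constraints on the component array can be checked mechanically.
The following Python sketch is illustrative only; the function name
"check_data_map" is hypothetical, and treating odm_num_comps as the size of
the component array is an assumption of this example.  It folds the
group-width rule, the mirroring rule, and the combined rule given in
<xref target="Mirroring" /> into a single multiple.
  </t>
  <figure>
   <artwork>
```python
def check_data_map(num_comps, group_width, group_depth, mirror_cnt):
    """Return the list of violated constraints (empty if none)."""
    problems = []
    # Assumption of this sketch: nesting is either fully specified
    # (both values non-zero) or absent (both values zero).
    if (group_width == 0) != (group_depth == 0):
        problems.append("group_width/group_depth: one is zero")
    # Without nesting the group factor is 1; without mirroring the
    # replica factor is 1, so the check below covers every case.
    unit = max(group_width, 1) * (mirror_cnt + 1)
    if num_comps % unit != 0:
        problems.append("num_comps not a multiple of %d" % unit)
    return problems
```
   </artwork>
  </figure>
  <t>
An empty result means that none of the constraints modeled here is
violated; it does not by itself make the layout valid.
  </t>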
 </section> <!-- pnfs_osd_data_map4 -->

 <section anchor="pnfs_osd_layout4" title="pnfs_osd_layout4">
  <figure>
   <artwork>
///struct pnfs_osd_layout4 {
///    pnfs_osd_data_map4      olo_map;
///    uint32_t                olo_comps_index;
///    pnfs_osd_object_cred4   olo_components<>;
///};
///
   </artwork>
  </figure>
  <t>
The pnfs_osd_layout4 structure specifies a layout over a set of 
component objects.
The "olo_components" field is an array of object identifiers
and security credentials that grant
access to each object.
The organization of the data is defined by the pnfs_osd_data_map4 type
that specifies how the file's data is mapped onto the component objects
(i.e., the striping pattern).
The data placement algorithm that maps file data
onto component objects assumes that each component object
occurs exactly once in the array of components.
Therefore, component objects MUST appear in the olo_components array
only once.
The components array may represent all objects comprising the file,
in which case "olo_comps_index" is set to zero and the number of entries
in the olo_components array is equal to olo_map.odm_num_comps.
The server MAY return fewer components than odm_num_comps, provided
that the returned components are sufficient to access any byte
in the layout's data range (e.g., a sub-stripe of "odm_group_width" components).
In this case, olo_comps_index represents the position of the returned
components array within the full array of components that comprise
the file.
  </t>

  <t>
Note that the layout depends on the file size, which the client learns
from the generic return parameters of LAYOUTGET,
by doing GETATTR commands to the metadata server.
The client uses the file size
to decide if it
should fill holes with zeros, or return a short read.
Striping patterns can cause cases where component objects are shorter
than other components because a hole happens to correspond to the
last part of the component object.
  </t>
 </section> <!-- pnfs_osd_layout4 -->

 <section anchor="Data Mapping Schemes" title="Data Mapping Schemes">
  <t>
This section describes the different data mapping schemes
in detail.
The object layout always uses a "dense"
layout as described in
<xref target="NFSv4.1">NFSv4.1 draft</xref>.
This means that the second stripe unit of the
file starts at offset 0 of the second component,
rather than at offset stripe_unit bytes.
After a full stripe has been written, the
next stripe unit is appended to the first component
object in the list without any holes in the component objects.
  </t>

  <section anchor="Simple Striping" title="Simple Striping">

   <t>
The mapping from the logical
offset within a file (L) to the component object C and
object-specific offset O is defined by the following equations:
   </t>
   <figure>
    <artwork>
L = logical offset into the file
W = total number of components
S = W * stripe_unit
N = L / S
C = (L-(N*S)) / stripe_unit
O = (N*stripe_unit)+(L%stripe_unit)
    </artwork>
   </figure>
   <t>
In these equations, S is the number of bytes in a full stripe,
and N is the stripe number.  C is an index into the array of components,
so it selects a particular object storage device.
Both N and C count from zero.
O is the offset within the object that corresponds to the file offset.
Note that this computation does not accommodate the same
object appearing in the olo_components array multiple times.
   </t>
   <t>
For example, consider an object striped over four devices, <D0 D1 D2 D3>.
The stripe_unit is 4096 bytes. The stripe width S is thus 4 * 4096 = 16384.
   </t>
   <figure>
    <artwork>
Offset 0:
  N = 0 / 16384 = 0
  C = (0-(0*16384)) / 4096 = 0 (D0)
  O = (0*4096)+(0%4096) = 0

Offset 4096:
  N = 4096 / 16384 = 0
  C = (4096-(0*16384)) / 4096 = 1 (D1)
  O = (0*4096)+(4096%4096) = 0

Offset 9000:
  N = 9000 / 16384 = 0
  C = (9000-(0*16384)) / 4096 = 2 (D2)
  O = (0*4096)+(9000%4096) = 808

Offset 132000:
  N = 132000 / 16384 = 8
  C = (132000-(8*16384)) / 4096 = 0 (D0)
  O = (8*4096) + (132000%4096) = 33696
    </artwork>
   </figure>
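   <t>
The equations and worked examples above translate directly into integer
arithmetic.  The following Python sketch is illustrative only; the function
name "map_simple" is an assumption made for this example, and all divisions
are taken to be truncating integer divisions, as the worked examples imply.
   </t>
   <figure>
    <artwork>
```python
def map_simple(L, stripe_unit, W):
    """Map logical file offset L to (component index, object offset)."""
    S = W * stripe_unit                    # bytes in a full stripe
    N = L // S                             # stripe number
    C = (L - N * S) // stripe_unit         # component index
    O = N * stripe_unit + L % stripe_unit  # offset within the object
    return C, O
```
    </artwork>
   </figure>
   <t>
With a stripe_unit of 4096 and four components, map_simple(9000, 4096, 4)
yields component 2 and object offset 808, matching the third example above.
   </t>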
  </section> <!-- Simple Striping -->

  <section anchor="Nested Striping" title="Nested Striping">
   <t>
The odm_group_width and odm_group_depth parameters allow
a nested striping pattern.
odm_group_width defines the width
of a data stripe and odm_group_depth defines
how many stripes are written before advancing
to the next group of components in the
list of component objects for the file.
The math used to map from a file offset to
a component object and offset within that
object is shown below.
The computations
map from the logical offset L to the component index C and relative offset O
within that component object.
   </t>
   <figure>
    <artwork>
L = logical offset into the file
W = total number of components
S = stripe_unit * group_depth * W
T = stripe_unit * group_depth * group_width
U = stripe_unit * group_width
M = L / S
G = (L - (M * S)) / T
H = (L - (M * S)) % T
N = H / U
C = (H - (N * U)) / stripe_unit + G * group_width
O = L % stripe_unit + N * stripe_unit + M * group_depth * stripe_unit
    </artwork>
   </figure>
   <t>
In these equations, S is the number of bytes striped across all
component objects before the pattern repeats.  T is the number of bytes
striped within a group of component objects before advancing to the next group.
U is the number of bytes in a stripe within a group.
M is the "major" (i.e., across all components) stripe number,
and N is the "minor" (i.e., across the group) stripe number.
G counts the groups from the beginning of the major stripe,
and H is the byte offset within the group.
   </t>
   <t>
For example, consider an object striped over 100 devices with
a group_width of 10, a group_depth of 50, and a stripe_unit of 1 MB.
In this scheme, 500 MB are written to the first 10 components,
and 5000 MB is written before the pattern wraps back around to the
first component in the array.
   </t>
   <figure>
    <artwork>
Offset 0:
  W = 100
  S = 1 MB * 50 * 100 = 5000 MB
  T = 1 MB * 50 * 10 = 500 MB
  U = 1 MB * 10 = 10 MB
  M = 0 / 5000 MB = 0
  G = (0 - (0 * 5000 MB)) / 500 MB = 0
  H = (0 - (0 * 5000 MB)) % 500 MB = 0
  N = 0 / 10 MB = 0
  C = (0 - (0 * 10 MB)) / 1 MB + 0 * 10 = 0
  O = 0 % 1 MB + 0 * 1 MB + 0 * 50 * 1 MB = 0

Offset 27 MB:
  M = 27 MB / 5000 MB = 0
  G = (27 MB - (0 * 5000 MB)) / 500 MB = 0
  H = (27 MB - (0 * 5000 MB)) % 500 MB = 27 MB
  N = 27 MB / 10 MB = 2
  C = (27 MB - (2 * 10 MB)) / 1 MB + 0 * 10 = 7
  O = 27 MB % 1 MB + 2 * 1 MB + 0 * 50 * 1 MB = 2 MB

Offset 7232 MB:
  M = 7232 MB / 5000 MB = 1
  G = (7232 MB - (1 * 5000 MB)) / 500 MB = 4
  H = (7232 MB - (1 * 5000 MB)) % 500 MB = 232 MB
  N = 232 MB / 10 MB = 23
  C = (232 MB - (23 * 10 MB)) / 1 MB + 4 * 10 = 42
  O = 7232 MB % 1 MB + 23 * 1 MB + 1 * 50 * 1 MB = 73 MB
    </artwork>
   </figure>
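   <t>
The nested striping equations can be exercised the same way.  The following
Python sketch is illustrative only; the function name "map_nested" is an
assumption made for this example, as is the choice of 1 MB = 2**20 bytes
(the results expressed in MB do not depend on that choice).
   </t>
   <figure>
    <artwork>
```python
MB = 1024 * 1024  # assumption: 1 MB is 2**20 bytes


def map_nested(L, stripe_unit, group_width, group_depth, W):
    """Map logical file offset L to (component index, object offset)."""
    S = stripe_unit * group_depth * W            # bytes per major stripe
    T = stripe_unit * group_depth * group_width  # bytes per group
    U = stripe_unit * group_width                # bytes per minor stripe
    M = L // S                                   # major stripe number
    G = (L - M * S) // T                         # group number
    H = (L - M * S) % T                          # byte offset in group
    N = H // U                                   # minor stripe number
    C = (H - N * U) // stripe_unit + G * group_width
    O = (L % stripe_unit + N * stripe_unit
         + M * group_depth * stripe_unit)
    return C, O
```
    </artwork>
   </figure>
   <t>
For the 100-component example above, map_nested(7232 * MB, MB, 10, 50, 100)
yields component 42 at an object offset of 73 MB.
   </t>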
  </section> <!-- Nested Striping -->

  <section anchor="Mirroring" title="Mirroring">
   <t>
The odm_mirror_cnt is used to replicate a file by replicating its component objects.
If there is no mirroring, then odm_mirror_cnt MUST be 0.
If odm_mirror_cnt is greater than zero, then the size of the olo_components
array MUST be a multiple of (odm_mirror_cnt+1).
Thus, for a classic mirror on two objects, odm_mirror_cnt is one.
Note that mirroring can be defined over any RAID algorithm and striping
pattern (either simple or nested).
If odm_group_width is also non-zero, then the size of the olo_components array MUST be a
multiple of odm_group_width * (odm_mirror_cnt+1).
Replicas are adjacent in the olo_components array,
and the value C produced by the above equations is not
a direct index into the olo_components array.  Instead,
the following equations determine the replica component index RCi,
where i ranges from 0 to odm_mirror_cnt.
   </t>
   <figure>
    <artwork>
C = component index for striping or two-level striping
i ranges from 0 to odm_mirror_cnt, inclusive
RCi = C * (odm_mirror_cnt+1) + i
    </artwork>
   </figure>
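   <t>
The replica index equation can be sketched in Python as follows.
The function name replica_indices is hypothetical; the arithmetic is
taken directly from the equation for RCi.
   </t>
   <figure>
    <artwork><![CDATA[
```python
def replica_indices(C, odm_mirror_cnt):
    """Return the olo_components indices RC0..RCn that hold replicas of C."""
    return [C * (odm_mirror_cnt + 1) + i for i in range(odm_mirror_cnt + 1)]

print(replica_indices(0, 1))  # classic two-way mirror: [0, 1]
print(replica_indices(3, 2))  # three copies of component 3: [9, 10, 11]
```
]]></artwork>
   </figure>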
  </section> <!-- Mirroring -->
 </section> <!-- Data Mapping Schemes -->

 <section anchor="RAID Algorithms" title="RAID Algorithms">
  <t>
pnfs_osd_raid_algorithm4 determines the algorithm and placement of
redundant data.
This section defines the different RAID algorithms.
  </t>

  <section anchor="PNFS_OSD_RAID_0" title="PNFS_OSD_RAID_0">
   <t>
PNFS_OSD_RAID_0 means there is no
parity data, so all bytes in the component objects are
data bytes located by the above equations for C and O.
If a component object is marked as PNFS_OSD_MISSING,
the pNFS client MUST either return an I/O error when an attempt
is made to read this component or, alternatively, retry
the READ against the pNFS server.
   </t>
  </section> <!-- PNFS_OSD_RAID_0 -->

  <section anchor="PNFS_OSD_RAID_4" title="PNFS_OSD_RAID_4">
   <t>
PNFS_OSD_RAID_4 means that the last component object,
or the last in each group (if odm_group_width is greater than zero),
contains parity information computed over the rest of
the stripe with an XOR operation.
If a component object is unavailable, the client can
read the rest of the stripe units in the damaged stripe
and recompute the missing stripe unit by XORing the other
stripe units in the stripe.  Alternatively, the client can replay
the READ against the pNFS server, which will presumably
perform the reconstructed read on the client's behalf.
   </t>
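   <t>
The client-side reconstruction described above amounts to a byte-wise XOR
of the surviving stripe units and the parity unit.  The following Python
sketch illustrates the idea on a toy stripe; the helper name xor_units and
the sample data are assumptions made for this example.
   </t>
   <figure>
    <artwork><![CDATA[
```python
def xor_units(units):
    """XOR equal-length stripe units together, byte by byte."""
    out = bytearray(len(units[0]))
    for unit in units:
        for k, byte in enumerate(unit):
            out[k] ^= byte
    return bytes(out)

# A toy stripe with three data units and one parity unit.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_units(data)

# Suppose the component holding data[1] is unavailable: XOR the rest.
rebuilt = xor_units([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```
]]></artwork>
   </figure>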
   <t>
When parity is present in the file,
then there is an additional computation to map from the file offset L
to the offset that accounts for embedded parity, L'.
First compute L', and then use L' in the above equations for C and O.
   </t>
   <figure>
    <artwork>
L = file offset, not accounting for parity
P = number of parity devices in each stripe
W = group_width, if not zero, else size of olo_components array
N = L / ((W-P) * stripe_unit)
L' = N * (W * stripe_unit) +
     (L % ((W-P) * stripe_unit))
    </artwork>
   </figure>
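   <t>
A minimal Python sketch of the L' computation follows.  It reads (W-P) as
being grouped before the multiplication by stripe_unit, so that each
stripe carries (W-P) data stripe units; the function name
parity_adjusted_offset is hypothetical.
   </t>
   <figure>
    <artwork><![CDATA[
```python
def parity_adjusted_offset(L, W, P, stripe_unit):
    """Return L', the offset that accounts for embedded parity."""
    data_per_stripe = (W - P) * stripe_unit  # data bytes in one stripe
    N = L // data_per_stripe                 # stripe number
    return N * (W * stripe_unit) + (L % data_per_stripe)

# With W = 4, P = 1, and a 4096-byte stripe unit, the fourth data stripe
# unit (L = 3 * 4096) lands at the start of the second full stripe.
print(parity_adjusted_offset(3 * 4096, 4, 1, 4096) // 4096)  # 4
```
]]></artwork>
   </figure>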
  </section> <!-- PNFS_OSD_RAID_4 -->

  <section anchor="PNFS_OSD_RAID_5" title="PNFS_OSD_RAID_5">
   <t>
PNFS_OSD_RAID_5 means that the position of the parity data
is rotated on each stripe or each group (if odm_group_width is greater than zero).
In the first stripe, the last
component holds the parity.  In the second stripe, the
next-to-last component holds the parity, and so on.
In this scheme, all stripe units are rotated so that I/O
is evenly spread across objects as the file is read
sequentially.  The rotated parity layout is illustrated here,
with numbers indicating the stripe unit.
   </t>
   <figure>
    <artwork>
0 1 2 P
4 5 P 3
8 P 6 7
P 9 a b
    </artwork>
   </figure>
   <t>
To compute the component object C, first compute the
offset that accounts for parity L' and use that to
compute C.  Then rotate C to get C'.
Finally, increase C' by one if the parity
information comes at or before C' within that stripe.
The following equations illustrate this by computing I,
which is the index of the component that contains
parity for a given stripe.
   </t>
   <figure>
    <artwork><![CDATA[
L = file offset, not accounting for parity
W = odm_group_width, if not zero, else size of olo_components array
N = L / ((W-1) * stripe_unit)
(Compute L' as described above)
(Compute C based on L' as described above)
C' = (C - (N%W)) % W
I = W - (N%W) - 1
if (C' <= I) {
  C'++
}
]]></artwork>
   </figure>
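   <t>
The parity index equation can be checked against the rotated layout
illustrated earlier in this section.  The Python sketch below covers only
the computation of I, not the full computation of C'; the function name
parity_index is hypothetical.
   </t>
   <figure>
    <artwork><![CDATA[
```python
def parity_index(N, W):
    """Index I of the component holding parity for stripe number N."""
    return W - (N % W) - 1

W = 4
for N in range(4):
    I = parity_index(N, W)
    # Mark the parity position with P and data positions with a dot.
    print(" ".join("P" if c == I else "." for c in range(W)))
# prints:
# . . . P
# . . P .
# . P . .
# P . . .
```
]]></artwork>
   </figure>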
  </section> <!-- PNFS_OSD_RAID_5 -->

  <section anchor="PNFS_OSD_RAID_PQ" title="PNFS_OSD_RAID_PQ">
   <t>
PNFS_OSD_RAID_PQ is a double-parity scheme that uses
the Reed-Solomon P+Q encoding scheme <xref target='Error Correcting Codes' />.
In this layout,
the last two component objects hold the P and Q data,
respectively.  P is parity computed with XOR, and
Q is a more complex equation that is not described here.
The equations given above for embedded parity can be
used to map a file offset to the correct component
object by setting the number of parity components to 2
instead of 1 for RAID4 or RAID5.
Clients may simply choose to read data through the
metadata server if two components are missing or
damaged.
   </t>
   <t>
Issue: This scheme also has a RAID_4 like layout where
the ECC blocks are stored on the same components on every stripe
and a rotated, RAID-5 like layout where the stripe units are rotated.
Should we make the following properties orthogonal: RAID_4 or RAID_5 (i.e.,
non-rotated or rotated), and then have the number of parity components
and the associated algorithm be the orthogonal parameter?
   </t>
  </section> <!-- PNFS_OSD_RAID_PQ -->

  <section title="RAID Usage and Implementation Notes">
   <t>
RAID layouts with redundant data in their stripes
require additional serialization of updates to
ensure correct operation. Otherwise, if two clients simultaneously
write to the same logical range of an object, the result could include
different data in the same ranges of mirrored tuples, or corrupt parity
information.  It is the
responsibility of the metadata server to enforce serialization
requirements such as this. For example, the metadata server may do
so by not granting overlapping write layouts within mirrored objects.
   </t>
  </section> <!-- RAID Usage and Implementation Notes -->
 </section> <!-- RAID Algorithms -->
</section> <!-- Object-Based Layout -->

<section title="Object-Based Layout Update">
 <t>
layoutupdate4 is used in the LAYOUTCOMMIT operation
to convey updates to the layout and additional information to
the metadata server.  It is defined in the
<xref target="NFSv4.1">NFSv4.1 draft</xref> as follows:
 </t>

 <figure>
  <artwork>
struct layoutupdate4 {
    layouttype4             lou_type;
    opaque                  lou_body<>;
};
  </artwork>
 </figure>

 <t>
The layoutupdate4 type is an opaque value at the generic pNFS client level.
If the lou_type layout type is LAYOUT4_OSD2_OBJECTS, then
the lou_body opaque value is defined by the pnfs_osd_layoutupdate4 type.
 </t>

 <t>
Object-Based pNFS clients are not allowed to modify the layout.
Therefore, the information passed in pnfs_osd_layoutupdate4
is used only to update the file's attributes.
In addition to the generic information the client can pass to the metadata
server in LAYOUTCOMMIT such as the highest offset the client wrote to and
the last time it modified the file, the client MAY use pnfs_osd_layoutupdate4
to convey the capacity consumed (or released) by writes using the layout,
and to indicate that I/O errors were encountered by such writes.
 </t>

 <section anchor="pnfs_osd_deltaspaceused4" title="pnfs_osd_deltaspaceused4">
  <figure>
   <artwork>
///union pnfs_osd_deltaspaceused4 switch (bool dsu_valid) {
///    case TRUE:
///        int64_t     dsu_delta;
///    case FALSE:
///        void;
///};
///
   </artwork>
  </figure>
  <t>
pnfs_osd_deltaspaceused4 is used to convey space utilization information
at the time of LAYOUTCOMMIT.  For the file system to properly maintain
capacity used information, it needs to track how much capacity was
consumed by WRITE operations performed by the client.  In this protocol,
the OSD returns the capacity consumed by a write (*), which can be
different from the number of bytes written
because of internal overhead like block-level allocation and
indirect blocks, and the client reflects this back to the pNFS server
so it can accurately track quota.  The pNFS server can choose to
trust this information coming from the clients and therefore
avoid querying the OSDs at the time of LAYOUTCOMMIT.
If the client is unable to obtain this information from the OSD,
it simply returns an invalid olu_delta_space_used
(i.e., with dsu_valid set to FALSE).
  </t>
  <t>
(*) Note: At the time this document is written,
a per-command used capacity attribute is not yet standardized
by <xref target="OSD2">OSD2 draft</xref>.
The client MAY use vendor-specific attributes to calculate space utilization,
provided that the vendor defines and publishes a suitable vendor-specific
attributes page for current-command attributes as defined by
<xref target="OSD2">OSD2 draft</xref>, Section 7.1.2.2.
  </t>
 </section> <!-- pnfs_osd_deltaspaceused4 -->

 <section anchor="pnfs_osd_layoutupdate4" title="pnfs_osd_layoutupdate4">
  <figure>
   <artwork>
///struct pnfs_osd_layoutupdate4 {
///    pnfs_osd_deltaspaceused4    olu_delta_space_used;
///    bool                        olu_ioerr_flag;
///};
///
   </artwork>
  </figure>

  <t>
"olu_delta_space_used" is used to convey capacity usage
information back to the metadata server.
  </t>

  <t>
The "olu_ioerr_flag" is used when I/O errors were encountered while writing
the file.  The client MUST report the errors using the pnfs_osd_ioerr4
structure (see <xref target="pnfs_osd_errno4" />) at LAYOUTRETURN time.
  </t>
  <t>
If the client updated the file successfully before hitting the
I/O errors, it MAY use LAYOUTCOMMIT to update the metadata server as
described above.  Typically, in the error-free case, the server MAY
turn around and update the file's attributes on the storage devices.
However, if I/O errors were encountered, the server should not attempt
to write the new attributes on the storage devices until it receives
the I/O error report; therefore, the client MUST set the olu_ioerr_flag
to true.
Note that in this case, the client SHOULD send both the
LAYOUTCOMMIT and LAYOUTRETURN operations in the same COMPOUND RPC.
  </t>
 </section> <!-- pnfs_osd_layoutupdate4 -->
</section> <!-- Object-Based Layout Update -->

<section title="Recovering from Client I/O Errors">
 <t>
The pNFS client may encounter errors when directly accessing
the object storage devices.
However, it is the responsibility of the metadata server to
handle the I/O errors.
When the LAYOUT4_OSD2_OBJECTS layout type is used, the client
MUST report the I/O errors to the server at LAYOUTRETURN time
using the pnfs_osd_ioerr4 structure (see <xref target="pnfs_osd_errno4" />).
 </t>
 <t>
The metadata server analyzes the error and determines the required
recovery operations such as repairing any parity inconsistencies,
recovering media failures, or reconstructing missing objects.
 </t>
 <t>
The metadata server SHOULD recall any outstanding layouts to allow it
exclusive write access to the stripes being recovered and to prevent other
clients from hitting the same error condition.
In these cases, the server MUST complete recovery before handing out
any new layouts to the affected byte ranges.
 </t>
 <t>
Although it MAY be acceptable for the client to propagate a
corresponding error to the application that initiated the I/O operation
and drop any unwritten data, the client SHOULD attempt to retry the original
I/O operation by requesting a new layout using LAYOUTGET and retrying the
I/O operation(s) using the new layout.  Alternatively, the client MAY just retry the
I/O operation(s) using regular NFS READ or WRITE operations via the metadata
server.  The client SHOULD attempt to retrieve a new layout and retry the I/O
operation using OSD commands first and, only if the error persists, retry
the I/O operation via the metadata server.
 </t>
</section> <!-- Recovering from Client I/O Errors -->

<section title="Object-Based Layout Return">
 <t>
layoutreturn_file4 is used in the LAYOUTRETURN operation
to convey layout-type specific information to the server.
It is defined in the
<xref target="NFSv4.1">NFSv4.1 draft</xref> as follows:
 </t>

 <figure>
  <artwork>
struct layoutreturn_file4 {
        offset4         lrf_offset;
        length4         lrf_length;
        stateid4        lrf_stateid;
        /* layouttype4 specific data */
        opaque          lrf_body<>;
};

union layoutreturn4 switch(layoutreturn_type4 lr_returntype) {
        case LAYOUTRETURN4_FILE:
                layoutreturn_file4      lr_layout;
        default:
                void;
};

struct LAYOUTRETURN4args {
        /* CURRENT_FH: file */
        bool                    lora_reclaim;
        layoutreturn_stateid    lora_recallstateid;
        layouttype4             lora_layout_type;
        layoutiomode4           lora_iomode;
        layoutreturn4           lora_layoutreturn;
};

  </artwork>
 </figure>

 <t>
If the lora_layout_type layout type is LAYOUT4_OSD2_OBJECTS, then
the lrf_body opaque value is defined by the pnfs_osd_layoutreturn4 type.
 </t>
 <t>
The pnfs_osd_layoutreturn4 type allows the client to report I/O error information
back to the metadata server as defined below.
 </t>

 <section anchor="pnfs_osd_errno4" title="pnfs_osd_errno4">
  <figure>
   <artwork>
///enum pnfs_osd_errno4 {
///    PNFS_OSD_ERR_EIO            = 1,
///    PNFS_OSD_ERR_NOT_FOUND      = 2,
///    PNFS_OSD_ERR_NO_SPACE       = 3,
///    PNFS_OSD_ERR_BAD_CRED       = 4,
///    PNFS_OSD_ERR_NO_ACCESS      = 5,
///    PNFS_OSD_ERR_UNREACHABLE    = 6,
///    PNFS_OSD_ERR_RESOURCE       = 7
///};
///
   </artwork>
  </figure>

  <t>
pnfs_osd_errno4 is used to represent error types when read/write errors
are reported to the metadata server.
The error codes serve as hints to the metadata server that may help it
in diagnosing the exact reason for the error and in repairing it.

  <list style="symbols">
   <t>
PNFS_OSD_ERR_EIO indicates the operation failed because the
Object Storage Device experienced a failure trying to access the
object.  The most common source of these errors is media errors,
but other internal errors might cause this. In this case,
the metadata server should examine the broken object
more closely; hence, this code should be used as the default error code.
   </t>
   <t>
PNFS_OSD_ERR_NOT_FOUND indicates the object ID specifies an object that
does not exist on the Object Storage Device.
   </t>
   <t>
PNFS_OSD_ERR_NO_SPACE indicates the operation failed because the
Object Storage Device ran out of free capacity during the operation.
   </t>
   <t>
PNFS_OSD_ERR_BAD_CRED indicates the security parameters are not valid.
The primary cause of this is that the capability has expired,
or the access policy tag (a.k.a. capability version number) has
been changed to revoke capabilities.  The client will need to
return the layout and get a new one with fresh capabilities.
   </t>
   <t>
PNFS_OSD_ERR_NO_ACCESS indicates the capability does not allow
the requested operation.  This should not occur in normal operation
because the metadata server should give out correct capabilities,
or none at all.
   </t>
   <t>
PNFS_OSD_ERR_UNREACHABLE indicates the client did not complete
the I/O operation at the Object Storage Device
due to a communication failure.  Whether the I/O operation
was executed by the OSD or not is undetermined.
   </t>
   <t>
PNFS_OSD_ERR_RESOURCE indicates the client did not issue
the I/O operation due to a local problem on the initiator (i.e., client)
side, e.g., when running out of memory.  The client
MUST guarantee that the OSD command was never dispatched to the OSD.
   </t>
  </list>
  </t>
 </section> <!-- pnfs_osd_errno4 -->

 <section anchor="pnfs_osd_ioerr4" title="pnfs_osd_ioerr4">
  <figure>
   <artwork>
///struct pnfs_osd_ioerr4 {
///    pnfs_osd_objid4     oer_component;
///    length4             oer_comp_offset;
///    length4             oer_comp_length;
///    bool                oer_iswrite;
///    pnfs_osd_errno4     oer_errno;
///};
///
   </artwork>
  </figure>
  <t>
The pnfs_osd_ioerr4 structure is used to return error indications
for objects that generated errors during data transfers.
These are hints to the
metadata server that there are problems with that object.
For each error, "oer_component", "oer_comp_offset", and "oer_comp_length"
represent the object
and byte range within the component object in which the error occurred,
"oer_iswrite" is set to "true" if the failed OSD operation was data modifying, and
"oer_errno" represents the type of error.
  </t>
  <t>
Component byte ranges in the optional pnfs_osd_ioerr4 structure are
used for recovering the object and MUST be set by the client to cover all
failed I/O operations to the component.
  </t>
 </section> <!-- pnfs_osd_ioerr4 -->

 <section anchor="pnfs_osd_layoutreturn4" title="pnfs_osd_layoutreturn4">
  <figure>
   <artwork>
///struct pnfs_osd_layoutreturn4 {
///    pnfs_osd_ioerr4             olr_ioerr_report<>;
///};
///
   </artwork>
  </figure>

  <t>
When OSD I/O operations fail, "olr_ioerr_report<>" is used to report these errors
to the metadata server as an array of elements of type pnfs_osd_ioerr4.
Each element in the array represents an error that occurred on
the object specified by oer_component.
If no errors are to be reported, the size of the olr_ioerr_report<> array
is set to zero.
  </t>

 </section> <!-- pnfs_osd_layoutreturn4 -->
</section> <!-- Object-Based Layout Return -->

<section title="Object-Based Creation Layout Hint">
 <t>
The layouthint4 type is defined in the 
<xref target="NFSv4.1">NFSv4.1 draft</xref> as follows:
 </t>

 <figure>
  <artwork>
struct layouthint4 {
    layouttype4           loh_type;
    opaque                loh_body<>;
};
  </artwork>
 </figure>

 <t>
The layouthint4 structure is used by the client to pass in a
hint about the type of layout it would like created for a particular
file.
If the loh_type layout type is LAYOUT4_OSD2_OBJECTS, then
the loh_body opaque value is defined by the pnfs_osd_layouthint4 type.
 </t>

 <section anchor="pnfs_osd_layouthint4" title="pnfs_osd_layouthint4">

  <figure>
   <artwork>
///union pnfs_osd_max_comps_hint4 switch (bool omx_valid) {
///    case TRUE:
///        uint32_t            omx_max_comps;
///    case FALSE:
///        void;
///};
///
///union pnfs_osd_stripe_unit_hint4 switch (bool osu_valid) {
///    case TRUE:
///        length4             osu_stripe_unit;
///    case FALSE:
///        void;
///};
///
///union pnfs_osd_group_width_hint4 switch (bool ogw_valid) {
///    case TRUE:
///        uint32_t            ogw_group_width;
///    case FALSE:
///        void;
///};
///
///union pnfs_osd_group_depth_hint4 switch (bool ogd_valid) {
///    case TRUE:
///        uint32_t            ogd_group_depth;
///    case FALSE:
///        void;
///};
///
///union pnfs_osd_mirror_cnt_hint4 switch (bool omc_valid) {
///    case TRUE:
///        uint32_t            omc_mirror_cnt;
///    case FALSE:
///        void;
///};
///
///union pnfs_osd_raid_algorithm_hint4 switch (bool ora_valid) {
///    case TRUE:
///        pnfs_osd_raid_algorithm4    ora_raid_algorithm;
///    case FALSE:
///        void;
///};
///
///struct pnfs_osd_layouthint4 {
///    pnfs_osd_max_comps_hint4        olh_max_comps_hint;
///    pnfs_osd_stripe_unit_hint4      olh_stripe_unit_hint;
///    pnfs_osd_group_width_hint4      olh_group_width_hint;
///    pnfs_osd_group_depth_hint4      olh_group_depth_hint;
///    pnfs_osd_mirror_cnt_hint4       olh_mirror_cnt_hint;
///    pnfs_osd_raid_algorithm_hint4   olh_raid_algorithm_hint;
///};
///
   </artwork>
  </figure>

  <t>
This type conveys hints for the desired data map.
All parameters are optional so the client can give values for only
the parameters it cares about, e.g., it can provide a hint for the desired
number of mirrored components, regardless of the RAID algorithm selected
for the file.  The server should make an attempt to honor the hints 
but it can ignore any or all of them at its own discretion and
without failing the respective CREATE operation.
  </t>

  <t>
The "olh_max_comps_hint" can be used to limit the total number of component
objects comprising the file.  All other hints correspond directly to the
different fields of pnfs_osd_data_map4.
  </t>

 </section>
</section>

<section title="Layout Segments">
 <t>
The pNFS layout operations operate on logical byte ranges.
There is no requirement in the protocol for any relationship between
byte ranges used in LAYOUTGET to acquire layouts and byte ranges used
in CB_LAYOUTRECALL, LAYOUTCOMMIT, or LAYOUTRETURN.
However, using OSD byte-range capabilities poses limitations on these operations
since the capabilities associated with layout segments cannot be merged
or split.
The following guidelines should be followed for proper operation of
object-based layouts.
 </t>

 <section title="CB_LAYOUTRECALL and LAYOUTRETURN">

  <t>
In general, the object-based layout driver should keep track of each
layout segment it got, keeping record of the segment's iomode, offset,
and length.
The server should allow the client to get multiple overlapping layout
segments but is free to recall the layout to prevent overlap.
  </t>

  <t>
In response to CB_LAYOUTRECALL, the client should return all layout
segments matching the given iomode and overlapping with the recalled range.
When returning the layouts for this byte range with LAYOUTRETURN the client
MUST NOT return a sub-range of a layout segment it has;
each LAYOUTRETURN sent MUST completely cover at least one outstanding layout
segment.
  </t>

  <t>
The server, in turn, should release any segment that exactly matches the
clientid, iomode, and byte range given in LAYOUTRETURN.
If no exact match is found then the server should release all layout segments
matching the clientid and iomode and that are fully contained in the
returned byte range.
If none are found and the byte range is a subset of an outstanding layout
segment for the same clientid and iomode, then the client can be
considered malfunctioning and the server SHOULD recall all layouts from
this client to reset its state.  If this behavior repeats, the server
SHOULD deny all LAYOUTGETs from this client.
  </t>
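  <t>
The server-side matching rules above can be sketched in Python as follows.
The Segment record, the function name segments_to_release, and the use of
finite lengths are assumptions made for this illustration; an actual
server keeps its own layout state.
  </t>
  <figure>
   <artwork><![CDATA[
```python
from collections import namedtuple

# Hypothetical record of a layout segment (also used for the returned range).
Segment = namedtuple("Segment", "clientid iomode offset length")

def segments_to_release(outstanding, ret):
    """Return (segments to release, client looks malfunctioning)."""
    same = [s for s in outstanding
            if s.clientid == ret.clientid and s.iomode == ret.iomode]
    # 1. Exact match on clientid, iomode, and byte range.
    exact = [s for s in same
             if (s.offset, s.length) == (ret.offset, ret.length)]
    if exact:
        return exact, False
    ret_end = ret.offset + ret.length
    # 2. Otherwise, all segments fully contained in the returned range.
    contained = [s for s in same
                 if ret.offset <= s.offset and s.offset + s.length <= ret_end]
    if contained:
        return contained, False
    # 3. Otherwise, a return that is a sub-range of an outstanding segment
    #    suggests a malfunctioning client.
    split = any(s.offset <= ret.offset and ret_end <= s.offset + s.length
                for s in same)
    return [], split

held = [Segment(1, "RW", 0, 100), Segment(1, "RW", 100, 100),
        Segment(2, "RW", 0, 100)]
print(segments_to_release(held, Segment(1, "RW", 0, 200)))
print(segments_to_release(held, Segment(1, "RW", 10, 20)))
```
]]></artwork>
  </figure>
  <t>
In this example, the first call releases both segments held by client 1,
and the second call releases nothing and flags the client, since the
returned range splits an outstanding segment.
  </t>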
 </section>

 <section title="LAYOUTCOMMIT">
  <t>
LAYOUTCOMMIT is only used by object-based pNFS to convey modified-attribute
hints and/or to report I/O errors to the metadata server (MDS).
Therefore, the offset and length in LAYOUTCOMMIT4args are reserved for future
use and should be set to 0.
  </t>
 </section>
</section>

<section title="Recalling Layouts">
 <t>
The object-based metadata server should recall outstanding layouts
in the following cases:

 <list style='symbols'>
  <t>
  When the file's security policy changes, i.e., when ACLs or permission mode bits
are set.
  </t>
  <t>
  When the file's aggregation map changes, rendering outstanding layouts invalid.
  </t>
  <t>
  When there are sharing conflicts. For example, the server will issue
stripe-aligned layout segments for RAID-5 objects.  To prevent corruption
of the file's parity, multiple clients must not hold valid write layouts
for the same stripes.
An outstanding RW layout should be recalled when a conflicting LAYOUTGET
is received from a different client for LAYOUTIOMODE4_RW and for a byte-range
overlapping with the outstanding layout segment.
  </t>
 </list>
 </t>

 <section title="CB_RECALL_ANY" anchor="CB_RECALL_ANY">
  <t>
The metadata server can use the CB_RECALL_ANY callback operation to notify
the client to return some or all of its layouts.
The <xref target="NFSv4.1">NFSv4.1 draft</xref> defines
the following types:
  </t>

  <figure>
   <artwork>
const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN     = 8;
const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX     = 9;

struct  CB_RECALL_ANY4args      {
    uint32_t        craa_objects_to_keep;
    bitmap4         craa_type_mask;
};
   </artwork>
  </figure>

  <t>
Typically, CB_RECALL_ANY will be used to recall client state when the server
needs to reclaim resources. The craa_type_mask bitmap specifies the type of
resources that are recalled and the craa_objects_to_keep value specifies
how many of the recalled objects the client is allowed to keep.

The object-based layout type mask flags are defined as follows.
They represent the iomode of the recalled layouts.
In response, the client SHOULD return layouts of the recalled iomode
that it needs the least,
keeping at most craa_objects_to_keep object-based layouts.
  </t>
  <figure>
   <artwork>
///enum pnfs_osd_cb_recall_any_mask {
///    PNFS_OSD_RCA4_TYPE_MASK_READ = 8,
///    PNFS_OSD_RCA4_TYPE_MASK_RW   = 9
///};
///
   </artwork>
  </figure>

  <t>
The PNFS_OSD_RCA4_TYPE_MASK_READ flag notifies the client to return layouts
of iomode LAYOUTIOMODE4_READ.
Similarly, the PNFS_OSD_RCA4_TYPE_MASK_RW flag notifies the client to return layouts
of iomode LAYOUTIOMODE4_RW.
When both mask flags are set, the client is notified to return layouts
of either iomode.
  </t>
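  <t>
As a sketch of how a client might interpret the mask, the following
Python fragment tests the two flags.  It assumes that the enum values 8
and 9 are bit positions within the first 32-bit word of craa_type_mask;
the function name recalled_iomodes is hypothetical.
  </t>
  <figure>
   <artwork><![CDATA[
```python
PNFS_OSD_RCA4_TYPE_MASK_READ = 8
PNFS_OSD_RCA4_TYPE_MASK_RW = 9

def recalled_iomodes(mask_word0):
    """List the iomodes recalled by the first word of craa_type_mask."""
    modes = []
    if mask_word0 & (1 << PNFS_OSD_RCA4_TYPE_MASK_READ):
        modes.append("LAYOUTIOMODE4_READ")
    if mask_word0 & (1 << PNFS_OSD_RCA4_TYPE_MASK_RW):
        modes.append("LAYOUTIOMODE4_RW")
    return modes

both = (1 << PNFS_OSD_RCA4_TYPE_MASK_READ) | (1 << PNFS_OSD_RCA4_TYPE_MASK_RW)
print(recalled_iomodes(both))  # ['LAYOUTIOMODE4_READ', 'LAYOUTIOMODE4_RW']
```
]]></artwork>
  </figure>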
 </section>
</section>

<section title="Client Fencing" anchor="Client Fencing">
 <t>
In cases where clients are uncommunicative and their lease has expired,
or when clients fail to return recalled layouts in a timely manner, the
server MAY revoke client layouts and/or device address mappings and reassign
these resources to other clients.
To avoid data corruption, the metadata server MUST fence off the revoked
clients from the respective objects as described in <xref target="Revoking Capabilities" />.
 </t>
</section>

<section title="Security Considerations" anchor="Security Considerations">
 <t>
  The pNFS extension partitions the NFSv4 file system protocol into
  two parts, the control path and the data path (storage protocol).
  The control path contains all the new operations described by this
  extension; all existing NFSv4 security mechanisms and features apply
  to the control path.  The combination of components in a pNFS system
  is required to preserve the
  security properties of NFSv4 with respect to an entity accessing
  data via a client, including security countermeasures to defend
  against threats that NFSv4 provides defenses for in environments
  where these threats are considered significant.
 </t>
 <t>
  The metadata server enforces the file access-control policy at LAYOUTGET time.
  The client should use suitable authorization credentials for getting the
  layout for the requested iomode (READ or RW) and the server verifies the
  permissions and ACL for these credentials, possibly returning NFS4ERR_ACCESS
  if the client is not allowed the requested iomode.  If the LAYOUTGET
  operation succeeds the client receives, as part of the layout, a set of
  object capabilities allowing it I/O access to the specified objects
  corresponding to the requested iomode.  When the client acts on I/O operations
  on behalf of its local users it MUST authenticate and authorize the user by
  issuing respective OPEN and ACCESS calls to the metadata server, similarly
  to having NFSv4 data delegations.  If access is allowed the client uses the
  corresponding (READ or RW) capabilities to perform the I/O operations at the
  object-storage devices.  When the metadata server receives a request to change
  a file's permissions or ACL, it SHOULD recall all layouts for that file
  and it MUST change the capability version attribute on all objects comprising
  the file to implicitly invalidate any outstanding capabilities before
  committing to the new permissions and ACL.  Doing this will ensure that
  clients re-authorize their layouts according to the modified permissions and
  ACL by requesting new layouts.  Recalling the layouts in this case is a courtesy
  of the server intended to prevent clients from getting an error on I/Os done
  after the capability version changed.
 </t>
 <t>
   The object storage protocol MUST implement the security aspects
   described in version 1 of the T10
   <xref target="osd standard">OSD protocol definition</xref>.
  The standard defines four security methods: NOSEC, CAPKEY, CMDRSP,
  and ALLDATA.  To provide a minimum level of security allowing verification
  and enforcement of the server access control policy using the layout
  security credentials, the NOSEC security method MUST NOT be used for
  any I/O operation.

   The remainder of this section gives an overview of the security mechanism
   described in that standard.  The goal is to give the reader a basic
   understanding of the object security model.  Any discrepancies
   between this text and the actual standard are obviously to be
   resolved in favor of the OSD standard.
 </t>

 <section anchor="OSD Security Data Types" title="OSD Security Data Types">
  <t>
There are three main data types associated with object security:
a capability, a credential, and security parameters.
The capability is a set of fields that specifies an object
and what operations can be performed on it.
A credential is a signed capability.  Only a security manager
that knows the secret device keys can correctly sign a capability
to form a valid credential.
In pNFS, the file server acts as the security manager and
returns signed capabilities (i.e., credentials) to the pNFS client.
The security parameters are values computed by the issuer of OSD
commands (i.e., the client) that prove they hold valid credentials.
The client uses the credential as a signing key to sign the
requests it makes to the OSD, and puts the resulting signatures
into the security_parameters field of the OSD command.
The object storage device uses the secret keys it shares with
the security manager to validate the signature values in
the security parameters.
  </t>
  <t>
The security types are opaque to the generic layers of the
pNFS client.
The credential contents are defined as opaque within the pnfs_osd_object_cred4 
type.
Instead of repeating the definitions here,
the reader is referred to section 4.9.2.2 of the OSD standard.
  </t>
 </section>
 <section title="The OSD Security Protocol">
  <t>
  The object storage protocol relies on a cryptographically secure
  capability to control accesses at the object storage devices.
  Capabilities are generated by the metadata server, returned to the
  client, and used by the client as described below to authenticate
  its requests to the Object Storage Device (OSD).  Capabilities
  therefore achieve the required access and open mode checking.  They
  allow the file server to define and check a policy (e.g., open mode)
  and the OSD to enforce that policy without knowing
  the details (e.g., user IDs and ACLs).
  </t>
  <t>
  Since capabilities are tied to layouts, and since they are used to
  enforce access control, when the file ACL or mode changes the outstanding
  capabilities MUST be revoked to enforce the new access permissions.
  The server SHOULD recall layouts to allow clients to gracefully
  return their capabilities before the access permissions change.
  </t>
  <t>
  Each capability is specific to a particular object, an operation
  on that object, a byte range within the object (in OSDv2), and has an explicit
  expiration time.  The capabilities are signed with a secret key
  that is shared by the object storage devices (OSD) and the metadata
  managers.  Clients do not have device keys so they are unable to
  forge the signatures in the security parameters.
  The combination of a capability, the OSD system id, and a signature is called
  a "credential" in the OSD specification.
  </t>
  <t>
  The details of the security and privacy model for Object Storage
  are defined in the T10 OSD standard.
  The following sketch of the algorithm should help the
  reader understand the basic model.
  </t>
  <t>
  LAYOUTGET returns a CapKey and a Cap, which, together with the OSD SystemID,
  are also called a credential: a capability plus a signature (the CapKey)
  over that capability and the SystemID.
  The OSD Standard refers to the CapKey as the "Credential integrity
  check value" and to the ReqMAC, defined below, as the "Request integrity check value".
   <figure>
    <artwork><![CDATA[
CapKey = MAC<SecretKey>(Cap, SystemID)
Credential = {Cap, SystemID, CapKey}
]]></artwork>
   </figure>
  The client uses CapKey to sign all the requests it issues
  for that object using the respective Cap.  In other words,
  the Cap appears in the request to the storage device, and
  that request is signed with the CapKey as follows:
   <figure>
    <artwork><![CDATA[
ReqMAC = MAC<CapKey>(Req, ReqNonce)
Request = {Cap, Req, ReqNonce, ReqMAC}
]]></artwork>
   </figure>
  The following is sent to the OSD: {Cap, Req, ReqNonce, ReqMAC}.
  The OSD uses the SecretKey it shares with the metadata server
  to compare the ReqMAC the client sent with a locally computed value:
   <figure>
    <artwork><![CDATA[
LocalCapKey = MAC<SecretKey>(Cap, SystemID)
LocalReqMAC = MAC<LocalCapKey>(Req, ReqNonce)
]]></artwork>
   </figure>
  and if they match the
  OSD assumes that the capabilities came from an authentic
  metadata server and allows access to the object, as allowed
  by the Cap.
  </t>
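  <t>
  The exchange above can be illustrated with a short, non-normative
  Python sketch.  It assumes HMAC-SHA1 as the MAC function and a
  length-prefixed concatenation of the MAC inputs; both are choices
  made for this illustration only, and the T10 OSD standard remains the
  authority on the actual algorithm and field encoding.  The function
  and dictionary key names are hypothetical.
   <figure>
    <artwork><![CDATA[
```python
import hashlib
import hmac


def mac(key: bytes, *fields: bytes) -> bytes:
    """MAC keyed by `key` over `fields` (assumed: HMAC-SHA1,
    each field prefixed with its 4-byte big-endian length)."""
    h = hmac.new(key, digestmod=hashlib.sha1)
    for field in fields:
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.digest()


def make_credential(secret_key: bytes, cap: bytes,
                    system_id: bytes) -> dict:
    """Metadata server: CapKey = MAC keyed by SecretKey
    over (Cap, SystemID)."""
    cap_key = mac(secret_key, cap, system_id)
    return {"cap": cap, "system_id": system_id, "cap_key": cap_key}


def sign_request(cred: dict, req: bytes, req_nonce: bytes) -> dict:
    """Client: ReqMAC = MAC keyed by CapKey over (Req, ReqNonce)."""
    req_mac = mac(cred["cap_key"], req, req_nonce)
    return {"cap": cred["cap"], "req": req,
            "req_nonce": req_nonce, "req_mac": req_mac}


def osd_verify(secret_key: bytes, system_id: bytes,
               request: dict) -> bool:
    """OSD: recompute CapKey and ReqMAC locally, then compare."""
    local_cap_key = mac(secret_key, request["cap"], system_id)
    local_req_mac = mac(local_cap_key, request["req"],
                        request["req_nonce"])
    # Constant-time comparison of the two MAC values.
    return hmac.compare_digest(local_req_mac, request["req_mac"])
```
]]></artwork>
   </figure>
  In this sketch, a request whose Req or ReqNonce is altered after
  signing, or whose Cap was not issued under the shared SecretKey, fails
  the comparison at the OSD.
  </t>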
 </section>

 <section title="Protocol Privacy Requirements">
  <t>
  Note that if the server's LAYOUTGET reply,
  holding CapKey and Cap, is snooped by another client, it can
  be used to generate valid OSD requests (within the Cap
  access restrictions).
  </t>
  <t>
  To meet the privacy requirements for the capability key
  returned by LAYOUTGET, the <xref target='GSS-API'>GSS-API</xref>
  framework can be used, e.g., by using the RPCSEC_GSS
  privacy method to send the LAYOUTGET operation or by
  using the SSV key to encrypt the oc_capability_key with the GSS_Wrap() function.
  In the absence of GSS-API, two general ways to provide privacy
  that are independent of NFSv4 are an isolated network, such as
  a VLAN, and a secure channel provided by
  <xref target='RFC4301'>IPsec</xref>.
  </t>
 </section>

 <section title="Revoking Capabilities" anchor="Revoking Capabilities">
  <t>
At any time, the metadata server may invalidate all outstanding
capabilities on an object by changing its POLICY ACCESS TAG attribute.
The value of the POLICY ACCESS TAG is part of a capability, and it
must match the state of the object attribute.  If they do not match,
the OSD rejects accesses to the object with the sense key set to
ILLEGAL REQUEST and an additional sense code set to INVALID FIELD IN CDB.
When a client attempts to use a capability and is rejected
this way, it should issue a LAYOUTCOMMIT
for the object and specify PNFS_OSD_BAD_CRED in the olr_ioerr_report
parameter. The client may elect to issue a compound
LAYOUTRETURN/LAYOUTGET (or LAYOUTCOMMIT/LAYOUTRETURN/LAYOUTGET)
to attempt to fetch a refreshed set of capabilities.
  </t>
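  <t>
A minimal, non-normative Python sketch of the tag comparison follows.
The class and function names are hypothetical, the tag is modeled as a
plain integer, and incrementing it is only one assumed way of changing
it. The numeric values are the SCSI codes for the ILLEGAL REQUEST
sense key (05h) and the INVALID FIELD IN CDB additional sense code (24h).
   <figure>
    <artwork><![CDATA[
```python
from dataclasses import dataclass

SENSE_KEY_ILLEGAL_REQUEST = 0x05   # SCSI sense key
ASC_INVALID_FIELD_IN_CDB = 0x24    # SCSI additional sense code


@dataclass
class Capability:
    object_id: int
    policy_access_tag: int   # tag value copied into the capability


@dataclass
class StoredObject:
    object_id: int
    policy_access_tag: int   # current value of the object attribute


def osd_check_tag(cap: Capability, obj: StoredObject):
    """OSD side: return None if the tags match, otherwise the
    (sense key, additional sense code) pair used for rejection."""
    if cap.policy_access_tag != obj.policy_access_tag:
        return (SENSE_KEY_ILLEGAL_REQUEST, ASC_INVALID_FIELD_IN_CDB)
    return None


def revoke_all(obj: StoredObject) -> None:
    """Metadata server side: changing the attribute invalidates
    every outstanding capability that carries the old tag."""
    obj.policy_access_tag += 1


def looks_revoked(status) -> bool:
    """Client side: recognize the rejection described above."""
    return status == (SENSE_KEY_ILLEGAL_REQUEST,
                      ASC_INVALID_FIELD_IN_CDB)
```
]]></artwork>
   </figure>
Because an OSD may return the same sense data for other malformed
commands, a client would likely treat this status as a prompt to
refresh its capabilities rather than as proof of revocation.
  </t>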
  <t>
The metadata server may elect to change the POLICY ACCESS TAG
on an object at any time, for any reason (with the understanding
that there is likely an associated performance penalty, especially
if there are outstanding layouts for this object). The metadata
server MUST revoke outstanding capabilities when any one of
the following occurs:
   <list style='symbols'>
    <t>the permissions on the object change,</t>
    <t>a conflicting mandatory byte-range lock is granted, or</t>
    <t>a layout is revoked and reassigned to another client.</t>
   </list>
  </t>
  <t>
A pNFS client will typically hold one layout for each byte range for either
READ or READ/WRITE. The client's credentials are checked by the metadata server
at LAYOUTGET time, and it is the client's responsibility to enforce access
control among multiple users accessing the same file. It is neither required
nor expected that the pNFS client will obtain a separate layout for each user
accessing a shared object. The client SHOULD use OPEN and ACCESS calls to check user
permissions when performing I/O so that the server's access control policies
are correctly enforced. The result of the ACCESS operation may be cached
while the client holds a valid layout, because the server is expected to recall
layouts when the file's access permissions or ACL change.
  </t>
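  <t>
The caching rule in the preceding paragraph can be sketched as follows.
This is a non-normative Python illustration: the class, its methods, and
the access_rpc callable (standing in for an ACCESS call to the metadata
server that returns the granted bit mask for a user) are hypothetical.
The ACCESS4_READ and ACCESS4_MODIFY values are the NFSv4 access bits.
   <figure>
    <artwork><![CDATA[
```python
ACCESS4_READ = 0x00000001
ACCESS4_MODIFY = 0x00000004


class LayoutAccessCache:
    """Per-file cache of ACCESS results, kept only while the
    client holds a valid layout for the file."""

    def __init__(self, access_rpc):
        self._access_rpc = access_rpc   # callable(user) -> granted mask
        self._layout_valid = False
        self._granted = {}              # user -> granted mask

    def layout_granted(self) -> None:
        self._layout_valid = True

    def layout_recalled(self) -> None:
        # Permissions may have changed: drop all cached results.
        self._layout_valid = False
        self._granted.clear()

    def check(self, user: str, wanted: int) -> bool:
        """Return True if `user` holds every bit in `wanted`."""
        if self._layout_valid and user in self._granted:
            granted = self._granted[user]
        else:
            granted = self._access_rpc(user)
            if self._layout_valid:
                self._granted[user] = granted
        return (granted & wanted) == wanted
```
]]></artwork>
   </figure>
In this sketch the client consults the server once per user while the
layout is held and asks again after a recall, so a permission change
that triggers the recall is seen on the next check.
  </t>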

 </section>
</section>

<section anchor="IANA Considerations" title="IANA Considerations">
 <t>
As described in the <xref target="NFSv4.1">NFSv4.1 draft</xref>,
new layout type numbers will be requested from IANA.
This document defines the protocol associated with the existing
layout type number, LAYOUT4_OSD2_OBJECTS, and it requires no further
actions for IANA.
 </t>
</section>

</middle>

<back>

<references title="Normative References">
  <reference anchor='osd standard' 
             target='http://www.t10.org/ftp/t10/drafts/osd/osd-r10.pdf'>
    <front>
      <title>SCSI Object-Based Storage Device Commands</title>
      <author initials="R." surname="Weber" fullname="Ralph Weber">
        <organization abbrev="SNIA">SNIA T10/1355-D</organization>
      </author>
      <date month="July" year="2004"/>
    </front>
  </reference>

    <reference anchor='RFC2119'>
      <front>
      <title abbrev='RFC Key Words'>Key words for use in RFCs to Indicate Requirement Levels</title>
      <author initials='S.' surname='Bradner' fullname='Scott Bradner'>
      <organization>Harvard University</organization>
      <address>
      <postal>
      <street>1350 Mass. Ave.</street>
      <street>Cambridge</street>
      <street>MA 02138</street></postal>
      <phone>- +1 617 495 3864</phone>
      <email>sob@harvard.edu</email></address></author>
      <date year='1997' month='March' />
      </front>
      <seriesInfo name="RFC" value="2119"/>
      <format type="TXT" octets="4723" 
       target="http://www.ietf.org/rfc/rfc2119.txt"/>
    </reference>
 
    <reference anchor='XDR'>
      <front>
      <title abbrev='XDR'>XDR: External Data Representation Standard</title>
      <author initials='M.' surname='Eisler' fullname='Mike Eisler'>
      <organization>Network Appliance, Inc.</organization>
      </author>
      <date month='May' year='2006'/>
      </front>
      <seriesInfo name='STD' value='67' />
      <seriesInfo name="RFC" value="4506"/>
      <format type="TXT" octets="55477" 
       target="http://www.ietf.org/rfc/rfc4506.txt"/>
    </reference>
 
    <reference anchor='GSS-API'>

      <front>
	<title abbrev='GSS-API'>Generic Security Service Application
	Program Interface Version 2, Update 1</title>
	<author initials='J.' surname='Linn' fullname='John Linn'>
	  <organization>RSA Laboratories</organization>
	  <address>
	    <postal>
	      <street>20 Crosby Drive</street>
	      <city>Bedford</city>
	      <region>MA</region>
	      <code>01730</code>
	    <country>US</country></postal>
	    <phone>+1 781 687 7817</phone>
	<email>jlinn@rsasecurity.com</email></address></author>
	<date year='2000' month='January' />
	<abstract>
	  <t>The Generic Security Service Application Program
	  Interface (GSS-API), Version 2, as defined in [RFC-2078], provides
	  security services to callers in a generic fashion,
	  supportable with a range of underlying mechanisms and
	  technologies and hence allowing source-level portability of
	  applications to different environments. This specification
	  defines GSS-API services and primitives at a level
	  independent of underlying mechanism and programming language
	  environment, and is to be complemented by other, related
	  specifications:</t>
	  <t>documents defining specific parameter bindings for
	  particular language environments</t>
	  <t>documents defining token formats, protocols, and
	  procedures to be implemented in order to realize GSS-API
	  services atop particular security mechanisms</t> <t>This
	  memo obsoletes [RFC-2078], making specific, incremental changes in
	  response to implementation experience and liaison
	  requests. It is intended, therefore, that this memo or a
	  successor version thereto will become the basis for
	  subsequent progression of the GSS-API specification on the
	  standards track.</t></abstract></front>

      <seriesInfo name='RFC' value='2743' />
      <format type='TXT' octets='229418' target='ftp://ftp.isi.edu/in-notes/rfc2743.txt' />
    </reference>

  <reference anchor='iSCSI'
             target='http://www.ietf.org/rfc/rfc3720.txt'>
    <front>
        <title>Internet Small Computer Systems Interface (iSCSI)</title>
        <author fullname='J. Satran'>
          <organization abbrev='IBM'>IBM</organization>
        </author>
        <author fullname='K. Meth'>
          <organization abbrev='IBM'>IBM</organization>
        </author>
        <author fullname='C. Sapuntzakis'>
          <organization abbrev='Cisco'>Cisco Systems</organization>
        </author>
        <author fullname='M. Chadalapaka'>
          <organization abbrev='HP'>Hewlett-Packard Co.</organization>
        </author>
        <author fullname='E. Zeidner'>
          <organization abbrev='IBM'>IBM</organization>
        </author>
        <date month='April' year='2004' />
    </front>
    <seriesInfo name='RFC' value='3720' />
    <format type='TXT' octets='578468'
            target='http://www.ietf.org/rfc/rfc3720.txt' />
  </reference>

  <reference anchor='iscsi-naming-format'
             target='http://www.ietf.org/rfc/rfc3980.txt'>
    <front>
        <title>T11 Network Address Authority (NAA) Naming Format for iSCSI Node Names</title>
        <author fullname='M. Krueger'>
          <organization abbrev='HP'>Hewlett-Packard Co.</organization>
        </author>
        <author fullname='M. Chadalapaka'>
          <organization abbrev='HP'>Hewlett-Packard Co.</organization>
        </author>
        <author fullname='R. Elliott'>
          <organization abbrev='HP'>Hewlett-Packard Co.</organization>
        </author>
        <date month='February' year='2005' />
    </front>
    <seriesInfo name='RFC' value='3980' />
    <format type='TXT' octets='14056'
            target='http://www.ietf.org/rfc/rfc3980.txt' />
  </reference>

  <reference anchor='SPC-3'>
    <front>
      <title>SCSI Primary Commands - 3 (SPC-3)</title>
      <author initials="R." surname="Weber" fullname="Ralph O. Weber">
        <organization abbrev="SNIA">SNIA T10/1416-D</organization>
      </author>
      <date month='May' year='2005' />
    </front>
    <seriesInfo name='INCITS' value='408-2005' />
    <format type='PDF' octets='3044469'
            target='SCSI Primary Commands - 3 (SPC-3)' />
  </reference>

</references>
<references title="Informative References">

  <reference anchor='NFSv4.1' 
             target='http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-minorversion1-23.txt'>
    <front>
      <title>NFSv4 Minor Version 1</title>
      <author initials="S." surname="Shepler" fullname="Spencer Shepler">
        <organization abbrev="Sun">Sun Microsystems, Inc.</organization>
      </author>
      <author initials="M." surname="Eisler" fullname="Mike Eisler">
        <organization abbrev="Netapp">Network Appliance, Inc.</organization>
      </author>
      <author initials="D." surname="Noveck" fullname="David Noveck">
        <organization abbrev="Netapp">Network Appliance, Inc.</organization>
      </author>
      <date month="May" year="2008"/>
    </front>
  </reference>

  <reference anchor='NFS41_DOT_X' 
             target='http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-minorversion1-dot-x-06.txt'>
    <front>
      <title>NFSv4 Minor Version 1 XDR Description</title>
      <author initials="S." surname="Shepler" fullname="Spencer Shepler">
        <organization abbrev="Sun">Sun Microsystems, Inc.</organization>
      </author>
      <author initials="M." surname="Eisler" fullname="Mike Eisler">
        <organization abbrev="Netapp">Network Appliance, Inc.</organization>
      </author>
      <author initials="D." surname="Noveck" fullname="David Noveck">
        <organization abbrev="Netapp">Network Appliance, Inc.</organization>
      </author>
      <date month="May" year="2008"/>
    </front>
  </reference>

  <reference anchor='OSD2'
             target='http://www.t10.org/ftp/t10/drafts/osd2/osd2r03.pdf'>
    <front>
      <title>SCSI Object-Based Storage Device Commands - 2 (OSD-2)</title>
      <author initials="R." surname="Weber" fullname="Ralph O. Weber">
        <organization abbrev="SNIA">SNIA T10/1729-D</organization>
      </author>
      <date month="January" year="2008"/>
    </front>
  </reference>

  <reference anchor='RFC4301'>
    <front>
      <title>Security Architecture for the Internet Protocol</title>
      <author initials='S.' surname='Kent' fullname='S. Kent'>
        <organization>BBN Technologies</organization>
      </author>
      <author initials='K.' surname='Seo' fullname='K. Seo'>
        <organization>BBN Technologies</organization>
      </author>
      <date year='2005' month='December'/>
    </front>
    <seriesInfo name='RFC' value='4301'/>
    <format type='TXT' octets='262123'
      target='http://www.ietf.org/rfc/rfc4301.txt'/>
  </reference>

  <reference anchor='Error Correcting Codes'>
	  <front>
		  <title>The Theory of Error-Correcting Codes, Part I</title>
		  <author initials='F. J.' surname='MacWilliams' fullname='F. J. MacWilliams'>
		    <organization> </organization>
		  </author>
		  <author initials='N. J. A.' surname='Sloane' fullname='N. J. A. Sloane'>
		    <organization> </organization>
		  </author>
		  <date year='1977'/>
	  </front>
  </reference>

  <reference anchor='iSNS'
             target='http://www.ietf.org/rfc/rfc4171.txt'>
    <front>
      <title>Internet Storage Name Service (iSNS)</title>
      <author initials='J.' surname='Tseng' fullname='J. Tseng'>
        <organization>Riverbed Technology</organization>
      </author>
      <author initials='K.' surname='Gibbons' fullname='K. Gibbons'>
        <organization>McDATA Corporation</organization>
      </author>
      <author initials='F.' surname='Travostino' fullname='F. Travostino'>
        <organization>Nortel</organization>
      </author>
      <author initials='C.' surname='Du Laney' fullname='C. Du Laney'>
        <organization>Rincon Research Corporation</organization>
      </author>
      <author initials='J.' surname='Souza' fullname='J. Souza'>
        <organization>Microsoft</organization>
      </author>
      <date year='2005' month='September'/>
    </front>
    <seriesInfo name='RFC' value='4171'/>
    <format type='TXT' octets='310636'
      target='http://www.ietf.org/rfc/rfc4171.txt'/>
  </reference>

  <reference anchor='EUI'
             target='http://standards.ieee.org/regauth/oui/tutorials/EUI64.html'>
    <front>
        <title>Guidelines for 64-bit Global Identifier (EUI-64) Registration Authority</title>
      <author>
        <organization>IEEE</organization>
      </author>
    </front>
    <format type='HTML' octets='17766'
            target='http://standards.ieee.org/regauth/oui/tutorials/EUI64.html' />
  </reference>

  <reference anchor='SRP'
             target='http://ftp.t10.org/ftp/t10/drafts/srp/srp-r16a.pdf'>
    <front>
        <title>SCSI RDMA Protocol (SRP)</title>
      <author>
        <organization>T10/ANSI INCITS 365-2002</organization>
      </author>
    </front>
    <seriesInfo name='INCITS' value='365-2002' />
    <format type='PDF' octets='831986'
            target='http://ftp.t10.org/ftp/t10/drafts/srp/srp-r16a.pdf' />
  </reference>

  <reference anchor='FC-FS-2'
             target='http://www.t11.org/t11/stat.nsf/upnum/1619-d'>
    <front>
        <title>Fibre Channel Framing and Signaling - 2 (FC-FS-2)</title>
      <author>
        <organization>T11 1619-D/ANSI INCITS 424-2007</organization>
      </author>
      <date month='August' year='2006'/>
    </front>
    <seriesInfo name='INCITS' value='424-2007' />
  </reference>

  <reference anchor='SAS'
             target='http://www.t10.org/ftp/t10/drafts/sas1/sas1r10.pdf'>
    <front>
        <title>Serial Attached SCSI - 1.1 (SAS-1.1)</title>
      <author>
        <organization>T10 1601-D/ANSI INCITS 417-2006</organization>
      </author>
      <date month='September' year='2005'/>
    </front>
    <seriesInfo name='INCITS' value='417-2006' />
    <format type='PDF' octets='5637775'
            target='http://www.t10.org/ftp/t10/drafts/sas1/sas1r10.pdf' />
  </reference>

</references>

<section title="Acknowledgments">
 <t>
    Todd Pisek was a co-editor of the initial drafts for this document.
    Daniel E. Messinger and Pete Wyckoff reviewed and commented on this document.
 </t>
</section>
</back>
</rfc>