[glue-wg] Some thoughts on storage objects
Paul Millar
paul.millar at desy.de
Thu Mar 27 16:06:44 CDT 2008
Hi all,
As an exercise, I've tried to jot down as precise and complete a description
of each GLUE storage object as possible, also describing how they relate
to each other. I've also tried to do this without any forward-references
(so, in theory, the document is readable in a single pass). In almost all
cases, I've left out the attributes.
I don't know how useful this is. It's just my point-of-view of things as
they stand now. I'm sure there are bits that are "wrong" (either I've
misunderstood something or this description breaks a use-case), but if so,
hopefully people can point out which bits are wrong and (perhaps) it will
stimulate some discussion.
BTW, I'm implicitly assuming that StorageEnvironment.RetentionPolicy can be
multivalued. If this isn't true and we have the use-case of the same
physical disks being part of, for example, both Custodial and Output storage,
then it starts to get complicated.
As always, comments appreciated.
Cheers,
Paul.
---
UserDomain:
A collection of one or more end-users; a VO is an instance of a
UserDomain. All end-users that interact with the physical storage
are members of a UserDomain and, in general, derive their
authorisation from that membership.
StorageCapacity:
A StorageCapacity object describes the ability to store data within a
homogeneous storage technology. This storage technology provides a
common access latency.
All StorageCapacity objects are specified within a certain context.
The context is determined by an association between the
StorageCapacity object and precisely one other higher-level object.
These associations are not listed here, but are described in later sections.
In general, a StorageCapacity object will record some
context-specific information. Examples of such information include
the total storage capacity of the underlying technology and how much
of that total has been used.
The underlying storage technology may affect which of the
context-specific attributes are available. For example, tape storage
may be considered semi-infinite, so the total and free attributes have
no meaning. If this is so, then it affects all StorageCapacity objects with
the same underlying technology, independent of their context.
Different contexts may also affect what context-specific attributes
are recorded. This is a policy decision when implementing GLUE, as
recording all possible information may be costly and provide no great
benefit.
[Aside: these two reasons are why many of the attributes within
StorageCapacity are optional. Rather than explicitly subclassing
the objects and making the values required, it is left deliberately
vague which attributes are published.]
A StorageCapacity may represent a logical aggregation of multiple
underlying storage technology instances; for example, a
StorageCapacity might represent many disk storage nodes, or many
tapes stored within a tape silo. GLUE makes no effort to record
information at this deeper level; but by not doing so, it requires
that the underlying storage technology be homogeneous. Homogeneous
means that the underlying storage technology is either identical or
sufficiently similar that the differences don't matter.
In most cases, the homogeneity is fairly obvious (e.g., tape storage
vs disk-based storage), but there may be times where this distinction
becomes contentious and judgement may be required; for example,
the quality of disk-based storage might indicate that one subset is
useful for a higher-quality service. If this is so, then it may make
sense to represent the different classes of disk by different
StorageCapacities.
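
To make the optional-attribute idea above concrete, here is a minimal
Python sketch. The class layout and the field names (technology,
total_size, used_size) are illustrative assumptions, not GLUE attribute
names; an unpublished attribute is simply modelled as None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageCapacity:
    """Hypothetical model; field names are illustrative, not GLUE names."""
    technology: str                    # e.g. "disk" or "tape"; homogeneous per object
    total_size: Optional[int] = None   # None means "not published in this context"
    used_size: Optional[int] = None

    def free_size(self) -> Optional[int]:
        # Free space is only meaningful when both figures are published.
        if self.total_size is None or self.used_size is None:
            return None
        return self.total_size - self.used_size


disk = StorageCapacity("disk", total_size=100, used_size=40)
tape = StorageCapacity("tape", used_size=500)  # semi-infinite: no total published
```

With this layout a consumer has to cope with missing values (as for the
tape example) rather than assume every attribute is present.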
StorageEnvironment:
A StorageEnvironment is a collection of one or more StorageCapacities
with a set of associated (enforced) storage management policies.
Examples of these policies are Type (Volatile, Durable, Permanent)
and RetentionPolicy (Custodial, Output, Replica).
StorageEnvironments act as a logical aggregation of
StorageCapacities, so each StorageEnvironment must have at least one
associated StorageCapacity. It is the associated StorageCapacities
that allow a StorageEnvironment to store data with its advertised
policies; for example, to act as (Permanent, Custodial) storage of
data.
Since a StorageEnvironment may contain multiple StorageCapacities, it
may describe a heterogeneous environment. An example of this is "tape
storage", which has both a tape back-end and a disk front-end into
which users can pin files. Such a StorageEnvironment would have two
associated StorageCapacities: one describing the disk storage and
another describing the tape.
If a StorageCapacity is associated with a StorageEnvironment, it is
associated with only one. A StorageCapacity may not be shared
between different StorageEnvironments.
StorageCapacities associated with a StorageEnvironment must be
non-overlapping with any other such StorageCapacity and the set of all
such StorageCapacities must represent the complete storage available
to end-users. Each physical storage device (e.g., individual disk drive or
tape) that an end-user can utilise must be represented by (some part
of) precisely one StorageCapacity associated with a StorageEnvironment.
Nevertheless, the StorageCapacities associated with
StorageEnvironments may be incomplete as a site may deploy physical
storage devices that are not directly under end-user control; for
example, disk storage used to cache incoming transfers. GLUE makes
no effort to record information about such storage.
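
The two structural rules above (at least one StorageCapacity per
StorageEnvironment, and no StorageCapacity shared between
StorageEnvironments) could be checked along the following lines. This is
only a sketch: the class, the identifiers and the check_environments()
helper are hypothetical, and RetentionPolicy is taken to be multivalued,
as assumed in the introduction.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageEnvironment:
    name: str
    retention_policies: List[str]   # assumed multivalued
    capacity_ids: List[str] = field(default_factory=list)


def check_environments(envs: List[StorageEnvironment]) -> List[str]:
    """Return a list of rule violations (empty if there are none)."""
    problems: List[str] = []
    owner: Dict[str, str] = {}  # capacity id -> first environment claiming it
    for env in envs:
        if not env.capacity_ids:
            problems.append(f"{env.name}: no StorageCapacity")
        for cid in env.capacity_ids:
            if cid in owner:
                problems.append(f"{cid}: shared by {owner[cid]} and {env.name}")
            else:
                owner[cid] = env.name
    return problems


# Hypothetical example data.
tape_env = StorageEnvironment("tape-storage", ["Custodial"],
                              ["disk-front", "tape-back"])
bad_env = StorageEnvironment("scratch", ["Replica"], ["disk-front"])
empty_env = StorageEnvironment("empty", ["Output"])
```

Here check_environments([tape_env]) reports nothing, whereas adding
bad_env and empty_env reports one shared StorageCapacity and one
StorageEnvironment without any StorageCapacity.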
StorageResource:
A StorageResource is an aggregation of one or more
StorageEnvironments and describes the hardware that a particular
software instance has under its control.
A StorageResource must have at least one StorageEnvironment,
otherwise there wouldn't be much point publishing information
about it. [This isn't a strict requirement, but I think it makes sense
to include it.]
All StorageEnvironments must be part of precisely one
StorageResource. StorageEnvironments may not be shared between
StorageResources. This means that all physical hardware must
be published under precisely one StorageResource.
StorageShare:
A StorageShare is a logical partitioning of one or more
StorageEnvironments.
Perhaps the simplest example of a StorageShare is one
associated with a single StorageEnvironment with a single
associated StorageCapacity, and that represents all
the available storage of that StorageCapacity. An example of
a storage system that could be represented by this trivial
StorageShare is the classic-SE.
StorageShares must have one or more associated StorageCapacities.
These StorageCapacities provide a complete description of the different
homogeneous underlying technologies that are available under the share.
In general, the number of StorageCapacities associated with a
StorageShare is the sum of the number of StorageCapacities associated
with each of the StorageShare's associated StorageEnvironments.
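
The counting rule above can be written down directly. In this sketch the
ENVIRONMENTS table and the function name are hypothetical; each
StorageEnvironment is reduced to the list of technologies of its
StorageCapacities.

```python
from typing import Dict, List

# Hypothetical layout: environment name -> technologies of its StorageCapacities.
ENVIRONMENTS: Dict[str, List[str]] = {
    "tape-storage": ["disk", "tape"],
    "disk-only": ["disk"],
}


def expected_share_capacities(env_names: List[str]) -> int:
    """Number of StorageCapacities a share spanning env_names would carry."""
    return sum(len(ENVIRONMENTS[name]) for name in env_names)
```

A share over "disk-only" alone would carry one StorageCapacity; a share
spanning both environments would carry three, even though two of them
describe disk.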
Following from this, there is an implicit association between the
StorageCapacity associated with a StorageShare and the corresponding
StorageCapacity associated with a StorageEnvironment. Intuitively, this
association arises from the fact that the two StorageCapacities share the
same underlying physical storage. This implicit association is not
recorded in GLUE.
StorageShares may overlap. Specifically, given a StorageCapacity
(SC_E) that is associated with some StorageEnvironment and which has
totalSize TS_E, let the sum of the totalSize attributes for all
StorageCapacities that are:
1. associated with a StorageShare, and
2. implicitly associated with SC_E
be TS_S. If the StorageShares are covering then TS_S = TS_E. If
the StorageShares overlap, then TS_S > TS_E.
[sorry, I couldn't easily describe this with just words without it sounding
awful!]
StorageShares may be incomplete. Following the same definitions
as above, this is when TS_S < TS_E. Intuitively, this happens if
the site-admin has not yet assigned all available storage.
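
The three cases (covering, overlapping, incomplete) amount to comparing
TS_S with TS_E, which can be sketched as below. The function name and
its arguments are hypothetical. Note also that the sums alone cannot
distinguish a true covering from overlap combined with unassigned
storage, so the "covering" label here is only what the totals suggest.

```python
from typing import List


def classify_shares(ts_e: int, share_total_sizes: List[int]) -> str:
    """Compare TS_S (sum of share-side totalSize values) with TS_E."""
    ts_s = sum(share_total_sizes)
    if ts_s > ts_e:
        return "overlapping"
    if ts_s < ts_e:
        return "incomplete"
    # TS_S == TS_E: consistent with a covering, though not proof of one.
    return "covering"
```

For a StorageEnvironment capacity with totalSize 100, share-side totals
of 60 and 40 look covering, 80 and 50 must overlap, and 30 and 20 leave
storage unassigned.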
End-users within a UserDomain may wish to store or retrieve files.
StorageShares provide a complete, abstract description of the
underlying storage at their disposal. No member of a UserDomain may
interact with the physical hardware except through a StorageShare.
The partitioning is persistent through file creation and deletion. The
totalSize attributes (of a StorageShare's associated StorageCapacities)
do not change as a result of file creation or deletion. [Does GLUE need to
stipulate this, or should we leave this vague?]
A single StorageShare may allow multiple UserDomains to access
storage; if so, the StorageShare is "shared" between the different
UserDomains. Such a shared StorageShare is typical if a site
provides storage described by the trivial StorageShare (one that
covers a complete StorageEnvironment) whilst supporting multiple
UserDomains.
StorageMappingPolicy:
The StorageMappingPolicy describes how a particular UserDomain is
allowed to access a particular StorageShare. No member of a
UserDomain may interact with a StorageShare except as described by a
StorageMappingPolicy.
The StorageMappingPolicies may contain information that is specific
to that UserDomain, such as one or more associated StorageCapacities.
If provided, these provide a UserDomain-specific view of their usage
of the underlying physical storage technology as a result of their
usage within the StorageShare.
If StorageCapacities are associated with a StorageMappingPolicy,
there will be the same number as are associated with the
corresponding StorageShare.
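
The "same number" rule can be expressed as a small consistency check.
This is a sketch under assumptions: the classes, the used_sizes field
and is_consistent() are hypothetical, and the per-UserDomain
StorageCapacities are reduced to a list of used sizes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageShare:
    capacity_technologies: List[str]   # one entry per associated StorageCapacity


@dataclass
class StorageMappingPolicy:
    user_domain: str
    share: StorageShare
    # Optional per-UserDomain usage, one entry per StorageCapacity of the share.
    used_sizes: List[int] = field(default_factory=list)

    def is_consistent(self) -> bool:
        # Either nothing is published, or there is one entry per share capacity.
        if not self.used_sizes:
            return True
        return len(self.used_sizes) == len(self.share.capacity_technologies)


share = StorageShare(["disk", "tape"])
ok = StorageMappingPolicy("vo-a", share, [10, 200])
unpublished = StorageMappingPolicy("vo-b", share)
bad = StorageMappingPolicy("vo-c", share, [10])
```

Both ok and unpublished pass the check; bad does not, because it
publishes one value against a share that has two StorageCapacities.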
StorageEndpoint:
A StorageEndpoint specifies that storage may be controlled through a
particular interface. The SRM protocol is an example of such an
interface and a StorageEndpoint would be advertised for each instance
of SRM.
The access policies describing which users of a UserDomain may use
the StorageEndpoint are not published. On observing that a site
publishes a StorageEndpoint, one may deduce only that it is valid for
at least one user of one supported UserDomain.
StorageAccessProtocol:
A StorageAccessProtocol describes how data may be sent or received.
The presence of a StorageAccessProtocol indicates that data may be
fetched or stored using this interface.
Access to the interface may be localised; that is, only available
from certain computers. It may also be restricted to specified
UserDomains. However, neither policy restriction is published in
GLUE. On observing a StorageAccessProtocol, one may deduce only that
it is valid for at least one user of one supported UserDomain.
StorageService:
A StorageService is an aggregation of StorageEndpoints,
StorageAccessProtocols and StorageResources. It is the top-level
description of the ability to transfer files to and from a site, and
manipulate the files once stored.