[glue-wg] Updated thoughts...

Paul Millar paul.millar at desy.de
Tue Apr 8 05:27:31 CDT 2008


Hi all,

I've tried to update the storage objects "thoughts" document with feedback from 
people.

(Jen, my apologies, I now see what you meant about covering sets: we share the 
same understanding but what I had written was just wrong .. I've tried to get 
it more right this time).

Sorry if I've missed out someone's comment; please yell.

Also, I've tried to separate the more informative paragraphs by marking them 
differently.

One way of using this information would be to fold the more normative parts 
into the GLUE spec proper.  The informative paragraphs could then form an 
informative annex to the document; cf.

	http://standards.ieee.org/guides/companion/part1.html#annexes

Cheers,

Paul.

---


Some thoughts on GLUE Storage objects  (v0.2)
---

These are some notes on the different objects in the forthcoming GLUE
v2.0 information model, concentrating on those responsible for
storage.  These notes provide a mixture of normative and informative
information.  The non-normative (informative) paragraphs are red-lined
(marked with a "|" on the left-most edge).

This should be read with the GLUE schema and, if there are any
discrepancies, the GLUE schema document is authoritative.

Please email corrections to <paul.millar at desy.de> or the GLUE working
group <glue-wg at ogf.org>.



UserDomain:

  A collection of one or more end-users.  Every end-user that interacts
  with the physical storage is a member of a UserDomain.

[much more can (and perhaps should) be said here]

| A Virtual Organisation (VO) is an instance of a UserDomain.  A group
| within a VO may be represented as a UserDomain.  In general, users
| derive their authorisation from their membership.



StorageCapacity:

  A StorageCapacity object describes the ability to store data within
  a homogeneous storage technology.  Each object provides a view of
  that physical storage medium with a common access latency.

  All StorageCapacity objects are specified within a certain context.
  The context is determined by an association between the
  StorageCapacity object and precisely one other higher-level object.

| These associations are not listed here, but are described in later
| sections.
|
| In general, a StorageCapacity object will record some
| context-specific information.  Examples of such information include
| the total storage capacity of the underlying technology and how much
| of that total has been used.
|
| The underlying storage technology may affect which of the
| context-specific attributes are available.  For example, tape storage
| may be considered semi-infinite, so the total and free attributes have
| no meaning.  If this is so, then it affects all StorageCapacity objects with
| the same underlying technology, independent of their context.
|
| Different contexts may also affect what context-specific attributes
| are recorded.  This is a policy decision when implementing GLUE, as
| recording all possible information may be costly and provide no great
| benefit.
|
| [Aside: these two reasons are why many of the attributes within
| StorageCapacity are optional.  Rather than explicitly subclassing
| the objects and making the values required, it is left deliberately
| vague which attributes are published.]
|
| A StorageCapacity may represent a logical aggregation of multiple
| underlying storage technology instances; for example, a
| StorageCapacity might represent many disk storage nodes, or many
| tapes stored within a tape silo.  GLUE makes no effort to record
| information at this deeper level; but by not doing so, it requires
| that the underlying storage technology be homogeneous. Homogeneous
| means that the underlying storage technology is either identical or
| sufficiently similar that the differences don't matter.
|
| In most cases, the homogeneity is fairly obvious (e.g., tape storage
| vs disk-based storage), but there may be times where this distinction
| becomes contentious and judgement may be required; for example,
| the quality of disk-based storage might indicate that one subset is
| useful for a higher-quality service.  If this is so, then it may make
| sense to represent the different classes of disk by different
| StorageCapacities.
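|
| As a rough illustration of the optional attributes discussed above,
| a StorageCapacity could be modelled as below.  This is only a sketch:
| the Python field names and the choice of GB as the unit are
| assumptions made for this example, not part of the GLUE schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageCapacity:
    """One view of a homogeneous storage technology (hypothetical model)."""
    technology: str                   # e.g. "disk" or "tape" (illustrative label)
    total_size: Optional[int] = None  # GB; None means "not published"
    used_size: Optional[int] = None   # GB; None means "not published"
    free_size: Optional[int] = None   # GB; None means "not published"


# Disk: all three context-specific attributes are meaningful.
disk = StorageCapacity("disk", total_size=100, used_size=40, free_size=60)

# Tape treated as semi-infinite: total and free are left unpublished.
tape = StorageCapacity("tape", used_size=500)
```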


StorageEnvironment:

  A StorageEnvironment is a collection of one or more
  StorageCapacities with a set of associated (enforced) storage
  management policies.

| Examples of these policies are Type (Volatile, Durable, Permanent)
| and RetentionPolicy (Custodial, Output, Replica).
|
| In general, a StorageEnvironment may have one or more
| RetentionPolicy values.  If it has more than one, all data stored
| using this StorageEnvironment will adopt precisely one of the values
| present.  The policy describing how data is assigned a
| RetentionPolicy value is not specified; for example, GLUE does not
| record a default RetentionPolicy.  GLUE also does not specify
| whether it is possible to migrate data from one RetentionPolicy
| value to a different value whilst using the same
| StorageEnvironment.

  StorageEnvironments act as a logical aggregation of
  StorageCapacities, so each StorageEnvironment must have at least one
  associated StorageCapacity.

| It is the associated StorageCapacities that allow a
| StorageEnvironment to store data with its advertised policies; for
| example, to act as (Permanent, Custodial) storage of data.
|
| Since a StorageEnvironment may contain multiple StorageCapacities,
| it may describe a heterogeneous environment.  An example of this is
| "tape storage", which has both a tape back-end and a disk front-end into
| which users can pin files.  Such a StorageEnvironment would have two
| associated StorageCapacities: one describing the disk storage and
| another describing the tape.

  If a StorageCapacity is associated with a StorageEnvironment, it is
  associated with only one.  A StorageCapacity may not be shared
  between different StorageEnvironments.

| StorageCapacities associated with a StorageEnvironment must be
| non-overlapping with any other such StorageCapacity and the set of
| all such StorageCapacities must represent the complete storage
| available to end-users.  Each physical storage device (e.g.,
| individual disk drive or tape) that an end-user can utilise must be
| represented by (some part of) precisely one StorageCapacity
| associated with a StorageEnvironment.
|
| Nevertheless, the StorageCapacities associated with
| StorageEnvironments may be incomplete as a site may deploy physical
| storage devices that are not directly under end-user control; for
| example, disk storage used to cache incoming transfers.  GLUE makes
| no effort to record information about such storage.
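|
| The association rules above lend themselves to a simple consistency
| check.  The sketch below assumes a hypothetical representation in
| which each StorageEnvironment name maps to the identifiers of its
| associated StorageCapacities; neither the function nor the
| identifiers come from GLUE.

```python
from collections import Counter
from typing import Dict, List


def check_environments(environments: Dict[str, List[str]]) -> List[str]:
    """Return a list of rule violations (empty if the layout is consistent)."""
    problems = []
    owners = Counter()
    for name, capacities in environments.items():
        # Each StorageEnvironment needs at least one StorageCapacity.
        if not capacities:
            problems.append(f"{name}: no associated StorageCapacity")
        owners.update(set(capacities))
    # A StorageCapacity may not be shared between StorageEnvironments.
    for capacity, count in sorted(owners.items()):
        if count > 1:
            problems.append(f"{capacity}: shared by {count} StorageEnvironments")
    return problems


# "tape storage" with a disk front-end, plus a separate disk-only environment.
good = {"tape-env": ["disk-cache", "tape"], "disk-env": ["disk-only"]}
bad = {"env-a": ["cap-x"], "env-b": ["cap-x"], "env-c": []}

print(check_environments(good))
print(check_environments(bad))
```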


StorageResource:

  A StorageResource is a logically separable component that provides
  the management functionality described in one or more
  StorageEnvironments. More than one StorageResource may cooperate to
  provide an advertised StorageEnvironment.

| Typically, a StorageResource will describe a running instance of
| some software, but it may describe some commodity hardware.  A
| StorageResource should, under normal circumstances, have at least
| one StorageEnvironment, otherwise there wouldn't be much point
| publishing information about it.
|
| All StorageEnvironments must be part of at least one StorageResource
| as the StorageResource is how the management attributes described in
| the StorageEnvironment are provided.
|
| GLUE makes no attempt to record which physical storage (as
| represented by StorageCapacity objects) is under control of which
| StorageResource.


StorageShare:

  A StorageShare is a logical partitioning of one or more
  StorageEnvironments.

| Perhaps the simplest example of a StorageShare is one associated
| with a single StorageEnvironment with a single associated
| StorageCapacity, and that represents all the available storage of
| that StorageCapacity.  An example of storage that could be
| represented by this trivial StorageShare is the classic-SE.

  A StorageShare must have one or more associated StorageCapacities.
  These StorageCapacities provide a view of the different homogeneous
  underlying technologies that are available within the share.

| The StorageCapacities within the StorageShare context need not
| describe all storage: the number of StorageCapacities associated
| with a StorageShare may be less than the sum of the number of
| StorageCapacities associated with each of the StorageShare's
| associated StorageEnvironments.
|
| There is an *implicit* association between the StorageCapacity
| associated with a StorageShare and the corresponding StorageCapacity
| associated with a StorageEnvironment.  Intuitively, this association
| arises from the fact that the two StorageCapacities are views of the
| same underlying physical storage.  This implicit association is not
| recorded in GLUE.
|
| Any pair of StorageShares may be (pair-wise) overlapping; that is,
| they have at least one pair of implicitly associated
| StorageCapacities (both within a different StorageShare context)
| that provide a view onto a common portion of underlying storage.
| A pair of shared StorageCapacities indicates common access to a
| shared underlying storage and, in general, storage operations within
| one shared StorageShare will affect the other.
|
| A pair of StorageShares may be partially shared; that is, they have
| at least one pair of StorageCapacities that are shared and at least one
| that is not.  Partially shared StorageShares could represent two
| UserDomains' access to a tape store, where they share a common set
| of disk pools but the tape storage is distinct.
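|
| The sharing relationships above can be sketched in code, assuming
| the implicit association (which GLUE does not record) is known by
| other means.  Each StorageShare is reduced here to a set of
| hypothetical labels, one per portion of underlying storage viewed by
| its StorageCapacities.  The function name and the "fully shared"
| label are inventions for this example.

```python
from typing import Set


def classify_pair(a: Set[str], b: Set[str]) -> str:
    """Classify two StorageShares by the underlying storage they view."""
    if not (a & b):
        return "non-overlapping"   # no common underlying storage
    if a ^ b:
        return "partially shared"  # some storage is common, some is not
    return "fully shared"          # every portion is common to both


# Two UserDomains sharing disk pools but with distinct tape storage.
share_1 = {"disk-pools", "tape-set-1"}
share_2 = {"disk-pools", "tape-set-2"}

print(classify_pair(share_1, share_2))
```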
|
| StorageShares may be covering or not.  A covering set of StorageShares
| will make all of the underlying storage available to the end-users.
| When a set of StorageShares is not covering, the site-admin has
| refrained from allocating some of the underlying storage.

| In general, given a StorageCapacity (SC_E) that is associated with
| some StorageEnvironment and which has totalSize TS_E, let TS_S be
| the sum of the totalSize attributes for all StorageCapacities that
| are:
|
|	1. associated with a StorageShare, and
|	2. that are implicitly associated with SC_E.
|
| If the StorageShares are covering and non-overlapping then TS_S =
| TS_E.  If they are covering and overlapping then TS_S > TS_E. If they
| are not covering and non-overlapping, then TS_S < TS_E.  If they are
| not covering and overlapping no general statement can be made about
| the totalSize.
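|
| A small worked example of these relations follows.  The sizes are
| made up (TS_E = 100, in arbitrary units) and the helper function is
| hypothetical.  The last case shows why nothing general can be said
| when the StorageShares are both not covering and overlapping: the
| sum can happen to equal TS_E.

```python
from typing import Iterable


def compare_totals(ts_e: int, share_sizes: Iterable[int]) -> str:
    """Compare TS_S (sum of the StorageShare-side totalSize values) with TS_E."""
    ts_s = sum(share_sizes)
    if ts_s == ts_e:
        return "TS_S = TS_E"
    return "TS_S > TS_E" if ts_s > ts_e else "TS_S < TS_E"


TS_E = 100  # totalSize of SC_E (illustrative value)

print(compare_totals(TS_E, [60, 40]))  # covering, non-overlapping
print(compare_totals(TS_E, [60, 60]))  # covering, overlapping by 20
print(compare_totals(TS_E, [30, 40]))  # not covering, non-overlapping
print(compare_totals(TS_E, [50, 50]))  # not covering (70 in use), overlapping by 30
```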
|
| End-users within a UserDomain may wish to store or retrieve files.
| The StorageShares provide a complete, abstract description of the
| underlying storage at their disposal.  No member of a UserDomain may
| interact with the physical hardware except through a StorageShare.
|
| In general, one should not draw strong conclusions about how these
| attributes will alter under storage operations.  The behaviour of
| these attributes may be governed by site level policies, software
| implementation, underlying storage technology, concurrent usage,
| etc; for example, deleting files may or may not result in freeSpace
| increasing.
|
| A grid may impose stricter requirements on the attributes'
| behaviour.  This may be due to good knowledge of deployed software,
| management policies and underlying technologies in use.  Care
| should be taken when sharing monitoring information that the storage
| models are compatible.
|
| A single StorageShare may allow multiple UserDomains to access
| storage; if so, the StorageShare is "shared" between the different
| UserDomains.  Such a shared StorageShare is typical if a site
| provides storage described by the trivial StorageShare (one that
| covers a complete StorageEnvironment) whilst supporting multiple
| UserDomains.


StorageMappingPolicy:

  The StorageMappingPolicy describes how a particular UserDomain is
  allowed to access a particular StorageShare.

| No member of a UserDomain may interact with a StorageShare except as
| described by a StorageMappingPolicy.
|
| The StorageMappingPolicies may contain information that is specific
| to that UserDomain, such as one or more associated
| StorageCapacities.  If provided, these provide a UserDomain-specific
| view of their usage of the underlying physical storage technology as
| a result of their usage within the StorageShare.
|
| If StorageCapacities are available within a StorageMappingPolicy
| context, there may be the same number as are associated with the
| corresponding StorageShare, or fewer.
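|
| The role of the StorageMappingPolicy as the only route from a
| UserDomain to a StorageShare can be sketched as a lookup.  The tuple
| representation and all names below are assumptions made for
| illustration only.

```python
from typing import Set, Tuple

# Each hypothetical policy links one UserDomain to one StorageShare.
MappingPolicy = Tuple[str, str]


def may_access(user_domain: str, share: str, policies: Set[MappingPolicy]) -> bool:
    """True only if a StorageMappingPolicy links the UserDomain to the share."""
    return (user_domain, share) in policies


# "share-A" is shared between two UserDomains; "share-B" serves only one.
policies = {("vo-1", "share-A"), ("vo-2", "share-A"), ("vo-2", "share-B")}

print(may_access("vo-2", "share-B", policies))
print(may_access("vo-1", "share-B", policies))
```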


StorageEndpoint:

  A StorageEndpoint specifies that storage may be controlled through a
  particular interface.

| In general, such interfaces do not allow data transfer.  The SRM
| protocol is an example of such an interface and a StorageEndpoint
| would be advertised for each instance of SRM.
|
| The access policies describing which users of a UserDomain may use
| the StorageEndpoint are not published.  On observing that a site
| publishes a StorageEndpoint, one may deduce only that it is valid
| for at least one user of one supported UserDomain.


StorageAccessProtocol:

  A StorageAccessProtocol describes one method by which end-users may
  send data to be stored, receive stored data, or undertake both
  operations.
 
| Access to the interface may be localised; that is, only available
| from certain computers.  It may also be restricted to specified
| UserDomains.  However, neither policy restriction is published in
| GLUE.  On observing a StorageAccessProtocol, one may deduce only that
| it is valid for at least one user of one supported UserDomain from
| at least one computer.


StorageService:

  A StorageService is an aggregation of StorageEndpoints,
  StorageAccessProtocols and StorageResources.

| It is the top-level description of the ability to transfer files to
| and from a site, and manipulate the files once stored.
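|
| The aggregation could be pictured as a simple container, as in the
| sketch below.  The class layout is hypothetical and its members are
| plain placeholder labels rather than real GLUE objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageService:
    """Top-level aggregation (hypothetical model; members are plain labels)."""
    endpoints: List[str] = field(default_factory=list)         # control interfaces
    access_protocols: List[str] = field(default_factory=list)  # data transfer methods
    resources: List[str] = field(default_factory=list)         # StorageResources


service = StorageService(
    endpoints=["srm-endpoint"],
    access_protocols=["protocol-1", "protocol-2"],
    resources=["resource-1"],
)
```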


