[glue-wg] Datastore proposal

Burke, S (Stephen) S.Burke at rl.ac.uk
Wed Apr 9 09:14:36 CDT 2008


Hi,

I'll write this up while it's fresh in my mind. For those not in the
meeting, the background is that we have a long-running debate about
whether to represent the storage hardware explicitly in the schema. In
the original Glue we had the Storage Library, which we then obsoleted
because no-one used it. In the 1.3 discussion there was the proposed
Storage Component, which we left out because we didn't get agreement on
whether we needed it. Now the same discussion is coming round again:
the current draft has a Storage Resource to describe the software
that manages some storage, e.g. Enstore or GPFS, but still nothing to
represent the hardware it manages. The problem is that we never seem to
have a clear use case that requires a hardware description, but it keeps
coming back in discussions, perhaps because it's a natural way to think
about storage systems. (Also LCG has specific hardware restrictions,
e.g. that Custodial must imply tape storage, which are not mandated by
SRM or described in the schema.)

  My proposal is to shortcut this discussion by putting a simple
representation of the hardware in the schema, which would be optional
for anyone to publish, so the debate can at least be pushed off to
implementation time. I propose an object which I tentatively call a
Datastore (or we could go back to the old StorageLibrary name if we want
everything to start with "Storage"). A Datastore would represent a
uniform set of managed storage hardware, e.g. a tape robot plus all its
tapes, or a set of disk servers managed for the same purpose. To be
clear, disk servers allocated to different VOs would still constitute
a single Datastore, but disk servers used for completely different
purposes would be separate: for example, the disk cache in front of
the tape robot would be a different Datastore if it is managed
independently of the Disk1 storage.
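As a rough illustration of the grouping rule, the sketch below assumes each disk server is described by a hypothetical record carrying its managing software, its purpose and the VO it is allocated to. Servers are grouped into Datastores by (managing software, purpose), so VO allocation never splits a Datastore, while an independently managed tape cache is approximated by giving it a different purpose label. None of these field names come from the Glue draft.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskServer:
    # Hypothetical description of one disk server; not a Glue schema object.
    hostname: str
    managed_by: str  # managing software, e.g. "Castor"
    purpose: str     # e.g. "disk1" or "tape-cache"
    vo: str          # VO the server is allocated to


def group_into_datastores(servers):
    """Group servers into Datastores keyed by (managing software, purpose).

    The VO allocation is deliberately ignored, so servers shared out
    between VOs still end up in a single Datastore.
    """
    groups = defaultdict(list)
    for server in servers:
        groups[(server.managed_by, server.purpose)].append(server.hostname)
    return dict(groups)


servers = [
    DiskServer("ds01", "Castor", "disk1", "atlas"),
    DiskServer("ds02", "Castor", "disk1", "cms"),
    DiskServer("ds03", "Castor", "tape-cache", "atlas"),
]
print(group_into_datastores(servers))
# {('Castor', 'disk1'): ['ds01', 'ds02'], ('Castor', 'tape-cache'): ['ds03']}
```

Including the managing software in the key also reflects the point below that disk servers run by different software systems form separate Datastores.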

  The Datastore would have fairly few attributes:

UniqueID (as usual)

Name (human-readable name, maybe indicating the technology, e.g.
StorageTek)
 
Type (disk, tape, ... - open enumeration)

Capacity (NB this is in the schema as a separate object for technical
reasons but is really just an attribute)

OtherInfo (as usual)

It might be useful to give the technology, e.g. RAIDn for disk
systems, but I think that should go in OtherInfo, as it is likely to be
hard to standardise.
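A minimal sketch of the proposed object as Python dataclasses, using only the attributes listed above. The field types, the hypothetical total_gb/used_gb fields on Capacity, and the example values are assumptions for illustration. Capacity is kept as a separate class to mirror the draft, although conceptually it is just an attribute. Type is treated as an open enumeration: unknown values are accepted, merely flagged. Technology details such as a RAID level go into OtherInfo as free-form strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Values known to this sketch; Type is an open enumeration, so others are allowed.
KNOWN_TYPES = {"disk", "tape"}


@dataclass
class Capacity:
    # Hypothetical fields; the real Capacity object in the draft may differ.
    total_gb: int
    used_gb: int = 0


@dataclass
class Datastore:
    unique_id: str                       # UniqueID
    name: Optional[str] = None           # Name, e.g. "StorageTek"
    store_type: str = "disk"             # Type: disk, tape, ... (open enumeration)
    capacity: Optional[Capacity] = None  # Capacity, conceptually an attribute
    other_info: List[str] = field(default_factory=list)  # OtherInfo, e.g. RAID level

    def has_known_type(self) -> bool:
        # An unknown Type is still valid; this only flags it for consumers.
        return self.store_type in KNOWN_TYPES


robot = Datastore("tape-robot-1", "StorageTek", "tape", Capacity(total_gb=500_000))
pool = Datastore("disk1-pool", "Disk1 servers", "disk", other_info=["RAID6"])
optical = Datastore("optical-1", store_type="optical")
```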

  This would be linked to the existing StorageResource object with a
one-to-many relation: one Resource could manage many Datastores (e.g.
Castor manages both tape and disk), but not vice versa, since a
Datastore can only be managed by one Resource. If there are, for
example, multiple sets of disk servers managed by different software
systems, they would constitute multiple Datastores.
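The one-to-many rule can be sketched with a hypothetical in-memory registry. A StorageResource may own several Datastores, but attaching a Datastore that a different Resource already manages is rejected. The class and method names are illustrative only and are not taken from any Glue binding.

```python
class StorageResource:
    """Managing software, e.g. Castor or dCache (sketch only)."""

    def __init__(self, name):
        self.name = name
        self.datastore_ids = []  # one Resource -> many Datastores


class DatastoreRegistry:
    """Hypothetical registry enforcing that each Datastore has one manager."""

    def __init__(self):
        self._manager = {}  # Datastore ID -> name of its managing Resource

    def attach(self, resource, datastore_id):
        manager = self._manager.get(datastore_id)
        if manager is None:
            # First attachment: record the manager and link the Datastore.
            self._manager[datastore_id] = resource.name
            resource.datastore_ids.append(datastore_id)
        elif manager != resource.name:
            # Not vice versa: a second Resource may not manage the same Datastore.
            raise ValueError(f"{datastore_id} is already managed by {manager}")
        # Otherwise the same Resource is re-attaching; nothing to do.


registry = DatastoreRegistry()
castor = StorageResource("Castor")
registry.attach(castor, "tape-1")
registry.attach(castor, "disk-1")  # Castor manages both tape and disk
```

Trying to attach "disk-1" to a second Resource would then raise ValueError, which is the "but not vice versa" half of the relation.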

  The relation to StorageEnvironment and/or StorageShare remains open
for discussion as there are other issues there (e.g. whether we want the
Environment at all), but conceptually you would want a relation between
the Share and whichever Datastore(s) store the data for that Share. That
can be one-to-many, e.g. if a Custodial/Online Share uses both disk and
tape, as in WLCG.
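If a Share-to-Datastore relation were added, a Share could list the types of the Datastores holding its data. A consumer could then check the LCG-specific restriction mentioned at the start (Custodial implies tape). The sketch below is hypothetical: the relation is not in the current draft, the Share class is a stand-in rather than the real StorageShare, and the policy values are plain strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Share:
    # Hypothetical stand-in for a StorageShare, not the draft's attributes.
    name: str
    retention_policy: str        # e.g. "custodial", "replica"
    access_latency: str          # e.g. "online", "nearline"
    datastore_types: List[str]   # Types of the linked Datastores (one-to-many)


def satisfies_wlcg_custodial_rule(share: Share) -> bool:
    """WLCG-specific check: a Custodial Share must be backed by tape."""
    if share.retention_policy != "custodial":
        return True  # the rule says nothing about other retention policies
    return "tape" in share.datastore_types


custodial_online = Share("share-a", "custodial", "online", ["disk", "tape"])
replica_online = Share("share-b", "replica", "online", ["disk"])
custodial_disk_only = Share("share-c", "custodial", "online", ["disk"])
```

Keeping the check outside the schema matches the point that such restrictions are not mandated by SRM; the schema would only need to publish the relation.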

  Comments?

Stephen

