[glue-wg] Some thoughts on storage objects
Paul Millar
paul.millar at desy.de
Mon Mar 31 12:06:09 CDT 2008
Hi Maarten,
Thanks for the comments; my comments are interleaved below.
I've updated the document, but do people feel this is useful?
We could:
fold the information into the GLUE 2.0 spec,
keep it as an informative (non-normative) document,
drop it, as being too confusing?
On Monday 31 March 2008 01:48:16 Maarten.Litmaath at cern.ch wrote:
[...]
> > BTW, I'm implicitly assuming that StorageEnvironment.RetentionPolicy can
> > be multivalued. If this isn't true and we have the use-case of the same
> > physical disks being part of, for example, both Custodial and Output
> > storage, then it starts to get complicated.
>
> I think it is OK if the RetentionPolicy _can_ be multivalued, but in WLCG
> it would be published single-valued, viz. along with an AccessLatency to
> describe the Storage Class that is implemented by the Environment.
OK, I've added a paragraph on that.
> > [...]
> > StorageEnvironment:
> >
> > A StorageEnvironment is a collection of one or more StorageCapacities
> > with a set of associated (enforced) storage management policies.
> > Examples of these policies are Type (Volatile, Durable, Permanent)
> > and RetentionPolicy (Custodial, Output, Replica).
>
> Note that we should get rid of the obsolete, confusing Type and Lifetime
> attributes in favor of the ExpirationMode copied from SRM v3.
Should we do this with GLUE v2.0?
(I would be happy with that).
> > [...]
> > A StorageResource is an aggregation of one or more
> > StorageEnvironments and describes the hardware that a particular
> > software instance has under its control.
>
> See my reply to Sergio: we may rather want to allow an Environment
> to be linked to multiple Resources, e.g. a disk and a tape Resource,
> such that we can publish the back-end implementation name and version
> for each of them.
Yes, I agree the bit about Environment being hosted in a single Resource is
wrong; for example, dCache (a StorageResource) and TSM (a StorageResource)
together host "D1T1" (a StorageEnvironment), which has a disk-based
StorageCapacity and a tape-based StorageCapacity.
I'll try to reword that bit.
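As a rough illustration of the many-to-many link, here is a minimal Python sketch of the dCache/TSM/"D1T1" example. The class layout, the field names (resources, capacities, medium, total_size) and the sizes are assumptions made up for illustration; they are not taken from the GLUE schema.

```python
from dataclasses import dataclass, field
from typing import List

TIB = 2**40  # bytes in one tebibyte


@dataclass
class StorageResource:
    # Back-end implementation name; the version is left unspecified here.
    name: str
    version: str = "unknown"


@dataclass
class StorageCapacity:
    medium: str      # hypothetical field: "disk" or "tape"
    total_size: int  # bytes; the values used below are made up


@dataclass
class StorageEnvironment:
    name: str
    # An Environment may be linked to several Resources, not just one.
    resources: List[StorageResource] = field(default_factory=list)
    capacities: List[StorageCapacity] = field(default_factory=list)


dcache = StorageResource(name="dCache")
tsm = StorageResource(name="TSM")

d1t1 = StorageEnvironment(
    name="D1T1",
    resources=[dcache, tsm],
    capacities=[
        StorageCapacity(medium="disk", total_size=10 * TIB),
        StorageCapacity(medium="tape", total_size=10 * TIB),
    ],
)

backend_names = [r.name for r in d1t1.resources]
media = [c.medium for c in d1t1.capacities]
```

With this shape, the back-end name and version can be published per Resource, while the Environment carries the policies and the capacities.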
[...]
> > StorageShare:
[...]
> > StorageSpaces must have one or more associated StorageCapacities.
> | ^^^^^^
> | Shares
Yes, I'm not sure what happened there: too many words beginning with "S", I
guess.
[...]
> > (SC_E) that is associated with some StorageEnvironment and which has
> > totalSize TS_E, let the sum of the totalSize attributes for all
> >
> | ^^^
> | let TS_S be .....
Err, I think that one should be TS_E: the totalSize of the StorageShare
associated with the Environment (the one "underneath" the
StorageEnvironment). In this context the StorageShare represents all of the
physical medium.
Somehow, accurately describing what overlapping and incomplete StorageShares
mean takes a lot of words!
[...]
> > do not change as a result of file creation or deletion. [Does GLUE need
> > to stipulate this, or should we leave this vague?]
>
> Why mention it at all? You do not make statements about the behavior
> of the other sizes, and I think there is no need to go there...
Well, we could just not mention it; however, I'm a little concerned about
tacit assumptions, and how not everyone has the same set. I'd hope we can
make all the assumptions explicit.
In this particular case, there are (at least) a couple of models for how a space
could work:
a. partitioning: I'm allocated 10 TiB of storage. I can store files up to that size of
data (*). Once I've written 10 TiB of data, I can delete files to create
more space. This is like having a 10 TiB hard disk to store data.
(*) real-life systems have some complications, but considering an
idealised storage system.
b. consumable: I'm allocated 10 TiB storage. I can store files up to that
size of data. I can delete the files if I like, but deleting files doesn't
allow me to store more files. Once I've used up that 10 TiB of storage, I
have to ask for more.
(perhaps option b. seems a little crazy, but it might be how StorageShares
would work with archival WORM media).
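To make the difference concrete, here is a minimal Python sketch of the two models in an idealised storage system. The class and attribute names (PartitionedShare, ConsumableShare, used_size, consumed_size) are hypothetical and are not GLUE attributes.

```python
from dataclasses import dataclass

TIB = 2**40  # bytes in one tebibyte


@dataclass
class PartitionedShare:
    """Model a: deleting a file returns its space to the share."""
    total_size: int
    used_size: int = 0

    def write(self, size: int) -> bool:
        if self.used_size + size > self.total_size:
            return False  # share is full
        self.used_size += size
        return True

    def delete(self, size: int) -> None:
        # Space freed by a deletion can be written to again.
        self.used_size = max(0, self.used_size - size)


@dataclass
class ConsumableShare:
    """Model b: every byte written is consumed for good (e.g. WORM media)."""
    total_size: int
    consumed_size: int = 0

    def write(self, size: int) -> bool:
        if self.consumed_size + size > self.total_size:
            return False  # allocation exhausted; ask for more
        self.consumed_size += size
        return True

    def delete(self, size: int) -> None:
        # Deleting a file does not give any of the allocation back.
        pass


# Same sequence on both: fill 10 TiB, delete 4 TiB, try to write 4 TiB.
a = PartitionedShare(total_size=10 * TIB)
a.write(10 * TIB)
a.delete(4 * TIB)
partitioned_can_write_again = a.write(4 * TIB)

b = ConsumableShare(total_size=10 * TIB)
b.write(10 * TIB)
b.delete(4 * TIB)
consumable_can_write_again = b.write(4 * TIB)
```

The final write succeeds under the partitioning model and fails under the consumable one, so a client that sees only a total size cannot tell which behaviour to expect unless the assumption is written down.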
It seems that everyone in HEP assumes a partitioning model but I don't think
I've seen it stated anywhere. Other communities might assume a different
model. If we want information to be sharable (or mergeable) I think we should
state clearly any assumptions we're making. If we don't assume anything,
that should be noted too, so people consuming information know that either:
1. they can't assume what happens, or
2. if they assume something, that it's a WLCG convention that they must
revisit when combining data with other information sources.
[...]
> > On observing a StorageAccessProcol, one may deduce only that
> > it is valid for at least one user of one supported UserDomain.
>
> ..... from at least one computer.
True :) I've added that, too.
Cheers,
Paul.