[glue-wg] Some thoughts on storage objects

Paul Millar paul.millar at desy.de
Mon Mar 31 12:06:09 CDT 2008


Hi Maarten,

Thanks for the comments; my replies are interleaved below.

I've updated the document, but do people feel this is useful?

We could:
	fold the information into the GLUE 2.0 spec,
	keep it as an informative (non-normative) document, or
	drop it as being too confusing.

On Monday 31 March 2008 01:48:16 Maarten.Litmaath at cern.ch wrote:
[...]
> > BTW, I'm implicitly assuming that StorageEnvironment.RetentionPolicy can
> > be multivalued.  If this isn't true and we have the use-case of the same
> > physical disks being part of, for example, both Custodial and Output
> > storage, then it starts to get complicated.
>
> I think it is OK if the RetentionPolicy _can_ be multivalued, but in WLCG
> it would be published single-valued, viz. along with an AccessLatency to
> describe the Storage Class that is implemented by the Environment.

OK, I've added a paragraph on that.
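
To make the single-valued case concrete, here is a minimal Python sketch.  The
pairs below are the usual WLCG Storage Class names as I understand them; the
function name storage_class and the plain-string representation are purely
illustrative assumptions and not part of GLUE.

```python
# Hypothetical helper, not part of GLUE or of any WLCG software.
# Maps a (RetentionPolicy, AccessLatency) pair to a WLCG Storage Class name.
WLCG_STORAGE_CLASSES = {
    ("custodial", "nearline"): "T1D0",
    ("custodial", "online"): "T1D1",
    ("replica", "online"): "T0D1",
}


def storage_class(retention_policies, access_latency):
    """Return the Storage Class name, or None if the pair does not map to one.

    A multivalued RetentionPolicy is allowed by the schema in this sketch,
    but it cannot be reduced to a single Storage Class, so None is returned.
    """
    if len(retention_policies) != 1:
        return None
    (policy,) = retention_policies
    return WLCG_STORAGE_CLASSES.get((policy.lower(), access_latency.lower()))
```

With this, a single-valued Environment maps to one Storage Class, while a
multivalued one is still publishable but has no single class.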

> > [...]
> > StorageEnvironment:
> >
> >  A StorageEnvironment is a collection of one or more StorageCapacities
> >  with a set of associated (enforced) storage management policies.
> >  Examples of these policies are Type (Volatile, Durable, Permanent)
> >  and RetentionPolicy (Custodial, Output, Replica).
>
> Note that we should get rid of the obsolete, confusing Type and Lifetime
> attributes in favor of the ExpirationMode copied from SRM v3.

Should we do this with GLUE v2.0?

(I would be happy with that).


> > [...]
> >  A StorageResource is an aggregation of one or more
> >  StorageEnvironments and describes the hardware that a particular
> >  software instance has under its control.
>
> See my reply to Sergio: we may rather want to allow an Environment
> to be linked to multiple Resources, e.g. a disk and a tape Resource,
> such that we can publish the back-end implementation name and version
> for each of them.

Yes, I agree the bit about Environment being hosted in a single Resource is
wrong; for example, dCache (a StorageResource) and TSM (a StorageResource)
together host "D1T1" (a StorageEnvironment), which has a disk-based
StorageCapacity and a tape-based StorageCapacity.

I'll try to reword that bit.
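
As a rough illustration of the relationship described above, here is a small
Python sketch.  The class and attribute names (implementation, medium,
resources, capacities) are my own shorthand for this example and are not
taken from the GLUE schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageResource:
    # Back-end implementation name; a version could be published alongside it.
    implementation: str


@dataclass
class StorageCapacity:
    # Kind of medium this capacity describes, e.g. "disk" or "tape".
    medium: str


@dataclass
class StorageEnvironment:
    name: str
    # An Environment may be linked to more than one Resource.
    resources: List[StorageResource] = field(default_factory=list)
    capacities: List[StorageCapacity] = field(default_factory=list)


# The example from the text: dCache and TSM together host "D1T1".
d1t1 = StorageEnvironment(
    name="D1T1",
    resources=[StorageResource("dCache"), StorageResource("TSM")],
    capacities=[StorageCapacity("disk"), StorageCapacity("tape")],
)
```

The point is only that the link from Environment to Resource is one-to-many,
so each back-end can be described separately.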

[...]
> > StorageShare:
[...]
> >  StorageSpaces must have one or more associated StorageCapacities.
> |         ^^^^^^
> |         Shares

Yes, I'm not sure what happened there: too many words beginning with "S", I 
guess.

[...]
> >  (SC_E) that is associated with some StorageEnvironment and which has
> >  totalSize TS_E, let the sum of the totalSize attributes for all
> >
> |                  ^^^
> |                  let TS_S be .....

Err, I think that one should be TS_E: the totalSize of the StorageShare
associated with the Environment (the one "underneath" the
StorageEnvironment).  In this context the StorageShare represents all of the
physical medium.

Somehow, accurately describing what overlapping and incomplete StorageShares
mean takes a lot of words!


[...]
> >  do not change as a result of file creation or deletion.  [Does GLUE need
> > to stipulate this, or should we leave this vague?]
>
> Why mention it at all?  You do not make statements about the behavior
> of the other sizes, and I think there is no need to go there...

Well, we could just not mention it; however, I'm a little concerned about 
tacit assumptions, and how not everyone has the same set.  I'd hope we can 
make all the assumptions explicit.

In this particular case, there are (at least) a couple of models for how a
space could work:

  a. partitioning: I'm allocated 10 TiB of storage.  I can store files up to
that total size of data (*).  Once I've written 10 TiB of data, I can delete
files to create more space.  This is like having a 10 TiB hard disk to store
data.

	(*) real-life systems have some complications, but I'm considering an
		idealised storage system.

  b. consumable: I'm allocated 10 TiB of storage.  I can store files up to
that total size of data.  I can delete the files if I like, but deleting files
doesn't allow me to store more files.  Once I've used up that 10 TiB of
storage, I have to ask for more.

(perhaps option b. seems a little crazy, but it might be how StorageShares 
would work with archival WORM media).
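
The difference between the two models can be sketched in a few lines of
Python.  This is an idealised toy, not a description of any real storage
system; the class and method names are invented for the example.

```python
TiB = 2**40


class PartitionedSpace:
    """Model a: deleting files gives the space back."""

    def __init__(self, total_size):
        self.total_size = total_size
        self.used = 0

    def free(self):
        return self.total_size - self.used

    def write(self, size):
        if size > self.free():
            raise ValueError("not enough free space")
        self.used += size

    def delete(self, size):
        if size > self.used:
            raise ValueError("cannot delete more than is stored")
        self.used -= size


class ConsumableSpace:
    """Model b: every write consumes allocation that deletion never returns."""

    def __init__(self, total_size):
        self.total_size = total_size
        self.consumed = 0  # only ever increases
        self.stored = 0    # data currently held

    def free(self):
        return self.total_size - self.consumed

    def write(self, size):
        if size > self.free():
            raise ValueError("allocation exhausted; ask for more")
        self.consumed += size
        self.stored += size

    def delete(self, size):
        if size > self.stored:
            raise ValueError("cannot delete more than is stored")
        self.stored -= size  # note: consumed is left unchanged


# Fill both spaces, then delete 4 TiB from each.
a = PartitionedSpace(10 * TiB)
b = ConsumableSpace(10 * TiB)
for space in (a, b):
    space.write(10 * TiB)
    space.delete(4 * TiB)
```

After the same sequence of operations, a.free() is 4 TiB while b.free() is
still zero, which is exactly the ambiguity a consumer of the published sizes
would face.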

It seems that everyone in HEP assumes a partitioning model but I don't think 
I've seen it stated anywhere.  Other communities might assume a different 
model.  If we want information to be sharable (or mergeable) I think we should 
state clearly any assumptions we're making.  If we don't assume anything, 
that should be noted too, so people consuming information know that either:
	1. they can't assume what happens, or
	2. if they assume something, that it's a WLCG convention that they must 
revisit when combining data with other information sources.


[...]
> >  On observing a StorageAccessProtocol, one may deduce only that
> >  it is valid for at least one user of one supported UserDomain.
>
> ..... from at least one computer.

True :)  I've added that, too.

Cheers,

Paul.

