[glue-wg] Updated thoughts...

Burke, S (Stephen) S.Burke at rl.ac.uk
Tue Apr 8 09:45:48 CDT 2008


> [mailto:glue-wg-bounces at ogf.org] On Behalf Of Paul Millar said:
> Sorry if I've missed out someone's comment, please yell.

I haven't made comments yet, but this seems like a good place to start
... sorry there are quite a lot but I think it's worth trying to nail
things down as much as possible.

> UserDomain:
> 
>   A collection of one or more end-users.  All end-users that interact
>   with the physical storage are a member of a UserDomain.

Perhaps opening a can of worms, but it may also be possible for a
UserDomain to include services, i.e. you might have services registered
in VOMS as well as users (even with delegated credentials you may want
to give privileges to services which the users don't have).

> StorageCapacity:
> 
>   A StorageCapacity object describes the ability to store data within
>   a homogeneous storage technology.  Each object provides a view of
>   that physical storage medium with a common access latency.

It isn't necessarily just the latency that matters, for example it may
be useful to publish the Capacity of the disk cache in front of a tape
system (see further comments below) - the latency is Online but the
functionality is different from Disk1 Online storage. (Similarly a Disk1
storage system might make extra cache copies to help with load
balancing.) I think the phraseology should be something like "a common
category of storage" (although maybe "category" still isn't the right
word).

  I'd also like to go back to the question I posed in one of the
meetings ... say that a site implements Custodial/Online by ensuring
three distinct disk copies, how would we represent that? What about
mirrored RAID, how much space do we record?

  Another thing is that I think there is some mission creep going on in
the Capacity concept. When I suggested introducing it it was really as a
complex data type, i.e. as an alternative to putting maybe 20 separate
attributes into each object that can have a size you would effectively
have one multivalued "attribute" with type "Capacity" rather than int.
However, your descriptions suggest that you're thinking more in terms of
a Capacity representing a real thing (a bunch of storage units) which
indeed have sizes but may have other attributes too. That isn't
necessarily a bad thing, but we should probably be clear in our minds
about what we intend.
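To make the distinction concrete, here is a rough sketch of the two readings (all names invented for illustration, not taken from any GLUE draft):

```python
from dataclasses import dataclass, field
from typing import Optional

# Reading 1: Capacity as a complex data type -- effectively one structured,
# multi-valued "attribute" of the owning object, replacing ~20 separate
# size attributes of type int.
@dataclass
class Capacity:
    type: str                  # e.g. "online", "nearline", "cache"
    total_gb: Optional[int] = None
    free_gb: Optional[int] = None

@dataclass
class ShareWithEmbeddedCapacity:
    name: str
    capacities: list[Capacity] = field(default_factory=list)

# Reading 2: Capacity as a first-class object representing real storage
# units, with its own identity and an association (by reference) to
# precisely one higher-level object.
@dataclass
class StorageCapacityObject:
    id: str
    associated_with: str       # ID of the Share/Environment it describes
    type: str
    total_gb: Optional[int] = None
    free_gb: Optional[int] = None
```

In the first reading a Capacity has no existence outside its owner; in the second it can carry further attributes and be queried independently, which is what the current descriptions seem to imply.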

>   The context is determined by an association between the
>   StorageCapacity object and precisely one other higher-level object.

What was the decision about Shares for different VOs which share the
same physical space? (I haven't really read all the mails yet so this
may already be answered ... actually there is more on this further
down.)

> | The underlying storage technology may affect which of the
> | context-specific attributes are available.  For example, 
> tape storage
> | may be considered semi-infinite, so the total and free 
> attributes have
> | no meaning.  If this is so, then it affects all 
> StorageCapacity objects with
> | the same underlying technology, independent of their context.

I'm not quite sure what you're saying here. It seems to me that the
schema itself should not be defining this - I would still maintain that
tape systems do in fact have a finite capacity at any given time so it
isn't conceptually absurd (and "nearline" may not necessarily mean
"tape" anyway). Individual Grids may wish to make their own decisions
about what to publish, and equally it seems possible that, say, dCache
may decide not to publish something but CASTOR may. All the schema
should do is say that the attributes are optional, but *if* they are
published the meaning should be well-defined and common across all
Grids/implementations/... (and maybe we also want a special value to
mean quasi-infinite?)
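As a sketch of what "optional but well-defined" plus a quasi-infinite marker could look like (the sentinel and function names are hypothetical, nothing here is in any draft):

```python
from typing import Optional

# Hypothetical reserved value meaning "quasi-infinite" (e.g. a tape
# back-end that grows as media are added). Whether the schema should
# define such a value is exactly the open question above.
QUASI_INFINITE = -1

def describe_total(total_gb: Optional[int]) -> str:
    """Interpret an optional TotalSize-like attribute: absent means
    'not published', the sentinel means 'quasi-infinite', anything
    else is a real, finite size."""
    if total_gb is None:
        return "not published"
    if total_gb == QUASI_INFINITE:
        return "quasi-infinite"
    return f"{total_gb} GB"
```

The point is that an implementation may omit the attribute, but a published value must mean the same thing on every Grid.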

> | that the underlying storage technology be homogeneous. Homogeneous
> | means that the underlying storage technology is either identical or
> | sufficiently similar that the differences don't matter.

I think the real point is more that it's treated uniformly by the SRM
(or other storage manager) - even if the differences do matter there
won't be anything you can do about it if the SRM gives you no control
over it! (e.g. to put your file on RAID 6 rather than RAID 0.)

>   A StorageEnvironment is a collection of one or more
>   StorageCapacities with a set of associated (enforced) storage
>   management policies.

Hmm ... I could suggest that the Environment now also looks more like a
data type than a real object (and is also rather SRM2-specific as it
stands). And why are the attributes optional, i.e. what would it mean if
one or both is missing? Should there be an OtherInfo attribute? What
would we do for classic SEs, or SRB, or for that matter SRM 1?

  [What actually seems to have happened here is that things have
gradually turned inside out. We started with the SA as the main
representation of a set of hardware, with size, policy and ACL
information embedded in it and subsequently with the VOInfo added as a
dependent object. Now the size (Capacity), ACL (MappingPolicy) and
VOInfo (Share) are getting carved out as separate objects with an
independent "life" and most of the policy attributes have been
obsoleted, so we're left with something that carries almost no
information and a role which, to me at least, is not totally clear. I'm
not saying there's anything wrong with this, but it may lead to
misconceptions derived from trying to relate the Glue 2 objects to their
Glue 1 equivalents.]

> | Examples of these policies are Type (Volatile, Durable, Permanent)
> | and RetentionPolicy (Custodial, Output, Replica).

Except that Type (or ExpirationMode) doesn't seem to be an attribute in
the current draft ... what about other policies, e.g. the old schema had
MinFileSize - if we ever wanted to implement such a thing would it go
here? Conversely Latency isn't a policy, it's a feature of the hardware.
If we really want a Policy object should we call it that rather than
Environment?

> | In general, a StorageEnvironment may have one or more
> | RetentionPolicy values.

Not what it says in the current draft (0..1). Does this correspond with
SRM usage, i.e. can you have spaces with multiple RPs?

> | GLUE does not
> | record a default RetentionPolicy. 

Should it? What about defaults for other things, e.g. ExpirationMode?

> | It is the associated StorageCapacities that allow a
> | StorageEnvironment to store data with its advertised policies; for
> | example, to act as (Permanent, Custodial) storage of data.

But can you tell how that works, i.e. which Capacity serves which
policy? This is another case where we tend to think Custodial ->
tape -> Nearline, but intrinsically it doesn't have to be like that.

> | Since a StorageEnvironment may contain multiple StorageCapacities,
> | it may describe a heterogeneous environment.  An example of this is
> | "tape storage", which has both tape back-end and disk front-end into
> | which users can pin files.  Such a StorageEnvironment would have two
> | associated StorageCapacities: one describing the disk storage and
> | another describing the tape.

But can you have more than one Capacity of the same type? (see the
comments earlier). Anyway I think we removed the storage type from the
Capacity so at the moment you can't really tell what it is. Maybe we
should look back at the proposal for Storage Components made by Flavia,
Maarten et al in the 1.3 discussion, or has someone already done that? 
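To illustrate why a type attribute matters here, a minimal sketch of the "tape storage" example (types and sizes are invented):

```python
from dataclasses import dataclass, field

@dataclass
class Cap:
    type: str          # "online" (disk front-end) or "nearline" (tape back-end)
    total_gb: int

@dataclass
class Environment:
    name: str
    capacities: list[Cap] = field(default_factory=list)

# One Environment, two Capacities: without a type attribute on the
# Capacity there is no way to tell the disk front-end from the tape
# back-end.
tape_store = Environment("t1d0", [Cap("online", 50_000),
                                  Cap("nearline", 2_000_000)])

# Nothing in this sketch forbids two Capacities of the same type,
# e.g. a second, distinct disk cache -- which is the open question above.
tape_store.capacities.append(Cap("online", 10_000))
```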

> | StorageCapacities associated with a StorageEnvironment must be
> | non-overlapping with any other such StorageCapacity and the set of
> | all such StorageCapacities must represent the complete storage
> | available to end-users.

Conceptually that may be true, but there's no guarantee that all of them
are actually published. You could also wonder about space which is
installed but not currently allocated to any VO ...

> | Nevertheless, the StorageCapacities associated with
> | StorageEnvironments may be incomplete as a site may deploy physical
> | storage devices that are not directly under end-user control; for
> | example, disk storage used to cache incoming transfers.  GLUE makes
> | no effort to record information about such storage.

Actually part of my reason to introduce Capacity objects is that they
can do just that if people want them to (as they may since it can be
useful to know about cache usage). For such cases the CapacityType would
be Cache, or maybe something else if you wanted to distinguish more than
one kind of cache. As always there's no compulsion to publish that if
you don't want it, but the schema makes it possible.

> | GLUE makes no attempt to record which physical storage (as
> | represented by StorageCapacity objects) is under control of which
> | StorageResource.

Should it? As it stands you might not care, but if you wanted to
consider monitoring use cases (whether the software is running at the
most basic!) it would probably be useful to know how that relates to the
actual storage.

> StorageShare:
> 
>   A StorageShare is a logical partitioning of one or more
>   StorageEnvironments.

Maybe I'm missing something, but how could you have more than one
Environment for a single Share? Certainly our current structure doesn't
allow it (one SA per many VOInfos but not vice versa), although as I
said above that might be misleading.

> | The StorageCapacities within the StorageShare context need not
> | describe all storage: the number of StorageCapacities associated
> | with a StorageShare may be less than the sum of the number of
> | StorageCapacities associated with each of the StorageShare's
> | associated StorageEnvironments.

Err, why? As always you may choose not to publish everything, but
conceptually the space is all there somewhere ...

> | A pair of StorageShares may be partially shared, that is, they have
> | at least one pair StorageCapacities that are shared and at least one
> | that is not.  Partially shared StorageCapacities could represent two
> | UserDomain's access to a tape store, where they share a common set
> | of disk pools but the tape storage is distinct.

I'm not sure I like this bit. In general I would assume that storage
(SAs in the current parlance) is either shared or not - allowing the
disk part of a custodial/online space to be shared and the tape part not
sounds rather weird to me, and I don't think that's how SRM works. Do we
really have such cases? Bear in mind that the point is not about sharing
the physical disks, but having a shared allocation (and for Disk1/Online
permanent storage, not cache). If the system is guaranteeing to store,
say, 100 TB on both disk and tape (custodial/online) there is no way it
can do that if the disk part of the reservation is shared, and if it
doesn't guarantee it overall then having a reserved tape pool is
pointless, in general it would just mean that some tapes are unusable.

  Another question, what do we do about hierarchical spaces? At the
moment we at least have the case of the "base space" or whatever you
call it from which the space tokens are reserved, and in future I
believe we're considering being able to reserve spaces inside spaces.
How could that be represented? (There are also questions we've discussed
in the past about things like dynamic spaces and default spaces which
tend to produce more heat than light :)

> StorageMappingPolicy:
> 
>   The StorageMappingPolicy describes how a particular UserDomain is
>   allowed to access a particular StorageShare.

Should we say how this relates to the AccessPolicy? (which doesn't seem
to appear explicitly in either the Computing or Storage diagrams but is
presumably there anyway.)

> | No member of a UserDomain may interact with a StorageShare except as
> | described by a StorageMappingPolicy.

As stated I don't think that can really be true, the SRM could
potentially allow all kinds of things not explicitly published. The
things which should be true are that there is an agreed set of things
(maybe per grid?) which are published, and that the published values
should be a superset of the "real" permissions - i.e. the SRM may in
fact not authorise me even if the published value says that it will, but
the reverse shouldn't be true.
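That superset rule is simple enough to state as a check (a sketch; the UserDomain names are made up):

```python
def publication_is_consistent(published: set[str], actual: set[str]) -> bool:
    """Published permissions should be a superset of what the SRM will
    actually authorise: the SRM may still refuse a UserDomain that is
    published, but must never admit one that is not."""
    return actual <= published
```

So publishing access for {atlas, cms} while only atlas is actually authorised is acceptable; the reverse is the inconsistency to rule out.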

> | The StorageMappingPolicies may contain information that is specific
> | to that UserDomain, such as one or more associated
> | StorageCapacities.  If provided, these provide a UserDomain-specific
> | view of their usage of the underlying physical storage technology as
> | a result of their usage within the StorageShare.

I don't think I understand how this can be different from the Share to
Capacity relation ... if you are saying that the Share can be multi-VO
then I think something has gone wrong somewhere given that the Path and
Tag can be VO-specific. In the 1.3 schema the whole point of the VOInfo
(which has become the Share) was to split out the information specific
to each mapping policy (ACBR) from the generic information in the SA ...

> | The access policies describing which users of a UserDomain may use
> | the StorageEndpoint are not published.

Are you sure? (see comment above)

>   A StorageAccessProtocol describes one method by which end-users may
>   sent data to be stored, received stored data, or undertake both
>   operations.

sent -> send, received -> retrieve

> | Access to the interface may be localised; that is, only available
> | from certain computers.  It may also be restricted to specified
> | UserDomains.

It might also only apply to certain storage components ... 

Phew .. I spent over two hours writing that, I hope someone reads it :)

Stephen

