[glue-wg] feedback on storage entities

Sergio Andreozzi sergio.andreozzi at cnaf.infn.it
Thu Apr 10 05:16:32 CDT 2008


Hi Maarten,

>> * For Storage Share
>> 1- add a "shared" attribute to the storage share, whose type is boolean; 
>> for "shared" shares, the value should be true
>> 2- add an AggregationLocalID attribute; the "shared" shares within 
>> the same storage service should all be assigned the same value
>>
>> in this way, we avoid the creation of one more level of hierarchy, and 
>> potential visualization tools that want to show summary info can avoid 
>> double counting by checking the two attributes that we propose
>>     
>
> So, you would publish such a shared Share multiple times, once per VO.
> Each such instance then gives a VO view of that Share.  I do not see a
> problem for the info provider to cook up the correct values for the
> boolean flag and the AggregationLocalID, but I do note that compared
> to the proposal by Felix we lose some functionality: if each of the
> VOs has a _quota_ in the Share, we would publish that number as, say,
> its online TotalSize --> this means we no longer have the _physical_
> TotalSize of the Share published anywhere.  Maybe not a big loss...
>   

So, you mean that within a "Shared Share", a VO could have a quota?

This reminds me of a really "old" discussion we had in 2003. Have a look at 
this page, near the end, under "Contributions":

http://www.cnaf.infn.it/~andreozzi/datatag/glue/working/SE/index.html

what do you think of it?
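
Coming back to the shared/AggregationLocalID proposal above, here is a rough 
sketch (plain Python; attribute names, values and numbers are purely 
illustrative, not a proposal for the final spelling) of how an information 
consumer could compute a per-service summary without double counting:

    # One record per (share, VO) view as published by the info provider;
    # the attribute names and the numbers are invented for illustration.
    shares = [
        {"VO": "atlas", "Shared": True,  "AggregationLocalID": "pool1", "TotalSize": 100},
        {"VO": "cms",   "Shared": True,  "AggregationLocalID": "pool1", "TotalSize": 100},
        {"VO": "lhcb",  "Shared": False, "AggregationLocalID": "sh2",   "TotalSize": 50},
    ]

    def service_total(shares):
        """Count each physical share once: shared views carrying the same
        AggregationLocalID contribute a single TotalSize."""
        seen, total = set(), 0
        for s in shares:
            if s["Shared"] and s["AggregationLocalID"] in seen:
                continue
            seen.add(s["AggregationLocalID"])
            total += s["TotalSize"]
        return total

    print(service_total(shares))  # 150 rather than 250

Of course, as you note, if each VO publishes its quota as TotalSize instead 
of the physical size, the rule above would sum quotas and the physical 
TotalSize of the share would indeed be lost.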


Cheers, Sergio


>   
>> * For Storage Environment:
>> when we mapped the current model to our T1 use case, we found that 
>> the storage environment is homogeneous; therefore there is no need (at 
>> least for our scenario) for the capacity to be associated with the 
>> storage environment; the attributes of the storage capacity can be added 
>> directly to the storage environment
>>     
>
> An Environment can have both online and nearline components, and we would
> like to be able to publish sizes for both: if the sizes are incorporated,
> we have to put Online and Nearline in their names, like in GLUE 1.3.
> Fine with me, but I thought there were objections against that?
>
>   
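
Just to make sure we are comparing the same two options, a small sketch 
(Python dictionaries; field names and numbers are purely illustrative, not a 
proposal for the final spelling):

    # Option A: sizes incorporated in the Environment itself, GLUE 1.3 style;
    # the online/nearline split then has to go into the attribute names.
    environment_a = {
        "RetentionPolicy": "custodial",
        "AccessLatency": "nearline",
        "OnlineTotalSize": 600,
        "OnlineUsedSize": 400,
        "NearlineTotalSize": 5000,
        "NearlineUsedSize": 3000,
    }

    # Option B: sizes kept in associated Capacity objects, one per type,
    # so the Environment itself needs no Online/Nearline prefixes.
    environment_b = {
        "RetentionPolicy": "custodial",
        "AccessLatency": "nearline",
        "Capacities": [
            {"Type": "online",   "TotalSize": 600,  "UsedSize": 400},
            {"Type": "nearline", "TotalSize": 5000, "UsedSize": 3000},
        ],
    }
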
>> * For Storage Resource:
>> since information about free/used/total/reserved space is provided by 
>> the storage environment, we could avoid having summary info at the 
>> storage resource level; information consumers can aggregate it
>>     
>
> The assumption then is that Environments will not overlap: probably OK.
>
>   
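
Assuming non-overlapping environments, the aggregation we have in mind on the 
consumer side is trivial; a sketch with invented numbers (the three 
environments mirror our GPFS 3.1, GPFS 3.2 and TSM setup):

    # Resource-level summary derived by an information consumer from its
    # (non-overlapping) storage environments; all numbers are invented.
    environments = [
        {"Name": "gpfs-3.1-disk", "OnlineTotalSize": 400, "NearlineTotalSize": 0},
        {"Name": "gpfs-3.2-disk", "OnlineTotalSize": 600, "NearlineTotalSize": 0},
        {"Name": "tsm-tape",      "OnlineTotalSize": 0,   "NearlineTotalSize": 5000},
    ]

    resource_online_total = sum(e["OnlineTotalSize"] for e in environments)
    resource_nearline_total = sum(e["NearlineTotalSize"] for e in environments)

    print(resource_online_total, resource_nearline_total)  # 1000 5000
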
>> If the above considerations fit the use cases of other partners, then 
>> the storage capacity would be related only to the storage share.
>>     
>
> I think we should handle sizes the same way for Share and Environment:
> either incorporate them, or have them in Capacity objects.
>
>   
>> As regards today's agenda, I removed the following issues since they 
>> do not properly reflect our scenario.
>>
>> ** consequence of overlapping StorageResource entities
>> *** GPFS 3.1 and GPFS 3.2 share same disks
>> *** if wished to be expressed explicitly -> each GPFS is represented as 
>> its own StorageResource
>> *** BUT then: a higher aggregation of capacity numbers must be given 
>> in the service (again: if wished)
>> *** OR (easier): express GPFS 3.1 and 3.2 in the OtherInfo field
>>
>> In our mapping, we have decided to model the three storage 
>> systems managed by GPFS 3.1, GPFS 3.2 and TSM, respectively, using the 
>> storage environment concept. They do not logically overlap. (See here 
>>     
>
> Note: you do not publish the actual implementation names and versions,
> which we want at least for WLCG (see below).
>
> Furthermore, as far as WLCG is concerned you cannot build your T1D1
> setup out of a replica-online and a custodial-nearline Environment!
>
> In WLCG combinations of RetentionPolicy and AccessLatency have _extra_
> meaning that cannot be deduced from those attributes.
>
> Such combinations are called Storage Classes:
>
>     Custodial-Nearline == T1D0 --> disk managed by system
>     Custodial-Online   == T1D1 --> disk managed by client
>     Replica-Online     == T0D1 --> disk managed by client
>
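
(Just to check that I read the table correctly, the mapping is essentially 
the following; a sketch, the string spellings are only illustrative:)

    # WLCG storage classes as combinations of RetentionPolicy and
    # AccessLatency; the string spellings are illustrative only.
    STORAGE_CLASSES = {
        ("custodial", "nearline"): "T1D0",  # disk managed by the system
        ("custodial", "online"):   "T1D1",  # disk managed by the client (VO)
        ("replica",   "online"):   "T0D1",  # disk managed by the client (VO)
    }

    def storage_class(retention_policy, access_latency):
        return STORAGE_CLASSES.get((retention_policy.lower(), access_latency.lower()))
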
> A Storage Class always has disk, i.e. an online component, while the
> Custodial classes also have tape or some other high-quality storage;
> if it is tape/dvd/... there is a corresponding nearline component.
>
> What is more, the disk component is managed by the system for T1D0,
> while it is managed by the client (VO) for T1D1 and T0D1.
>
> WLCG needs to have it clear from the schema which Storage Class applies
> to a particular Share.
>
> In principle one could come up with this rule:
>
>     Custodial-Nearline + Replica-Online == Custodial-Online
>
>     T1D0               + T0D1           == T1D1
>
> But then a client that is interested in T1D1 has to query for Shares
> that either are linked to a Custodial-Online Environment, or linked
> to a Custodial-Nearline _and_ a Replica-Online Environment: not nice!
>
> Furthermore, a client interested in T1D0 (T0D1) has to ensure that
> the matching Shares are _not_ also linked to T0D1 (T1D0).
>
> I would quite prefer having a Share always linked to a _single_
> Environment, which itself will have an online component and may also
> have a nearline component.
>
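
If I understand the concern, the difference for a client is something like 
this (a sketch; the entity and field names are invented):

    # Matching a T1D1 share in the two models; field names are invented.

    def is_t1d1_multi_environment(linked_environments):
        """Share possibly linked to several Environments, using the rule
        Custodial-Nearline + Replica-Online == Custodial-Online."""
        classes = {(e["RetentionPolicy"], e["AccessLatency"]) for e in linked_environments}
        return ("custodial", "online") in classes or (
            ("custodial", "nearline") in classes and ("replica", "online") in classes)

    def is_t1d1_single_environment(environment):
        """Share linked to exactly one Environment: a plain attribute test."""
        return (environment["RetentionPolicy"], environment["AccessLatency"]) == ("custodial", "online")
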
> If we want to have separate implementation names and versions for
> those components, it would seem natural to introduce the split at
> the Resource level instead: an Environment can be linked to an online
> Resource (e.g. with GPFS 3.2) and a nearline Resource (TSM X.Y.Z).
>
> Whichever way, we would like to publish such back-end implementation
> names and versions explicitly.  Flavia has a use case: the name and
> version of the back-end storage system should be available, such that
> it can be deduced from the information system which sites are likely
> to suffer from which open issues.  This is important information for
> debugging operational problems in WLCG (and other grids).
>
>   
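
To be sure I understand the proposed split at the Resource level, something 
like the following? (a sketch; the field names are invented, and the TSM 
version is left unspecified as in your mail)

    # One Resource per back-end component, carrying its implementation
    # name and version explicitly; field names are invented.
    online_resource = {
        "Type": "online",
        "ImplementationName": "GPFS",
        "ImplementationVersion": "3.2",
    }
    nearline_resource = {
        "Type": "nearline",
        "ImplementationName": "TSM",
        "ImplementationVersion": "X.Y.Z",  # left unspecified, as above
    }
    # An Environment would then be linked to one online Resource and,
    # optionally, one nearline Resource, so Flavia's use case (which sites
    # run which back-end version) becomes a simple attribute query.
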
>> http://glueman.svn.sourceforge.net/viewvc/*checkout*/glueman/tags/glue-xsd/draft-29/examples/AdminDomain_CNAF.xml?revision=27)
>> In our scenario, we have one global storage resource composed of three 
>> storage environments.
>>
>> As a final comment, my opinion is that we should privilege simplicity 
>> and the meta-scheduling use cases over the monitoring ones. If we 
>> do not manage to converge shortly on a common vision for the storage 
>> resource/storage environment, we should probably postpone the definition 
>> of these entities to a future GLUE revision and concentrate on the 
>> storage endpoint/storage share consolidation.
>>     
>
> I still think we are approaching convergence.
> Thanks,
> 	Maarten
>
>   


-- 
Sergio Andreozzi
INFN-CNAF,                    Tel: +39 051 609 2860
Viale Berti Pichat, 6/2       Fax: +39 051 609 2746
40126 Bologna (Italy)         Web: http://www.cnaf.infn.it/~andreozzi


