[UR-WG] Draft/suggestion for Storage Accounting Record (2010/10/07)

Thu Oct 28 04:35:43 CDT 2010

Hi John

On Wed, 27 Oct 2010, john.gordon at stfc.ac.uk wrote:

>> Result:
>> "StartTime" and "EndTime" elements for describing the time interval
>> during which the record is valid. Both of them are DateTime values.
>
> I don't understand. A measurement is taken at a single point in time. If
> I look at the space used in a storage system (say a filesystem), what
> information do I have on the period of validity of the measurement?
>
> I suppose if one is creating a summary record in place of many single 
> measurements then one could understand the meaning of start time and end 
> time but then one would have to think about what one was doing with the 
> other quantities measured - min, max, average over the time span?

Well, you are right that these things are measured as a point in time. The 
idea with start and end time was to allow arbitrary time resolution, but 
this is possible with timestamps as well. Performing multiple samples to 
create min,max,avg would also be possible, but the gains are fairly low 
IMO.

I don't really have any strong opinions on which way to go here. Both 
should be able to work fine.

>> We quickly decided on
>> using bytes, as it is the fundamental unit, and saves us from deciding
>> on whether to use 1000 or 1024 bytes as a base.
>
> Don't you risk overflowing an integer in some systems?

I don't think we should worry about that in standards. Most languages and
databases support 64-bit integers; don't be afraid to use them. Also, if an
accounting system wants to convert the number into something else
internally, I'm perfectly fine with it.
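
A minimal sketch of the overflow concern, in Python. The 5 PB figure is a
made-up measurement, not a number from the draft; the point is only that a
byte count of that size does not fit in a signed 32-bit integer but fits
comfortably in a signed 64-bit one.

```python
import struct


def fits_signed(value: int, bits: int) -> bool:
    """Return True if value fits in a signed integer of the given width."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


# Hypothetical measurement: 5 PB of used space, expressed in bytes.
used_space = 5 * 10**15

fits_32 = fits_signed(used_space, 32)  # signed 32-bit tops out near 2 GiB
fits_64 = fits_signed(used_space, 64)  # signed 64-bit tops out near 8 EiB

# Packing as a signed 64-bit big-endian integer round-trips without loss.
packed = struct.pack(">q", used_space)
unpacked = struct.unpack(">q", packed)[0]
```

A system limited to 32-bit integers could still convert to a larger unit
internally, as long as the record itself carries bytes.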

>> We briefly considered if reporting should be per file, but this was
>> quite quickly shot down, as it would make the records unreasonably
>> large, without providing any real value. We did end up with an element
>> describing the number of files that are using the space reported.
>
> I presume you mean the number of records would be large, not the individual records?

Yes.

>> Result: "UsedSpace", "ReservedSpace", "UnAllocatedSpace" metrics for 
>> describing how space is used, reserved, and available for use. Reserved 
>> and unallocated are probably not overlapping (as reserved space is 
>> technically used). The measurement is in bytes. "FileCount" for 
>> describing the number of files using the space.
>
> Filecount should be optional.

Absolutely (and so should reserved space and unallocated/available space).

Most of the suggested fields in the draft should be optional.
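
To illustrate what "optional" could mean in practice, here is a rough
Python sketch of a record where only some fields must be present. The
class name, the snake_case field names, and the choice of site and used
space as the required fields are assumptions made for the example; the
draft does not fix any of this.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class StorageRecord:
    # Assumed to be required for this sketch only.
    site: str
    used_space: int  # bytes
    # Optional metrics: None means "not reported".
    reserved_space: Optional[int] = None
    unallocated_space: Optional[int] = None
    file_count: Optional[int] = None


def present_fields(record: StorageRecord) -> dict:
    """Return only the fields that were actually reported."""
    return {k: v for k, v in asdict(record).items() if v is not None}


# A record from a system that can only report used space.
minimal = StorageRecord(site="mysite.org", used_space=123_456_789_000)
fields = present_fields(minimal)
```

A serializer built on present_fields would simply omit the elements a
storage system cannot provide.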

>> A site is considered a top level container for storage. The site name
>> should be globally unique (which probably means an FQDN).
>
> What is the FQDN of your site? Isn't a FQDN a host?
>
> In EGI we have a site name defined in GOCDB which is also used in the
> GLUE Schema. I assume other users of the GLUE schema have an equivalent.
> I know OGF does.

Yes, an FQDN is a full host name. The important thing is that it globally
identifies your host or site. The idea was that if you had a host name
like spacebucket1.mysite.org, you could use mysite.org. This would allow
numbers from different SEs on a site to be added together. Perhaps we
should consider splitting this into two fields: site and host.
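
A small Python sketch of the site/host idea. It assumes the site is simply
the host name minus its first DNS label, which will not hold everywhere;
spacebucket2 and othersite.org, and all the byte counts, are invented for
the example.

```python
from collections import defaultdict


def site_of(host: str) -> str:
    """Drop the first DNS label: spacebucket1.mysite.org -> mysite.org.

    Assumption: the site is the host name minus its first label. A name
    without any dot is returned unchanged.
    """
    _, _, rest = host.partition(".")
    return rest or host


# Hypothetical (host, used bytes) samples from three storage elements.
samples = [
    ("spacebucket1.mysite.org", 4_000_000_000_000),
    ("spacebucket2.mysite.org", 1_500_000_000_000),
    ("se1.othersite.org", 700_000_000_000),
]

# Sum the used space of all SEs that belong to the same site.
used_per_site = defaultdict(int)
for host, used_bytes in samples:
    used_per_site[site_of(host)] += used_bytes
```

With separate site and host fields in the record, the site_of guess would
not be needed at all; the aggregation step would stay the same.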

>> A storage system is an independent system on a site.
>> A storage system partition is a part of a storage system (similar to
>> dCache pools).
>> A storage type describes the storage type where the data is stored
>> (disk, tape).
>
> WLCG uses Storage Area. Definition:
>
> The Glue Storage Area (SA) class describes a logical view of a portion 
> of physical space that can include disks and tape resources. SAs MAY 
> overlap. Shared portions of storage MUST be represented with a single 
> GlueSA object, with multiple GlueSAAccessControlBaseRule attributes and 
> optionally with multiple VOInfo objects pointing to it.

A somewhat complex approach, but it might be a possible strategy.

>> How exactly to describe the virtual organization is not quite clear, 
>> but the following elements are needed as a base: VO name, VO issuer (DN 
>> of the VO issuer, somewhat VOMS specific), VO group, and VO role. There 
>> might be use cases for being able to have multiple VO blocks (though I 
>> suspect that will be messy).
>
> I think FQAN is the term for this. I don't see the need for VO Issuer. 
> VO names should be unique in any infrastructure and now that we 
> typically register fully qualified VO names they should be globally 
> unique.

"Typically" is not quite good enough here. Nothing prevents two VOMS
servers from creating a VO with the same name, so I think the VO issuer is
absolutely necessary. It won't be needed in most cases, but if a name
clash does happen, you want it.
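
A sketch of the clash in Python. The issuer DNs, the VO name "physics",
and the numbers are all made up; the dictionary keys are illustrative and
not element names from the draft.

```python
from collections import defaultdict

# Two hypothetical VOMS servers that both issued a VO called "physics".
records = [
    {"vo_issuer": "/O=Grid/CN=voms.example.org", "vo_name": "physics", "used": 100},
    {"vo_issuer": "/O=Grid/CN=voms.example.net", "vo_name": "physics", "used": 250},
]

by_name = defaultdict(int)
by_issuer_and_name = defaultdict(int)
for r in records:
    # Keyed on the name alone, the two unrelated VOs are merged.
    by_name[r["vo_name"]] += r["used"]
    # Keyed on (issuer, name), they stay separate.
    by_issuer_and_name[(r["vo_issuer"], r["vo_name"])] += r["used"]
```

Here by_name ends up with a single entry covering both VOs, while
by_issuer_and_name keeps two.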

I think we should also try to make it usable by non-VOMS users as a way of
globally identifying projects.

>> 8. Sharing Elements with Usage Record Standard
>>
>> Some of the elements are identical in both name and semantics to the 
>> ones in usage record. We do not suggest to share the elements as such 
>> (same namespace), as it would make the standard rely on the UR 
>> standard, and hence make it less self-contained. The UR standard is 
>> only used in a few systems, and is likely to be replaced with a new 
>> standard sometime. Furthermore the implementation gains of sharing the 
>> names are very small, if they even exist.
>
> Since there is an existing UR I do not see a problem using the same 
> names where the meaning is shared. This would not tie us to syncing with 
> any new versions of the UR.

Not quite sure I'm following here. Are you suggesting that we just reuse
the element name where it makes sense (which I'm perfectly fine with), or
that we reuse the QName (namespace + element name) from UR? The latter
just seems like an unnecessary complication.
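
To make the distinction concrete, a short Python sketch using the standard
xml.etree.ElementTree module. Both namespace URIs are placeholders
invented for the example; they are not the real UR namespace or a proposed
one for the storage record.

```python
import xml.etree.ElementTree as ET

# Placeholder namespaces, for illustration only.
UR_NS = "http://example.org/usage-record"
SR_NS = "http://example.org/storage-record"

# Option 1: reuse only the element name, in the storage record's namespace.
own = ET.Element(ET.QName(SR_NS, "StartTime").text)

# Option 2: reuse the full QName (namespace + element name) from UR.
shared = ET.Element(ET.QName(UR_NS, "StartTime").text)


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]
```

The two elements have the same local name but different QNames, so a
parser that matches on QName treats them as different elements.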

Thanks for the feedback. Hopefully people will have time to read it before
the meeting.

     Best regards, Henrik

  Software Developer, Henrik Thostrup Jensen <htj at ndgf.org>
  Nordic Data Grid Facility. WWW: www.ndgf.org