[UR-WG] Draft/suggestion for Storage Accounting Record (2010/10/07)
Henrik Thostrup Jensen
htj at ndgf.org
Thu Oct 28 04:35:43 CDT 2010
Hi John
On Wed, 27 Oct 2010, john.gordon at stfc.ac.uk wrote:
>> Result:
>> "StartTime" and "EndTime" elements for describing the time interval
>> for which the record is valid. Both of them are DateTime values.
>
> I don't understand. A measurement is taken at a single point in time. If
> I look at the space used in a storage system (say a filesystem), what
> information do I have on the period of validity of the measurement?
>
> I suppose if one is creating a summary record in place of many single
> measurements then one could understand the meaning of start time and end
> time but then one would have to think about what one was doing with the
> other quantities measured - min, max, average over the time span?
Well, you are right that these things are measured at a point in time. The
idea with start and end time was to allow arbitrary time resolution, but
this is possible with timestamps as well. Taking multiple samples to
create min, max, and avg values would also be possible, but the gains are
fairly low IMO.
I don't really have any strong opinions on which way to go here. Both
should be able to work fine.
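
To make the two options concrete, here is a minimal Python sketch. It
assumes each measurement is a (timestamp, used bytes) pair, and that a
summary record derives StartTime and EndTime from the first and last
sample. The function name and the "Min", "Max" and "Avg" keys are made up
for illustration and are not part of the draft.

```python
from datetime import datetime, timezone
from statistics import mean

# Point-in-time samples: (timestamp, used space in bytes). Values are invented.
samples = [
    (datetime(2010, 10, 28, 0, 0, tzinfo=timezone.utc), 100),
    (datetime(2010, 10, 28, 6, 0, tzinfo=timezone.utc), 300),
    (datetime(2010, 10, 28, 12, 0, tzinfo=timezone.utc), 200),
]

def summarise(samples):
    """Collapse point-in-time samples into one interval-style summary."""
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    return {
        "StartTime": min(times),   # earliest sample in the interval
        "EndTime": max(times),     # latest sample in the interval
        "Min": min(values),
        "Max": max(values),
        "Avg": mean(values),
    }

summary = summarise(samples)
```

With plain timestamps, each sample would simply be reported as its own
record and the summary step would be left to the accounting system.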
>> We quickly decided on
>> using bytes, as it is the fundamental unit, and saves us from deciding
>> on whether to use 1000 or 1024 bytes as a base.
>
> Don't you risk overflowing an integer in some systems?
I don't think we should worry about that in standards. Most languages and
databases support 64-bit integers, so don't be afraid to use them. Also, if
an accounting system wants to convert the number into something else
internally, I'm perfectly fine with it.
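
As a rough check of the headroom, here is a small Python sketch. The
helper name and the PETABYTE constant are illustrative only, and the
constant uses 1000 as base purely for the example.

```python
INT32_MAX = 2**31 - 1   # largest signed 32-bit integer
INT64_MAX = 2**63 - 1   # largest signed 64-bit integer

PETABYTE = 10**15       # illustrative constant, base 1000

def fits(num_bytes: int, limit: int) -> bool:
    """Return True if a non-negative byte count does not exceed the limit."""
    return 0 <= num_bytes <= limit

# A 4 GB byte count already overflows a signed 32-bit integer ...
small_ok = fits(4 * 10**9, INT32_MAX)
# ... while 1000 PB still fits in a signed 64-bit integer.
large_ok = fits(1000 * PETABYTE, INT64_MAX)
```

So a byte count only becomes a problem for a signed 64-bit integer at
around 9.2 * 10^18 bytes.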
>> We briefly considered if reporting should be per file, but this was
>> quite quickly shot down, as it would make the records unreasonably
>> large, without providing any real value. We did end up with an element
>> describing the number of files using the space reported.
>
> I presume you mean the number of records would be large, not the individual records?
Yes.
>> Result: "UsedSpace", "ReservedSpace", "UnAllocatedSpace" metrics for
>> describing how space is used, reserved, and available for use. Reserved
>> and unallocated are probably not overlapping (as reserved space is
>> technically used). The measurement is in bytes. "FileCount" for
>> describing the number of files using the space.
>
> Filecount should be optional.
Absolutely (and so should reserved space and unallocated/available space).
Most of the suggested fields in the draft should be optional.
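
A minimal Python sketch of what optional fields could look like in
practice. The element names UsedSpace, ReservedSpace, UnAllocatedSpace and
FileCount are from the draft. The class name, the root element name, and
the choice to make only UsedSpace mandatory are assumptions made for the
example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

@dataclass
class StorageRecord:
    used_space: int                          # bytes; assumed mandatory here
    reserved_space: Optional[int] = None     # bytes; optional
    unallocated_space: Optional[int] = None  # bytes; optional
    file_count: Optional[int] = None         # optional

    def to_xml(self) -> ET.Element:
        """Serialise the record, leaving out fields that are not set."""
        root = ET.Element("StorageRecord")   # hypothetical root element name
        fields = [
            ("UsedSpace", self.used_space),
            ("ReservedSpace", self.reserved_space),
            ("UnAllocatedSpace", self.unallocated_space),
            ("FileCount", self.file_count),
        ]
        for name, value in fields:
            if value is not None:
                ET.SubElement(root, name).text = str(value)
        return root

# A system that cannot report reservations just leaves those fields out.
record = StorageRecord(used_space=1000, file_count=3)
xml_text = ET.tostring(record.to_xml(), encoding="unicode")
```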
>> A site is considered a top level container for storage. The site name
>> should be globally unique (which probably means an FQDN).
>
> What is the FQDN of your site? Isn't a FQDN a host?
>
> In EGI we have a site name defined in GOCDB which is also used in the
> GLUE Schema. I assume other users of the GLUE schema have an equivalent.
> I know OGF does.
Yes, an FQDN is a full hostname. The important thing is that it globally
identifies your host or site. The idea was that if you had a host name
like spacebucket1.mysite.org, you could use mysite.org. This would allow
numbers from different SEs on a site to be added together. Perhaps we
should consider splitting this into two fields: site and host.
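
A small Python sketch of the aggregation idea. It assumes the site can be
derived by dropping the first label of the host name, which will not hold
everywhere (and is an argument for separate site and host fields). The
function names and the host names other than spacebucket1.mysite.org are
made up.

```python
from collections import defaultdict

def site_from_host(fqdn: str) -> str:
    """Drop the first label of a host name; an assumption, not a rule."""
    _, _, rest = fqdn.partition(".")
    return rest or fqdn

def used_space_per_site(records):
    """Sum used bytes per derived site from (host, used_bytes) pairs."""
    totals = defaultdict(int)
    for host, used_bytes in records:
        totals[site_from_host(host)] += used_bytes
    return dict(totals)

totals = used_space_per_site([
    ("spacebucket1.mysite.org", 100),
    ("spacebucket2.mysite.org", 50),     # hypothetical second SE
    ("se.othersite.org", 7),             # hypothetical other site
])
```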
>> A storage system is an independent system on a site.
>> A storage system partition is a part of a storage system (similar to
>> dCache pools)
>> A storage type describes the storage type where the data is stored
>> (disk, tape)
>
> WLCG uses Storage Area. Definition:
> The Glue Storage Area (SA) class describes a logical view of a portion
> of physical space that can include disks and tape resources. SAs MAY
> overlap. Shared portions of storage MUST be represented with a single
> GlueSA object, with multiple GlueSAAccessControlBaseRule attributes and
> optionally with multiple VOInfo objects pointing to it.
A somewhat complex approach, but it might be a possible strategy.
>> How exactly to describe the virtual organization is not quite clear,
>> but the following elements are needed as a base: VO name, VO issuer (DN
>> of the VO issuer, somewhat VOMS specific), VO group, and VO role. There
>> might be use cases for being able to have multiple VO blocks (though I
>> suspect that will be messy).
>
> I think FQAN is the term for this. I don't see the need for VO Issuer.
> VO names should be unique in any infrastructure and now that we
> typically register fully qualified VO names they should be globally
> unique.
"Typically" is not quite good enough here. Nothing prevents two VOMS
servers from creating VOs with the same name, so I think the VO issuer is
absolutely necessary. It won't be needed in most cases, but if a clash
does happen, you want it.
I think we should also try to make it usable by non-VOMS users as a way of
globally identifying projects.
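
To show why the issuer matters, here is a minimal Python sketch. The field
names mirror the elements listed in the draft (VO name, issuer, group,
role). The class name, the VO name "myvo" and both issuer DNs are invented
for the example.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class VOIdentity(NamedTuple):
    name: str
    issuer: Optional[str] = None   # DN of the VO issuer; optional
    group: Optional[str] = None
    role: Optional[str] = None

# Two VOs with the same name, issued by different (hypothetical) servers.
vo_a = VOIdentity("myvo", issuer="/DC=org/DC=example/CN=voms1.example.org")
vo_b = VOIdentity("myvo", issuer="/DC=net/DC=example/CN=voms2.example.net")

usage = defaultdict(int)
usage[vo_a] += 100   # bytes used by the first VO
usage[vo_b] += 250   # bytes used by the second VO

# Keyed by name alone, the two VOs would be merged into one total.
by_name = defaultdict(int)
for vo, used_bytes in usage.items():
    by_name[vo.name] += used_bytes
```

A non-VOMS project could use the same structure, with the issuer set to
whatever globally identifies the body that registered the project.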
>> 8. Sharing Elements with Usage Record Standard
>>
>> Some of the elements are identical in both name and semantics to the
>> ones in usage record. We do not suggest to share the elements as such
>> (same namespace), as it would make the standard rely on the UR
>> standard, and hence make it less self-contained. The UR standard is
>> only used in a few systems, and is likely to be replaced with a new
>> standard sometime. Furthermore the implementation gains of sharing the
>> names are very small, if they even exist.
>
> Since there is an existing UR I do not see a problem using the same
> names where the meaning is shared. This would not tie us to syncing with
> any new versions of the UR.
Not quite sure I'm following here. Are you suggesting to just reuse the
element name where it makes sense (which I'm perfectly fine with), or to
reuse the QName (namespace + element name) from UR? The latter just seems
like an unnecessary complication.
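
The difference between the two can be sketched in Python with
ElementTree. Both namespace URIs below are placeholders and not the real
UR or storage record namespaces, and local_name is a helper written for
the example.

```python
import xml.etree.ElementTree as ET

UR_NS = "http://example.org/usage-record"     # placeholder namespace
SR_NS = "http://example.org/storage-record"   # placeholder namespace

def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]

# Same element name, different namespaces: two different QNames.
ur_elem = ET.Element(ET.QName(UR_NS, "StartTime").text)
sr_elem = ET.Element(ET.QName(SR_NS, "StartTime").text)

same_name = local_name(ur_elem.tag) == local_name(sr_elem.tag)
same_qname = ur_elem.tag == sr_elem.tag
```

Reusing only the name keeps the records independent of the UR schema.
Reusing the QName would mean importing the UR namespace into every
storage record.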
Thanks for the feedback. Hopefully people will have time to read it before
the meeting.
Best regards, Henrik
Software Developer, Henrik Thostrup Jensen <htj at ndgf.org>
Nordic Data Grid Facility. WWW: www.ndgf.org