[UR-WG] Draft/suggestion for Storage Accounting Record (2010/10/07)

Henrik Thostrup Jensen htj at ndgf.org
Mon Oct 18 06:26:57 CDT 2010


Hi John

On Wed, 13 Oct 2010, john alan kennedy wrote:

> I think this is a really nice step forward.

Thanks.

> It would be nice to see how it fits with other (non-grid) storage systems.

Yes. With the exception of the VO information, most of it should be usable,
especially with the fields for local user name and groups, though this might
be an oversimplification in some places.

It is certainly also possible to imagine systems where the record would
not be usable, but the question is more what records to create, and how,
for systems which have a more dynamic structure, and perhaps not-so-clear
ownership of data.

Last week we had a meeting in the EMI SAR group, and it became very clear
that one of the most important things in creating this record format is a
vision of what should be in the record and what shouldn't be.

More specifically, we got into a discussion about whether the record should
just report used/reserved/available storage, or whether it should include
transfer statistics, and if so, how detailed. Having transfer statistics for
a storage element (or per storage partition) could be very interesting.
Having them per file could be even more interesting, but would create
gigantic records or a very large number of them.

> Regarding creating a stand alone storage accounting record. I think there are 
> good and bad points here.
>
> Good: It may allow you to get a prototype with less delay which you could 
> then use and feedback your experience towards the evolution of a full 
> standard. Maybe it even is enough to do this for your EMI work.
>
> Bad: Splitting away from the UR standard causes issues when you want to 
> combine compute/storage accounting. The initial idea from OGF, dating back to 
> OGF 21, was that the UR would be formed from different elements e.g. 
> Compute/Storage/Network. The record would effectively aggregate the different 
> parts of the accounting information. I attached the UR2 talk from Donal 
> Fellows at OGF21, I think the UR2 Zoo slide is a nice vision to have.

You are right about both. Jon and I considered this when we created the
suggestion, but decided against creating something unified as it would
most likely push out the schedule significantly. EMI needs a record
format at the end of this year. Getting UR2 ready in two months is probably
not going to happen.

> Now maybe this is doable and maybe it's a pain but I think it'd be nice to 
> try.
> Much of the information regarding the users/vo(community)/site would be the 
> same and would fit into the core.

Yes. An idea that surfaced at the meeting was to create a separate
standard for an "identity block" and use it in the storage record.
Furthermore, this identity block could be adopted by the UR standard,
creating a UR 1.1 standard. This would fix what is probably the greatest
Achilles heel of the UR standard: that it doesn't provide a good way to
describe VO information. Anyway, it is an idea.

> One other (smaller) issue that I had was replicas. In some storage systems 
> you can have the same file replicated multiple times to improve access, 
> dCache can do this and I would be surprised if iRODS couldn't do it too. How 
> do you think we should account for this, simply treat the files as separate 
> or flag this storage as a different type. It may not be so pressing quite yet 
> since I don't think it is so so frequently done and so could be a very small 
> fraction of the storage.

The suggestion should be able to describe this using the "StorageClass"
attribute, which can denote that something is a replica. However, it is not
always easy to say which copy is the original and which is a replica (and
gets marked as such). This is similar to reserved space, which could be
freed, but is still occupied in some sense. Something that would also be
nice to know is whether the replica is there for fault-tolerance or because
of high traffic to the file (i.e., why the file is replicated).

> We should also start to think about what is required on the backend from the 
> storage solution providers to allow us to gather/use the accounting info you 
> want. I think many systems would be able to give an answer to "what is in 
> your system now" but I am not sure how it is when we ask "what was in your 
> system between X and Y 2007". I think you have good contact with several 
> providers and can ask right? This also opens up questions to what we account 
> for bytes v's byte-mins? If you are really talking about an integration over 
> a time period this is what it would amount to. We had some discussion 
> regarding the snapshot/integration - and I think we may have some more :-)

I have a (very) good connection to a dCache developer:

In dCache, it is not possible to ask how much a person/project used at a
certain time. However, there are very detailed logs of how much has
been written, read and deleted per pool. This information is written to log
files, which must be parsed in order to acquire it.
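
As a rough sketch of how the "what was in your system between X and Y"
question could be answered from such logs: the code below assumes the logs
have already been parsed into (timestamp, owner, byte delta) tuples. That
tuple layout and the function names are hypothetical and only for
illustration; this is not the dCache log format. Replaying the events gives
a snapshot at a given time, and integrating over an interval gives the
byte-seconds figure from the snapshot/integration discussion.

```python
from datetime import datetime, timedelta

# Hypothetical, already-parsed log events: (timestamp, owner, byte_delta).
# A positive delta means bytes written, a negative delta means bytes deleted.


def usage_at(events, owner, when):
    """Snapshot: bytes stored for `owner` at time `when` (inclusive)."""
    return sum(delta for ts, who, delta in events
               if who == owner and ts <= when)


def byte_seconds(events, owner, start, end):
    """Integration: stored bytes times seconds held, over [start, end)."""
    current = usage_at(events, owner, start)
    total = 0.0
    cursor = start
    for ts, who, delta in sorted(events):
        # Events at or before `start` are already in the snapshot.
        if who != owner or ts <= start or ts >= end:
            continue
        total += current * (ts - cursor).total_seconds()
        current += delta
        cursor = ts
    total += current * (end - cursor).total_seconds()
    return total
```

Dividing the result by 60 gives byte-minutes. Note that this only works if
the logs go back far enough to cover everything written before the start of
the interval; otherwise an initial snapshot is needed as a starting point.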

For inquiries about other storage systems, Paul Millar from the EMI SAR 
group is probably the best person to poke as he has contacts for the 
groups and is collecting requirements from them.

> As I said - just first quick thoughts.

Thanks for the feedback. We now have more open questions :-).


     Best regards, Henrik

  Software Developer, Henrik Thostrup Jensen <htj at ndgf.org>
  Nordic Data Grid Facility. WWW: www.ndgf.org

