[UR-WG] Draft/suggestion for Storage Accounting Record (2010/10/07)

Andrea Cristofori andrea.cristofori at cnaf.infn.it
Fri Oct 22 07:30:34 CDT 2010


Hi Henrik,

>> It would be nice to see how it fits with other (non-grid) storage systems.
>>      
> Yes. With the exception of the VO information most of it should be usable,
> especially with fields for local user name and groups, though this might
> be an oversimplification in some places.
>
> It is certainly also possible to imagine systems where the record would
> not be usable, but the question is more what records to create, and how, for
> systems which have a more dynamic structure, and perhaps with not-so-clear
> ownership of data.
>    

In those cases the owner might not be present, but the storage area, 
or something like it, would be. After all, someone is paying for the 
space used. It could be problematic if storage is shared between 
multiple groups without clear boundaries.

> Last week we had a meeting in the EMI SAR group, and it became very clear
> that one of the most important things in creating this record format is a
> vision of what should be in the record and what shouldn't be.
>
> More specifically we got into a discussion about whether the record should
> just report used/reserved/available storage, or if it should include transfer
> statistics, and if so, how detailed. Having transfer statistics for a
> storage element (or per storage partition) could be very interesting.
> Having them per file could be even more interesting, but would create
> a gigantic record or a very large number of them.
>    

I think it would also be very interesting to have the possibility 
to include transfer details. It's true that it could lead to a huge 
number of records but, in this case too, they could be aggregated on a 
per-file, per-area, or per-group basis.
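
A minimal sketch of that kind of aggregation, assuming hypothetical
per-transfer records with "group", "storage_area", "file" and "bytes"
fields (these names are made up for illustration and are not taken from
the draft record):

```python
from collections import defaultdict

def aggregate_transfers(transfers, key):
    """Sum transferred bytes and count transfers per value of `key`.

    `transfers` is a list of dicts; `key` is one of the (hypothetical)
    field names "group", "storage_area" or "file".
    """
    totals = defaultdict(lambda: {"bytes": 0, "transfers": 0})
    for t in transfers:
        entry = totals[t[key]]
        entry["bytes"] += t["bytes"]
        entry["transfers"] += 1
    return dict(totals)

# Illustrative data only.
transfers = [
    {"group": "atlas", "storage_area": "disk1", "file": "/a/x", "bytes": 100},
    {"group": "atlas", "storage_area": "disk1", "file": "/a/x", "bytes": 50},
    {"group": "cms", "storage_area": "tape1", "file": "/c/y", "bytes": 700},
]

per_group = aggregate_transfers(transfers, "group")
per_file = aggregate_transfers(transfers, "file")
```

The same per-transfer data can then be reported at whatever level of
detail a site can afford, without one record per transfer.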


>    
>> Regarding creating a stand alone storage accounting record. I think there are
>> good and bad points here.
>>
>> Good: It may allow you to get a prototype with less delay which you could
>> then use and feedback your experience towards the evolution of a full
>> standard. Maybe it even is enough to do this for your EMI work.
>>
>> Bad: Splitting away from the UR standard causes issues when you want to
>> combine compute/storage accounting. The initial idea from OGF, dating back to
>> OGF 21, was that the UR would be formed from different elements e.g.
>> Compute/Storage/Network. The record would effectively aggregate the different
>> parts of the accounting information. I attached the UR2 talk from Donal
>> Fellows at OGF21, I think the UR2 Zoo slide is a nice vision to have.
>>      
> You are right about both. Jon and I considered this when we created the
> suggestion, but decided against creating something unified as it would
> most likely stretch the time schedule significantly. EMI needs a record
> format at the end of this year. Getting UR2 ready in two months is probably
> not going to happen.
>
>    

At least, as I was suggesting, it could use, where possible, the same 
name and description for fields that have the same meaning. If EMI 
decides to proceed with a separate UR, this might at least help 
compatibility with, let's say, a UR2 that includes computation and storage.


>> Now maybe this is doable and maybe it's a pain but I think it'd be nice to
>> try.
>> Much of the information regarding the users/vo(community)/site would be the
>> same and would fit into the core.
>>      
> Yes. An idea that surfaced at the meeting was to create a separate
> standard for an "identity block" and use it in the storage record.
> Furthermore this identity block could be adopted by the UR standard,
> creating a UR 1.1 standard. This would fix what is probably the greatest
> Achilles heel of the UR standard - that it doesn't provide a good way to
> describe VO information. Anyway, it is an idea.
>
>    
>> One other (smaller) issue that I had was replicas. In some storage systems
>> you can have the same file replicated multiple times to improve access;
>> dCache can do this and I would be surprised if iRODS couldn't do it too. How
>> do you think we should account for this: simply treat the files as separate,
>> or flag this storage as a different type? It may not be so pressing quite yet
>> since I don't think it is done so frequently and so could be a very small
>> fraction of the storage.
>>      
> The suggestion should be able to describe this using the "StorageClass"
> attribute, which can denote that something is a replica. However it is not
> always easy to say what is an original and what is a replica (and gets
> marked as such). This falls into something similar to reserved space, which
> could be freed, but is somehow occupied.  Something that would also be nice
> to know is if the replica is there for fault-tolerance or because of high
> traffic to the file (i.e., why is the file replicated).
>
>    

I think that replicas, as you said, can be described well with the 
"StorageClass" attribute. It is then up to the site to decide what to do 
with them. I would say that replicas are like normal files and should 
be treated as such, but that is left to the site to decide. 
Depending on which storage they are found in, they can also be treated 
differently. Also, the reason why a replica exists is, I think, not a 
concern of the UR. Maybe it would help to have information about 
access to the file, so that the existence of the replicas can become 
apparent or not.
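
To illustrate leaving the decision to the site, here is a rough sketch.
It assumes each record carries a "storage_class" field and that replicas
are marked with the value "replica"; both the field spelling and the
value are assumptions for illustration, not something fixed by the
suggestion.

```python
def charged_bytes(records, count_replicas=True):
    """Total bytes a site accounts for.

    `count_replicas` is the site policy: True treats replicas like
    normal files, False leaves them out of the total.
    """
    total = 0
    for r in records:
        is_replica = r.get("storage_class") == "replica"  # assumed marker value
        if is_replica and not count_replicas:
            continue
        total += r["bytes"]
    return total

# Illustrative data only.
records = [
    {"file": "/a/x", "storage_class": "disk", "bytes": 1000},
    {"file": "/a/x", "storage_class": "replica", "bytes": 1000},
    {"file": "/a/y", "storage_class": "disk", "bytes": 500},
]

with_replicas = charged_bytes(records, count_replicas=True)
without_replicas = charged_bytes(records, count_replicas=False)
```

The record only states what is stored and how it is classed; the policy
stays outside the record.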


>> We should also start to think about what is required on the backend from the
>> storage solution providers to allow us to gather/use the accounting info you
>> want. I think many systems would be able to give an answer to "what is in
>> your system now" but I am not sure how it is when we ask "what was in your
>> system between X and Y 2007". I think you have good contact with several
>> providers and can ask, right? This also opens up the question of what we
>> account for: bytes vs. byte-mins? If you are really talking about an
>> integration over a time period this is what it would amount to. We had some
>> discussion regarding the snapshot/integration - and I think we may have some
>> more :-)
>>      
> I have a (very) good connection to a dCache developer:
>
> In dCache, it is not possible to ask how much a person/project used at a
> certain time. However there are very detailed logs of how much has
> been written, read and deleted per pool. This is, however, written to log
> files which must be parsed in order to acquire the information.
>    

I remember reading that dCache can also write this information to a 
database. If this is true, it might be even easier to extract the 
information required to create a UR.
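
On the bytes versus byte-mins question quoted above, this is a sketch of
what integrating over a period could look like. It assumes that
timestamped usage snapshots for one user or group can be obtained (for
example from such a database); the snapshot format and the function name
are made up for illustration, and usage is assumed to stay constant
between two snapshots.

```python
def byte_minutes(snapshots, start, end):
    """Integrate stored bytes over [start, end], in byte-minutes.

    `snapshots` is a list of (timestamp_in_seconds, bytes_stored) tuples
    sorted by time. Usage is treated as a step function: each value
    holds until the next snapshot (or until `end` for the last one).
    Time before the first snapshot contributes nothing.
    """
    total = 0.0
    for i, (t, used) in enumerate(snapshots):
        next_t = snapshots[i + 1][0] if i + 1 < len(snapshots) else end
        lo = max(t, start)      # clip the interval to the window
        hi = min(next_t, end)
        if hi > lo:
            total += used * (hi - lo) / 60.0
    return total

# Illustrative data only: 1000 bytes for 10 min, 4000 for 20 min,
# then 2000 bytes until the end of the hour.
snapshots = [(0, 1000), (600, 4000), (1800, 2000)]
usage = byte_minutes(snapshots, 0, 3600)
partial = byte_minutes(snapshots, 300, 900)
```

A plain "bytes" figure would correspond to a single snapshot, while the
integrated figure needs history from the storage system, which is
exactly the backend requirement raised in the quoted text.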

Andrea


> For inquiries about other storage systems, Paul Millar from the EMI SAR
> group is probably the best person to poke as he has contacts for the
> groups and is collecting requirements from them.
>
>    
>> As I said - just first quick thoughts.
>>      
> Thanks for the feedback. We now have more open questions :-).
>
>
>       Best regards, Henrik
>
>    Software Developer, Henrik Thostrup Jensen<htj at ndgf.org>
>    Nordic Data Grid Facility. WWW: www.ndgf.org
> --
>    ur-wg mailing list
>    ur-wg at ogf.org
>    http://www.ogf.org/mailman/listinfo/ur-wg
>    


-- 
Andrea Cristofori
INFN-CNAF
Viale Berti Pichat 6/2
40127 Bologna
Italy
Tel. : +39-051-6092920
Skype: andrea-cnaf


