[UR-WG] Fwd: Re: Fwd: EMI StAR – Definition of a Storage Accounting Record - ready for public comments

john alan kennedy jkennedy at rzg.mpg.de
Mon Mar 26 01:54:57 EDT 2012


Hi all,

I asked Jean-Yves Nief, who is very active in the iRODS scene and works
at IN2P3, where they have a large iRODS-based data store, to comment on
the StAR doc.

Here's his feedback.

cheers
johnk



-------- Original Message --------
Subject: 	Re: Fwd: [UR-WG] EMI StAR – Definition of a Storage Accounting 
Record - ready for public comments
Date: 	Tue, 20 Mar 2012 18:03:16 +0100
From: 	Jean-Yves Nief <nief at cc.in2p3.fr>
To: 	john alan kennedy <jkennedy at rzg.mpg.de>



dear John,

             I finally had time to take a look at your document. I do
not have many comments at this point, as I don't know the exact context
of the discussion.
In 5.18 (ResourceCapacityUsed), it is said that this information should
contain the space used for redundancy, in RAID setups for example. At
the middleware level, I am not sure this information is really relevant.
The important thing at the middleware level is the amount of disk space
available to the grid users. The real amount of disk installed, for
example, is more of an internal matter. On the other hand, it might be
interesting to have some metadata about the level of data security of a
given storage resource (RAID level, etc.). A storage resource could
offer less capacity than another one but provide more data security: it
could be an element used in assessing the level of service delivered by
a given storage resource (and not only the amount of space provided).
Also, the latency to retrieve the data should be documented (is it
online, "nearline", or offline?); that's important when you are dealing
with a hybrid system with both storage resources and tapes, for example.
In section 6 (Intentionally Left Out Properties): if the site id can be
found somewhere else, that's fine. However, having information spread
over several information systems might be a bit dangerous. And one way
or another, you need to link what is available in a storage system (or
part of it) with a given site.
The transfer information is said to be related to network resources,
hence not in the scope of this document. However, it is an important
feature. For example, serving 30 TB of data to thousands of users is
not going to be the same as having 30 TB for archival purposes that is
only accessed once in a while. I.e., the amount of transfer is going to
have consequences for your storage system design (both hardware and
software) and not only for the network resource. Again, the investment
and the cost are not going to be the same. If you take the case of
Amazon S3, they also charge for network usage: this is not separated
from storage usage as it is defined in this document. Also, if you have
a hybrid system, the amount of data transfer is directly connected to
the number of cache disks that you have, the number of tape drives, etc.
cheers,
JY


