[glue-wg] Datastore proposal

Fri Apr 11 09:39:53 CDT 2008

Hi Stephen,

On Friday 11 April 2008 14:47:10 Burke, S (Stephen) wrote:
> Sergio Andreozzi [mailto:sergio.andreozzi at cnaf.infn.it] said:
> > [size attributes] can be directly added to the dataStore class.
>
> Yes. In fact the semantics may be a bit different too: I don't think
> there's any need for ReservedSize as that's related to the SRM internals
> and not the hardware, and similarly there's no need for a Cache type as
> that relates to how it's used.

Yep, that sounds sensible to me, although ReservedSize may be a useful concept 
(see below).

> So I think my proposed attributes are: 
>
> UniqueID: unique ID (or possibly we just need a LocalID?)

I think a LocalID should be sufficient.

> Name: Human-readable name (maybe indicating the technology, e.g.
> StorageTek)

Yes, in principle; although I'd be wary of suggesting that people put technology 
names into Name.  I know this is just an example, but people tend to follow 
examples.  People may naturally choose the technology as a Name anyway and 
there's nothing wrong with that.  The problem, if it comes, would be that 
people then expect the technology to be embedded in the Name field.

Perhaps we could suggest technology as something that might go in the 
OtherInfo field?

(or is this just being too paranoid?)

> Type (disk, tape, ... - open enumeration) (or maybe call this attribute
> Medium?)

This is definitely nit-picking, but for many instances this would be 
more "media".  Using "type" would avoid the singular / plural issue (I 
think).

That aside, either choice is OK.

> Latency: Enumeration {Online, Nearline, Offline} (probably no need to
> make this open?)

Do we define what online, nearline and offline mean somewhere?

> TotalSize: The total amount of data that can be stored (right now, e.g.
> regardless of whether tapes may be added to order, but ignoring the
> state so e.g. disk servers which are down still get counted).

Yes.

Perhaps the "total amount of data that can be stored without operator 
intervention and when operating correctly".  Would that be sufficient?

> Note that  this could be smaller than the underlying hardware capacity, e.g.
> with RAID the parity disks don't contribute to the size.

Aye.  We should record the storage actually available to end-users; the ten 
hot-spare disks shouldn't be counted in that number.

> UsedSize: The total amount of data which is currently stored - this is
> physical data, so e.g. if there are currently three copies of a file for
> load-balancing then you count all of them.

Ah, and we get into a slightly contentious issue.  So, with dCache, a disk 
pool's storage can be logically split into five parts:
	precious
	precious&sticky
	cache
	cache&sticky
	free

precious is the total size of all files that are to be stored on tape (but 
where this hasn't happened yet), e.g., D0T1.

precious&sticky is the total size of all files that are to be stored on tape 
(but, again, this hasn't happened yet).  Once stored they should be "pinned". 
e.g., D1T1.

cache is the total size of all files that can be deleted at any time.

cache&sticky is the total size of all files that cannot be deleted, e.g., 
pinned D0T1.

free is axiomatically the space not described by the other four categories, 
i.e.: totalSize - (precious + precious&sticky + cache + cache&sticky).

Stephen, your description seems to map to precious + precious&sticky + cache + 
cache&sticky.  However, for most systems this should be ~100% of totalSize 
most of the time, so I'm not sure how useful that number is.
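
To make the arithmetic concrete, here is a rough Python sketch of that split.  
The byte counts and the names (PoolUsage, used, free) are invented purely for 
illustration; they are not taken from dCache or from any Glue schema.

```python
from dataclasses import dataclass


@dataclass
class PoolUsage:
    """Logical split of one disk pool (hypothetical sizes, arbitrary units)."""
    total: int
    precious: int
    precious_sticky: int
    cache: int
    cache_sticky: int

    @property
    def used(self) -> int:
        # Physical data on disk: every category except free.
        return (self.precious + self.precious_sticky
                + self.cache + self.cache_sticky)

    @property
    def free(self) -> int:
        # Free is defined as whatever the other four categories leave over.
        return self.total - self.used


# Made-up numbers, chosen so the pool is nearly full of cached files.
pool = PoolUsage(total=1000, precious=50, precious_sticky=100,
                 cache=700, cache_sticky=140)
print(pool.used, pool.free)
```

With cached copies counted as used, such a pool looks almost full even though 
most of its contents could be deleted at any time.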

Perhaps we can look at the free(1) command for some hints since they face a 
similar problem.  Here's an example output (I've edited it for clarity).

                        total       used       free    buffers     cached
Mem:                  2074992    1760664     314328     225920    1193968
-/+ buffers/cache:                340776    1734216

This says that the memory used for buffers and cache (1419888 in total, for 
this example) is, in the first line, counted as part of the used space.  If it 
is instead counted as free, the result is the second line.

Perhaps we should publish two numbers?  Or, we could publish a reservedSize 
(corresponding to buffers+cached above).  People could then add this number to 
either the usedSize or the freeSize, depending on what they want to know.
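
As a sketch of how a consumer might use such a number, here is the free(1) 
arithmetic above in Python.  The function name two_views and its arguments are 
hypothetical; only the figures come from the free(1) output.

```python
def two_views(used, free, reserved):
    """Return (used, free) pairs for both readings of the reserved space.

    'used' is assumed to include the reserved space, as in the first
    line of the free(1) output.
    """
    # Reading 1: reserved space counts as used (first line of free(1)).
    as_used = (used, free)
    # Reading 2: reserved space counts as free (second line of free(1)).
    as_free = (used - reserved, free + reserved)
    return as_used, as_free


# Figures from the free(1) example.
total, used, free = 2074992, 1760664, 314328
buffers, cached = 225920, 1193968

reserved = buffers + cached
as_used, as_free = two_views(used, free, reserved)
print(reserved)   # 1419888
print(as_free)    # (340776, 1734216)
```

Either pair adds up to the same total, so publishing the reserved figure once 
lets people derive whichever view they need.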

> FreeSize: TotalSize - UsedSize, i.e. the free space at the filesystem
> level.

Is this an axiomatic relationship?  If so, it probably isn't worth recording 
it.

> OtherInfo: any other information, e.g. on the technology (RAID6, LTO,
> ...).

Yup, sounds good, provided it's optional information.
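
For what it's worth, here is how the attributes discussed so far might look if 
sketched as a Python class.  This is only an illustration of the discussion, 
not an agreed schema: the class and field names are my own, the ID is shown as 
a LocalID, units are left unspecified, and FreeSize is shown as a derived 
value on the assumption that it really is just TotalSize - UsedSize.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Latency(Enum):
    """Closed enumeration, as proposed in the thread."""
    ONLINE = "Online"
    NEARLINE = "Nearline"
    OFFLINE = "Offline"


@dataclass
class DataStore:
    """Illustrative only; not a published Glue class definition."""
    local_id: str
    name: str                 # human-readable name
    type: str                 # open enumeration, e.g. "disk", "tape"
    latency: Latency
    total_size: int           # units left unspecified in this sketch
    used_size: int
    other_info: Optional[str] = None   # optional, e.g. the technology

    @property
    def free_size(self) -> int:
        # Derived rather than stored, if the relationship is axiomatic.
        return self.total_size - self.used_size


# Hypothetical example values.
store = DataStore(local_id="store-1", name="Main disk array", type="disk",
                  latency=Latency.ONLINE, total_size=500, used_size=420,
                  other_info="RAID6")
print(store.free_size)
```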

> One other point, I looked back at the StorageComponent proposal and that
> had a comment about hardware compression on tape drives. My initial
> feeling is that we should stick to real physical numbers here, e.g. you
> record a file as 2 Gb if it's that size on tape, even if it was 4 Gb
> before compression. Maybe we should have an extra attribute to indicate
> whether data may be compressed?

OK, I think this is a can-o-worms that we don't want to open.  I had a chat 
with our local tape people and here are some comments:

 1. files are often compressed in the user-domain; if so, the 
drive might disable compression altogether (it checks whether the 
uncompressed size is less than the compressed size).  If compressed, the 
compression ratio is typically very low (~1%).  In either case, for many files 
the compressed size ~= actual size, so there's no real distinction.  This is 
strongly dependent on the file structure and, by implication, on the 
UserDomain.

 2. for some tape systems, it would be very difficult to obtain the actual 
storage usage (the "tape occupancy"?).

 3. sometimes a file store operation can fail.  If so, the tape software may 
retry, but some (potentially unknown) fraction of the file has been written 
to tape.  Does this count towards the actual occupancy?

 4. I believe Castor had an issue when deleting files (leading 
to "repacking"?).  If we're attempting to account for actual yardage of tape 
used, how would this be accounted for? [for disks, this is dealt with through 
fragmentation]

I think the only thing we can publish is the (user-domain) file size that has 
been recorded to tape.  I believe this is the actual number people are 
interested in.

If a site has some cunning compression system so they can squeeze the files 
into 1% of their original size, that's a site-local issue and shouldn't be 
published in Glue.

(just my 2c worth).

Cheers,

Paul.