[glue-wg] Datastore proposal
Paul Millar
paul.millar at desy.de
Fri Apr 11 09:39:53 CDT 2008
Hi Stephen,
On Friday 11 April 2008 14:47:10 Burke, S (Stephen) wrote:
> Sergio Andreozzi [mailto:sergio.andreozzi at cnaf.infn.it] said:
> > [size attributes] can be directly added to the dataStore class.
>
> Yes. In fact the semantics may be a bit different too: I don't think
> there's any need for ReservedSize as that's related to the SRM internals
> and not the hardware, and similarly there's no need for a Cache type as
> that relates to how it's used.
Yep, that sounds sensible to me, although ReservedSize may be a useful concept
(see below).
> So I think my proposed attributes are:
>
> UniqueID: unique ID (or possibly we just need a LocalID?)
I think a LocalID should be sufficient.
> Name: Human-readable name (maybe indicating the technology, e.g.
> StorageTek)
Yes, in principle; although I'd be wary of suggesting people put technology
names into Name. I know this is just an example, but people tend to follow
examples. People may naturally choose the technology as a Name anyway and
there's nothing wrong with that. The problem, if it comes, would be that
people then expect the technology to be embedded in the Name field.
Perhaps we could suggest technology as something that might go in the
OtherInfo field?
(or is this just being too paranoid?)
> Type (disk, tape, ... - open enumeration) (or maybe call this attribute
> Medium?)
This is definitely nit-picking, but for many instances this would be
more "media". Using "type" would avoid the singular / plural issue (I
think).
That aside, either choice is OK.
> Latency: Enumeration {Online, Nearline, Offline} (probably no need to
> make this open?)
Do we define what online, nearline and offline mean somewhere?
> TotalSize: The total amount of data that can be stored (right now, e.g.
> regardless of whether tapes may be added to order, but ignoring the
> state so e.g. disk servers which are down still get counted).
Yes.
Perhaps the "total amount of data that can be stored without operator
intervention and when operating correctly". Would that be sufficient?
> Note that this could be smaller than the underlying hardware capacity, e.g.
> with RAID the parity disks don't contribute to the size.
Aye. We should record the storage actually available to end-users; the ten
hot-spare disks shouldn't be recorded in that number.
> UsedSize: The total amount of data which is currently stored - this is
> physical data, so e.g. if there are currently three copies of a file for
> load-balancing then you count all of them.
Ah, and we get into a slightly contentious issue. So, with dCache, a disk
pool's storage can be logically split into five parts:
    precious
    precious&sticky
    cache
    cache&sticky
    free
precious is the total size of all files that are to be stored on tape (but
where this hasn't happened yet), e.g., D0T1.
precious&sticky is the total size of all files that are to be stored on tape
(but, again, this hasn't happened yet). Once stored they should be "pinned".
e.g., D1T1.
cache is the total size of all files that can be deleted at any time.
cache&sticky is the total size of all files that cannot be deleted, e.g.,
pinned D0T1.
free is axiomatically the space not described by the other four categories,
i.e.: totalSize - (precious + precious&sticky + cache + cache&sticky).
Stephen, your description seems to map to precious + precious&sticky + cache +
cache&sticky. However, for most systems this should be ~100% of totalSize
most of the time, so I'm not sure how useful that number is.
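A minimal sketch of the bookkeeping described above. The class and field names are hypothetical, chosen for illustration, and are not dCache's actual API; the byte counts are made up.

```python
from dataclasses import dataclass


@dataclass
class PoolUsage:
    """Hypothetical per-pool counters (names are illustrative, not dCache's)."""
    total: int
    precious: int
    precious_sticky: int
    cache: int
    cache_sticky: int

    @property
    def occupied(self) -> int:
        # Physical data on disk: the sum of the four non-free categories.
        return (self.precious + self.precious_sticky
                + self.cache + self.cache_sticky)

    @property
    def free(self) -> int:
        # free is defined as whatever the other four categories do not cover.
        return self.total - self.occupied

    @property
    def removable(self) -> int:
        # Only plain cache files can be deleted at any time.
        return self.cache


# Made-up numbers for a pool that is almost full of cached copies.
pool = PoolUsage(total=1000, precious=100, precious_sticky=50,
                 cache=700, cache_sticky=100)
print(pool.occupied, pool.free, pool.free + pool.removable)
```

With these numbers the pool is 95% occupied, yet most of that space could be reclaimed on demand, which is why the occupied figure alone says little.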
Perhaps we can look at the free(1) command for some hints since they face a
similar problem. Here's an example output (I've edited it for clarity).
                     total       used       free    buffers     cached
Mem:               2074992    1760664     314328     225920    1193968
-/+ buffers/cache:             340776    1734216
In the first line, memory used for buffers and cache (1419888 in total, for
this example) is counted as part of the used space. If it is counted as free
instead, you get the second line.
Perhaps we should publish two numbers? Or, we could publish a reservedSize
(corresponding to buffers+cached above). People add this number to either
the usedSize or the freeSize depending on what they want to know.
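The arithmetic behind the two views can be checked with the figures from the free(1) example. This is only a sketch: the variable names are illustrative, and `reserved` stands in for the suggested reservedSize.

```python
# Figures copied from the free(1) example (in free's default units).
total, used, free = 2074992, 1760664, 314328
buffers, cached = 225920, 1193968

# Plays the role of the suggested reservedSize.
reserved = buffers + cached

# View 1: buffers and cache count as used (first line of the output).
# View 2: buffers and cache count as free (second line of the output).
used_excluding_reserved = used - reserved
free_including_reserved = free + reserved

print(reserved, used_excluding_reserved, free_including_reserved)
```

Publishing the one extra number lets a consumer derive either view, rather than the publisher having to pick one.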
> FreeSize: TotalSize - UsedSize, i.e. the free space at the filesystem
> level.
Is this an axiomatic relationship? If so, it probably isn't worth recording
it.
> OtherInfo: any other information, e.g. on the technology (RAID6, LTO,
> ...).
Yup, sounds good, provided it's optional information.
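For reference, the attributes discussed so far could be sketched as a simple record. This is an illustration of the proposal under discussion, not an agreed schema: the Python names, the choice of a plain string for the open Type enumeration, the unspecified size units, and treating FreeSize as derived are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Latency(Enum):
    # Closed enumeration, as proposed.
    ONLINE = "Online"
    NEARLINE = "Nearline"
    OFFLINE = "Offline"


@dataclass
class DataStore:
    local_id: str
    name: str                 # human-readable name
    type: str                 # open enumeration, e.g. "disk", "tape"
    latency: Latency
    total_size: int           # units left unspecified in this sketch
    used_size: int
    other_info: List[str] = field(default_factory=list)  # optional

    @property
    def free_size(self) -> int:
        # If FreeSize is axiomatically TotalSize - UsedSize, it can be
        # derived rather than recorded.
        return self.total_size - self.used_size


# Made-up example values.
store = DataStore("ds-1", "Main tape library", "tape",
                  Latency.NEARLINE, 500, 120, ["LTO"])
print(store.free_size)
```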
> One other point, I looked back at the StorageComponent proposal and that
> had a comment about hardware compression on tape drives. My initial
> feeling is that we should stick to real physical numbers here, e.g. you
> record a file as 2 Gb if it's that size on tape, even if it was 4 Gb
> before compression. Maybe we should have an extra attribute to indicate
> whether data may be compressed?
OK, I think this is a can-o-worms that we don't want to open. I had a chat
with our local tape people and here are some comments:
1. files are often compressed in the user-domain, if this is so then the
drive might disable compression altogether (it checks whether the
uncompressed version is less than the compressed size). If compressed, the
compression ratio is typically very low (~1%). Either way, for many files
the compressed size ~= actual size, so there's no real distinction. This is
strongly dependent on the file structure and, by implication, on the
UserDomain.
2. for some tape systems, it would be very difficult to obtain the actual
storage usage (the "tape occupancy"?).
3. sometimes a file store operation can fail. If so, the tape software may
retry, but some (potentially unknown) fraction of the file has been written
to tape. Does this count towards the actual occupancy?
4. I believe Castor had an issue when deleting files (leading
to "repacking"?) If we're attempting to account for actual yardage of tape
used, how would this be accounted? [for disks, this is dealt with through
fragmentation]
I think the only thing we can publish is the (user-domain) file size that has
been recorded to tape. I believe this is the actual number people are
interested in.
If a site has some cunning compression system so they can squeeze the files
into 1% of their original size, that's a site-local issue and shouldn't be
published in Glue.
(just my 2c worth).
Cheers,
Paul.