[saga-rg] proposal for extended file IO
Andre Merzky
andre at merzky.net
Wed Jun 15 05:59:51 CDT 2005
Quoting [John Shalf] (Jun 14 2005):
>
> On Jun 14, 2005, at 10:12 AM, Andrei Hutanu wrote:
> >I support this model in the more generic way I proposed ..
> >It's a more accurate model because you explicitly
> >submit a job when you are doing (potentially) complex operations
> >and it allows for better performance.
> >
> >I think John is right when saying that eread is tricky and hard to use,
> >I also agree with Andre that a simpler model is even more limiting
> >than eread.
>
> I agree with that statement completely. I think pread/readv is a bit
> *too* simple. However, we should look at some more elemental
> interfaces for describing patterned read/write operations. I'm quite
> interested in following up on some of the leads that Thorsten gave us
> for instance.
If you don't mind, I can give you a short version of the
technique Thorsten refers to.
Assume you have binary data which is regularly structured,
e.g. an rgb image of resolution x*y. With resolution 6*4
that looks like:
rgb rgb rgb rgb rgb rgb
rgb rgb rgb rgb rgb rgb
rgb rgb rgb rgb rgb rgb
rgb rgb rgb rgb rgb rgb
Or, as file stream:
rgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgb
each element (r, g or b) being two bytes, for example.
Now consider the request for a subsetted and downsampled
version of that image: at offset (1,1) you want an image 2*2
with half resolution:
               11 111 111
  012 345 678 901 234 567
0 --- --- --- --- --- ---
1 --- RGB --- RGB --- ---
2 --- --- --- --- --- ---
3 --- rgb --- rgb --- ---
LS (Line Segments) can be used to describe a single rgb
triplet to be read. For example, the first RGB above is:
(l,r) = (3,5)
l: left-most byte -> 3
r: right-most byte -> 5
FALLS (family of line segments) can be used to describe a
pattern. The line of RGBs above can be described as:
(l,r,s,n) = (3,5,6,2)
l: left-most byte -> 3
r: right-most byte -> 5
s: stride between two consecutive l elements -> 6
n: number of consecutive line segments -> 2
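
As a minimal sketch of how such a tuple expands into byte ranges
(the function name expand_falls is hypothetical, and the byte
indices assume one byte per element, as in the diagram above):

```python
from typing import List, Tuple


def expand_falls(l: int, r: int, s: int, n: int) -> List[Tuple[int, int]]:
    """Expand a FALLS (l, r, s, n) into inclusive (left, right) byte ranges.

    l, r: left-most and right-most byte of the first line segment
    s:    stride between the left-most bytes of consecutive segments
    n:    number of line segments
    """
    return [(l + i * s, r + i * s) for i in range(n)]


# The line of RGBs from the example: two 3-byte segments, 6 bytes apart.
segments = expand_falls(3, 5, 6, 2)
print(segments)  # [(3, 5), (9, 11)]
```

A single LS (l,r) is then just the special case n = 1.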
FALLS can be nested: another parameter is added to the
set, which is in turn a FALLS. So the above subsetted and
downsampled image would be:
(1,1,2,2,(3,5,6,2))
That gives a sequence of FALLS, starting at line 1 (not 0),
ending at line 1, repeating with stride 2, for 2 times.
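
The nested case can be sketched the same way. This is only an
illustration under assumptions: the outer FALLS counts in lines,
a line is 18 bytes (6 pixels of 3 one-byte elements, as in the
diagram), and the names expand, expand_nested and LINE_BYTES are
hypothetical.

```python
from typing import List, Tuple

LINE_BYTES = 18  # assumption: 6 pixels * 3 one-byte elements per image line

Falls = Tuple[int, int, int, int]  # (l, r, s, n)


def expand(l: int, r: int, s: int, n: int) -> List[Tuple[int, int]]:
    """Expand a flat FALLS into inclusive (left, right) ranges."""
    return [(l + i * s, r + i * s) for i in range(n)]


def expand_nested(outer: Falls, inner: Falls,
                  line_bytes: int = LINE_BYTES) -> List[Tuple[int, int]]:
    """Expand (outer..., inner) into absolute byte ranges in the file stream.

    The outer FALLS selects lines; the inner FALLS selects bytes within
    each selected line.
    """
    ranges = []
    for first_line, last_line in expand(*outer):
        for line in range(first_line, last_line + 1):
            base = line * line_bytes  # byte offset of the start of this line
            for left, right in expand(*inner):
                ranges.append((base + left, base + right))
    return ranges


# (1,1,2,2,(3,5,6,2)): lines 1 and 3, and in each line bytes 3-5 and 9-11.
ranges = expand_nested((1, 1, 2, 2), (3, 5, 6, 2))
print(ranges)  # [(21, 23), (27, 29), (57, 59), (63, 65)]
```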
You see, that maps pretty well to hyperslabs in HDF5, but
fits basically all regularly structured binary data.
It obviously does not work for compressed data, unstructured
data etc.
For reference, see:
F. Isaila and W. Tichy. Clusterfile: A flexible physical
layout parallel file system. Proceedings of IEEE Cluster
Computing Conference, October 2001.
Thorsten, Andrei and I implemented that one for remote file
access, and called it pread (pattern_read), which worked
nicely indeed. Obviously, it's a matter of taste whether the
pattern gets specified as a string or as a recursive data
structure...
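
To illustrate the idea of a pattern read, here is a rough local
sketch. It is not the pread implementation mentioned above: the
function name pattern_read_sketch is hypothetical, the pattern
is assumed to be already expanded into inclusive byte ranges,
and an in-memory stream stands in for the remote file.

```python
import io
from typing import BinaryIO, Iterable, Tuple


def pattern_read_sketch(stream: BinaryIO,
                        ranges: Iterable[Tuple[int, int]]) -> bytes:
    """Read the given inclusive (left, right) byte ranges and concatenate them."""
    out = bytearray()
    for left, right in ranges:
        stream.seek(left)
        out += stream.read(right - left + 1)
    return bytes(out)


# Stand-in for the 6*4 image with one byte per element: byte i has value i.
image = bytes(range(72))

# Byte ranges of the nested example (1,1,2,2,(3,5,6,2)) with 18-byte lines.
ranges = [(21, 23), (27, 29), (57, 59), (63, 65)]

data = pattern_read_sketch(io.BytesIO(image), ranges)
print(list(data))  # [21, 22, 23, 27, 28, 29, 57, 58, 59, 63, 64, 65]
```

The appeal for remote access is that the pattern travels as one
small request, and only the selected bytes need to come back.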
Cheers, Andre.
--
+-----------------------------------------------------------------+
| Andre Merzky | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science | mail: merzky at cs.vu.nl |
| De Boelelaan 1083a | www: http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands | |
+-----------------------------------------------------------------+