[saga-rg] proposal for extended file IO

Andre Merzky andre at merzky.net
Wed Jun 15 05:59:51 CDT 2005


Quoting [John Shalf] (Jun 14 2005):
> 
> On Jun 14, 2005, at 10:12 AM, Andrei Hutanu wrote:
> >I support this model in the more generic way I proposed ..
> >It's a more accurate model because you explicitly
> >submit a job when you are doing (potentially) complex operations
> >and it allows for better performance.
> >
> >I think John is right when saying that eread is tricky and hard to use,
> >I also agree with Andre that a simpler model is even more limiting 
> >than eread.
> 
> I agree with that statement completely.  I think pread/readv is a bit 
> *too* simple.  However, we should look at some more elemental 
> interfaces for describing patterned read/write operations.  I'm quite 
> interested in following up on some of the leads that Thorsten gave us 
> for instance.

If you don't mind, I can give you a short version of the
technique Thorsten refers to.

Assume you have binary data which are regularly structured,
e.g. an rgb image of resolution x*y.  With resolution 6*4
that looks like:

   rgb rgb rgb rgb rgb rgb
   rgb rgb rgb rgb rgb rgb
   rgb rgb rgb rgb rgb rgb
   rgb rgb rgb rgb rgb rgb

Or, as a file stream:

   rgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgb

each element (r, g or b) being two bytes, for example (the
offsets used below assume one byte per element, to keep the
numbers small).

Now, consider the request for a subsetted and downsampled
version of that image: at offset (1,1) you want a 2*2 image
at half resolution, i.e. every second pixel in x and y:

                11 111 111
   012 345 678 901 234 567 
   
 0 --- --- --- --- --- ---
 1 --- RGB --- RGB --- ---
 2 --- --- --- --- --- ---
 3 --- rgb --- rgb --- ---

LS (Line Segments) can be used to describe a single rgb
triplet to be read.  For example, the first RGB above is:

 (l,r) = (3,5)
  l: left-most byte  -> 3
  r: right-most byte -> 5


FALLS (FAmiLy of Line Segments) can be used to describe a
pattern.  The line of RGBs above can be described as:

 (l,r,s,n) = (3,5,6,2)
  l: left-most byte                            -> 3
  r: right-most byte                            -> 5
  s: stride between two consecutive l elements -> 6
  n: number of consecutive line segments       -> 2
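
To make the (l,r,s,n) notation concrete, here is a minimal
Python sketch that expands one FALLS into its line segments
and reads them from a byte string.  The function names
falls_segments and read_falls are made up for this
illustration (they are not part of pread or SAGA), and one
byte per element is assumed, as in the picture above.

```python
from typing import Iterator, Tuple


def falls_segments(l: int, r: int, s: int, n: int) -> Iterator[Tuple[int, int]]:
    """Yield the (left, right) bounds, both inclusive, of each line segment."""
    for i in range(n):
        # Every segment is the first one shifted by a multiple of the stride.
        yield (l + i * s, r + i * s)


def read_falls(data: bytes, l: int, r: int, s: int, n: int) -> bytes:
    """Concatenate the bytes covered by the FALLS (l, r, s, n)."""
    return b"".join(data[a:b + 1] for a, b in falls_segments(l, r, s, n))


# One image line of 6 pixels * 3 elements; byte i simply holds the value i,
# so the selected offsets are visible in the result.
line = bytes(range(18))
segments = list(falls_segments(3, 5, 6, 2))   # [(3, 5), (9, 11)]
selected = read_falls(line, 3, 5, 6, 2)       # bytes 3,4,5 and 9,10,11
```

A plain LS is just the special case n = 1, where the stride
does not matter.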


FALLS can be nested.  Then another parameter is added to the
set, which is in turn a FALLS.  So the above downsampled
subset of the image would be:

  (1,1,2,2,(3,5,6,2))

That gives a sequence of FALLS, starting at line 1 (not 0),
ending at line 1, repeating with stride 2, 2 times, so the
inner FALLS is applied to lines 1 and 3.
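
The nesting can be sketched in a few lines of Python.  This
follows the reading given in this mail, where the outer
FALLS counts image lines and the inner FALLS counts bytes
within a line.  The line length of 18 bytes (6 pixels, 3
one-byte elements each) and the name nested_ranges are
assumptions made for the example only.

```python
from typing import List, Tuple

LINE_BYTES = 6 * 3  # assumption: 6 pixels per line, 3 one-byte elements each


def nested_ranges(falls, line_bytes: int = LINE_BYTES) -> List[Tuple[int, int]]:
    """Expand (l, r, s, n, (il, ir, istride, icount)) into byte ranges.

    The outer four values are in units of lines, the inner four in bytes
    relative to the start of each selected line.  Bounds are inclusive.
    """
    l, r, s, n, inner = falls
    il, ir, istride, icount = inner
    ranges = []
    for i in range(n):
        # Outer segment i covers lines l + i*s .. r + i*s.
        for line in range(l + i * s, r + i * s + 1):
            base = line * line_bytes
            for j in range(icount):
                ranges.append((base + il + j * istride, base + ir + j * istride))
    return ranges


ranges = nested_ranges((1, 1, 2, 2, (3, 5, 6, 2)))
# Pixels (1,1), (3,1), (1,3) and (3,3) of the 6*4 image:
# [(21, 23), (27, 29), (57, 59), (63, 65)]
```

Each resulting range is one rgb triplet in the file stream,
so a remote server can serve the whole request in one go
instead of four separate reads.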

You see, that maps pretty well to hyperslabs in HDF5, but
fits basically all regularly structured binary data.
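
As a rough illustration of that correspondence: an HDF5
hyperslab is given per dimension by start, stride, count and
block, and one FALLS (l,r,s,n) carries the same information.
The helper below is only a sketch of the mapping (its name is
invented here, and no HDF5 library is called); it treats the
image as a 2D array of 4 lines by 18 bytes.

```python
def falls_to_hyperslab(l: int, r: int, s: int, n: int) -> dict:
    """Map one FALLS to hyperslab-style parameters for a single dimension."""
    return {
        "start": l,           # left-most index of the first segment
        "stride": s,          # distance between consecutive segment starts
        "count": n,           # number of segments
        "block": r - l + 1,   # length of each segment (bounds are inclusive)
    }


# Outer FALLS selects lines, inner FALLS selects bytes within a line.
rows = falls_to_hyperslab(1, 1, 2, 2)
cols = falls_to_hyperslab(3, 5, 6, 2)
```

Here rows comes out as start 1, stride 2, count 2, block 1,
and cols as start 3, stride 6, count 2, block 3.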

It obviously does not work for compressed data, unstructured
data etc.

For reference, see:

  F. Isaila and W. Tichy. Clusterfile: A flexible physical
  layout parallel file system. Proceedings of IEEE Cluster
  Computing Conference, October 2001.

Thorsten, Andrei and I implemented that one for remote file
access, and called it pread (pattern_read), which worked
nicely indeed.  Obviously, it's a matter of taste whether the
pattern gets specified as a string or as a recursive data
structure...
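
The two representations convert into each other easily.  The
sketch below parses the string form used in this mail into a
recursive structure.  The Falls class and parse_falls are
hypothetical names for this example; the actual pattern
format accepted by pread is not shown in this mail.

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass
class Falls:
    l: int                           # left-most index
    r: int                           # right-most index (inclusive)
    s: int                           # stride between consecutive l values
    n: int                           # number of segments
    inner: Optional["Falls"] = None  # nested FALLS, if any


def build(t: tuple) -> Falls:
    """Turn a nested tuple (l, r, s, n[, inner]) into a Falls tree."""
    l, r, s, n, *rest = t
    return Falls(l, r, s, n, build(rest[0]) if rest else None)


def parse_falls(text: str) -> Falls:
    """Parse a string such as "(1,1,2,2,(3,5,6,2))"."""
    # literal_eval only accepts Python literals, so no code is executed.
    return build(ast.literal_eval(text))


pattern = parse_falls("(1,1,2,2,(3,5,6,2))")
```

The string form is handy on the wire and in logs, while the
recursive form is easier to walk when computing offsets.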

Cheers, Andre.

-- 
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky at cs.vu.nl       |
| De Boelelaan 1083a                | www:  http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+




