[saga-rg] proposal for extended file IO

Tue Jun 14 12:45:08 CDT 2005

Quoting [John Shalf] (Jun 14 2005):
> 
> On Jun 14, 2005, at 1:24 AM, Andre Merzky wrote:
> >Quoting [John Shalf] (Jun 14 2005):
> >>Should we find some case that causes problems for a readv/pread model?
> >>The hyperslabbing is clearly not one of those cases.
> >Actually, how would you do an HDF5 hyperslab via readv?  The
> >only way I see is instrumenting the HDF5 library and writing
> >a readv file driver, but then you would not use SAGA
> >anyway: that is not application level anymore.
> 
> The same problem exists with any of the proposed solutions, including 
> eRead.  So I'm not sure if I see the point here.

Hm, sorry that I communicate so badly, but that CAN be
solved with eRead, and that's exactly the advantage.  We
implemented that once in a different lib, and it worked like
a charm.

The code I included below is from a real client (the call
was named iowrap_pread instead of file.eRead though ;)

> >If you want to read hyperslabs from an HDF5 file at
> >application level with readv, you would need to mimic the
> >HDF5 lib in order to find the offset of the data set, and
> >you would need to know details about the HDF5 file structure
> >and data layout.
> 
> Someone will need to solve the very same problem in order to implement 
> an HDF5-specific eRead interface.
> 
> >Compared to that, eRead really is simpler for the
> >application.  Here is an example we used for hyperslabbing a
> >3D scalar field:
> >
> >  snprintf (pattern1, 255, "(%d, %d, %d, %d)"    , start1, stop1, stride1, reps1);
> >  snprintf (pattern2, 255, "(%d, %d, %d, %d, %s)", start2, stop2, stride2, reps2, pattern1);
> >  snprintf (pattern3, 255, "(%d, %d, %d, %d, %s)", start3, stop3, stride3, reps3, pattern2);
> >  res = file.eRead (pattern3, (char*) buf, buffer_size);
> 
> So you would actually need to embed this in-situ with your HDF5 code?  
> Or would you go through the HDF5 libraries so that you can push that 
> information string down to the driver layer?  It's not clear where
> exactly you place these calls.

This call goes into the application!  That is supposed to be
the SAGA level.  The HDF5 lib does not come into play on the
local host at all, only on the remote host, where the eRead
request is received, translated into a native HDF5 hyperslab
read (the translation is simple), and the resulting data are
returned.  That is why I think SAGA is a good place for
eRead: it IS application level...
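To illustrate the service side, here is a sketch of the first step of that translation: parsing the nested pattern built by the snprintf calls quoted above back into one (start, stop, stride, reps) tuple per dimension, outermost first. The PatternLevel type and parse_pattern function are hypothetical; mapping the tuples onto an actual HDF5 selection (e.g. via H5Sselect_hyperslab) depends on the service definition and is not shown.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// One level of the (hypothetical) nested eRead pattern
//   "(start, stop, stride, reps[, inner])"
struct PatternLevel {
    long start, stop, stride, reps;
};

static void skip_ws(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && s[pos] == ' ') ++pos;
}

static void expect(const std::string& s, std::size_t& pos, char c) {
    skip_ws(s, pos);
    if (pos >= s.size() || s[pos] != c)
        throw std::invalid_argument(std::string("expected '") + c + "'");
    ++pos;
}

static long parse_long(const std::string& s, std::size_t& pos) {
    skip_ws(s, pos);
    const char* begin = s.c_str() + pos;
    char* end = nullptr;
    long v = std::strtol(begin, &end, 10);
    if (end == begin) throw std::invalid_argument("expected integer");
    pos += static_cast<std::size_t>(end - begin);
    return v;
}

// Recursive descent: each level holds four integers and optionally one
// nested inner pattern, so results come out ordered outer -> inner.
static void parse_level(const std::string& s, std::size_t& pos,
                        std::vector<PatternLevel>& out) {
    expect(s, pos, '(');
    PatternLevel lvl;
    lvl.start  = parse_long(s, pos); expect(s, pos, ',');
    lvl.stop   = parse_long(s, pos); expect(s, pos, ',');
    lvl.stride = parse_long(s, pos); expect(s, pos, ',');
    lvl.reps   = parse_long(s, pos);
    out.push_back(lvl);
    skip_ws(s, pos);
    if (pos < s.size() && s[pos] == ',') {  // an inner pattern follows
        ++pos;
        parse_level(s, pos, out);
    }
    expect(s, pos, ')');
}

std::vector<PatternLevel> parse_pattern(const std::string& s) {
    std::vector<PatternLevel> levels;
    std::size_t pos = 0;
    parse_level(s, pos, levels);
    skip_ws(s, pos);
    if (pos != s.size()) throw std::invalid_argument("trailing characters");
    return levels;
}
```

A recursive-descent parser fits the format because every level nests at most one inner pattern, exactly as the three chained snprintf calls build it.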

> And when you *do* insert these calls, 
> it requires some understanding of the HDF5 internal file layout. Or are 
> we going to ditch the HDF5 API and use eRead instead?  How then do we 
> use eRead to manage all of the other HDF5 features like compression, 
> groups, iteration etc.???  What is the string spec for an HDF5 group 
> iterator using eRead strings?

Ah, right, now I see why we are running circles :-)

Imagine a remote web service providing access to HDF5 files.
A simple version would provide read and write calls only, a
more sophisticated version would provide group iterations
etc.  However, the service would come up with some
interface which resembles HDF5 somewhat, but is probably
more tailored toward the specific use case.

eRead is nothing but a medium to communicate with such a
service, and with similar services.  It cannot replace HDF5,
but can help in _application specific_ usage of a service
providing access to an HDF5 file.

As you said before: semantics gets pushed down the pipe.
That is right: it gets pushed over the wire, to the remote
side, and interpreted there.  HOW you specify your semantics
in an eRead string is up to the service definition and your
use case.

app. -> eread -> wire -> service -> HDF5 -> localVFD -> file

> This is why I fail to see the benefits of the eRead interface (it 
> didn't prevent us from mucking with the guts of HDF5 if you want to 
> preserve the HDF5 API, but it also didn't reduce complexity for the 
> user if you are going to replace the HDF5 APIs with these stringy 
> pattern requests).

Nope, it's not supposed to replace HDF5.  It's also not
supposed to replace libjpeg, libtiff, ... you name it.  It
does not solve all the world's problems.

All it does is: it provides the ability to have application
specific semantics pushed to the remote side, where it can
be efficiently interpreted.  The other solutions don't
provide that.

If you need the HDF5 API, you use the HDF5 API, not SAGA.

> >start, stop, stride, reps correspond directly to the HDF5
> >semantics.  So, the semantic info is indeed maintained on
> >application level, and, as you said before, its
> >interpretation is pushed to lower levels.
> 
> It looks like you will end up encoding the entire HDF5 API as eRead 
> pattern strings and push it to the other end of a client-server 
> connection.  Again, I'm not sure if we made life easier for the remote
> HDF5 people.
> 
> >How would that look for readv?
> 
> What I was thinking is that developers of HDF5 may have an interest in 
> defining vector or patterned read operations at the VFD layer of their 
> interface.  This would enable them to propagate the kind of information 
> you are attempting to encode in eRead strings down to the driver where 
> vector-read interfaces can take advantage of them for deeper pipelining 
> of high-latency operations.  (they could, for instance, use some of the 
> methods that Thorsten was referring to, or they could use vread/vwrite 
> type operations).
> 
> So the issue is that
> 	1) if you use eRead to replace the HDF5 API, then we are talking 
> 	about an enormously complex string-encoding interface.
> 	2) if you use eRead in the VFD, then you have to instrument HDF5 to
> propagate information about patterned reads down to the driver layer.
> That is of course the same thing you need if you use vread()/readp()
> (or any of the interfaces that Thorsten described).  So I don't see
> much of a difference in capability there except that vread/readp 
> already has information in a form that you can do I/O with.  With 
> eRead, you still have to go through and parse some strings to gain 
> access to the same information about the pattern of reads/writes?

I did not assume that SAGA would be the right thing to use
to implement an HDF5 VFD.  That is not exactly the
application community SAGA is targeting, I think.

application -> HDF5 -> sagaVFD -> saga -> gridftp (or so) -> file

But I see now, and agree: at the VFD level, eRead does not
buy you much compared to its pitfalls (I'm still unsure about
vread, but that won't help this discussion ;-)

> So it's not merely that eRead is pushing complexity to a different
> layer... I don't see where it is reducing complexity.

Maybe we should move away from HDF5.  Assume an application
specific binary file, and you want subsampling.  Locally you
would do (after seeking to the start of the data):

  for ( int x = 0;  x < X_MAX / 2; x++ )
  {
    for ( int y = 0;  y < Y_MAX / 2; y++ )
    {
      for ( int z = 0;  z < Z_MAX / 2; z++ )
      {
        // one seek + read per sampled element
        data[x][y][z] = my_file_read (x*2, y*2, z*2);
      }
    }
  }

With SAGA as it is now, that is the same: it would call read
and seek once for every sampled element.

SAGA with readv would allow you to do:

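  int n = 0;

  for ( int x = 0;  x < X_MAX / 2; x++ )
  {
    for ( int y = 0;  y < Y_MAX / 2; y++ )
    {
      for ( int z = 0;  z < Z_MAX / 2; z++ )
      {
        // describe one element to read; nothing is transferred yet
        iovecs[n].iov_base = ...
        iovecs[n].iov_len  = 1;
        n++;
      }
    }
  }

  // a single call for all n elements
  file.readv (iovecs, data, n);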
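One caveat on the readv loop: POSIX readv(2) scatters one contiguous region of the file into several memory buffers, so the per-element entries above presumably stand for file regions rather than memory buffers. A minimal sketch under that assumption, with a hypothetical ReadRequest type carrying an explicit byte offset and length, for a row-major X*Y*Z array of fixed-size elements starting at data_offset:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical entry of a remote vector read request: an explicit
// file offset per chunk, which POSIX readv(2) itself cannot express.
struct ReadRequest {
    long        offset;  // byte offset in the remote file
    std::size_t length;  // number of bytes to read
};

// One request per sampled element (every second element along each axis).
std::vector<ReadRequest> build_subsample_requests(long data_offset,
                                                  int X, int Y, int Z,
                                                  std::size_t elem_size) {
    std::vector<ReadRequest> reqs;
    reqs.reserve(static_cast<std::size_t>(X / 2) * (Y / 2) * (Z / 2));
    for (int x = 0; x < X / 2; x++)
        for (int y = 0; y < Y / 2; y++)
            for (int z = 0; z < Z / 2; z++) {
                // row-major linear index of element (2x, 2y, 2z)
                long idx = (static_cast<long>(2 * x) * Y + 2 * y) * Z + 2 * z;
                reqs.push_back({data_offset + idx * static_cast<long>(elem_size),
                                elem_size});
            }
    return reqs;
}
```

Since the stride is 2 along every axis, no two entries are adjacent in the file, so the list cannot be coalesced: it grows linearly with the number of elements requested.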

SAGA with eRead would allow you to do:

  snprintf   (request, 255, "downsample %d %d %d %d", offset, 2, 2, 2);
  file.eRead (request, data, n);

Shorter, but it requires an infrastructure which understands
the request (well, for readv you also need a remote
counterpart, but that one can be agnostic to semantics...).
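A sketch of what the remote counterpart for the downsample request might look like. The request format follows the snprintf call above, but its meaning (offset in bytes, one stride per axis), the row-major layout of doubles, the fact that the service already knows the array shape, and the serve_downsample name are all assumptions; a real service would define these itself.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical service-side handler for "downsample <offset> <sx> <sy> <sz>".
// The array shape (X, Y, Z) is assumed to be known to the service.
std::vector<double> serve_downsample(const std::string& request, std::FILE* f,
                                     int X, int Y, int Z) {
    long offset;
    int sx, sy, sz;
    if (std::sscanf(request.c_str(), "downsample %ld %d %d %d",
                    &offset, &sx, &sy, &sz) != 4 ||
        sx <= 0 || sy <= 0 || sz <= 0)
        throw std::invalid_argument("malformed downsample request");

    std::vector<double> out;
    // The element loop the client would otherwise drive over the wire
    // now runs next to the data: one local seek + read per element.
    for (int x = 0; x < X; x += sx)
        for (int y = 0; y < Y; y += sy)
            for (int z = 0; z < Z; z += sz) {
                long idx = (static_cast<long>(x) * Y + y) * Z + z;
                double v;
                if (std::fseek(f, offset + idx * static_cast<long>(sizeof(double)),
                               SEEK_SET) != 0 ||
                    std::fread(&v, sizeof v, 1, f) != 1)
                    throw std::runtime_error("read failed");
                out.push_back(v);
            }
    return out;
}
```

Only the short request string crosses the wire inbound; the many small seeks stay local to the file.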

readv is more POSIX-like, and more generic.  It always works
if you are at the read level (e.g. the HDF5 VFD layer ;-).

eRead is more powerful: it allows application-specific
optimizations which are not achievable with readv (the size
of the iovecs in the read request is double the size of the
data returned!).
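The "double" figure can be checked with a little arithmetic. The sketch below assumes each readv entry carries a 64-bit offset and a 64-bit length (16 bytes) and each element is an 8-byte double; with 1-byte elements, as in the iov_len = 1 loop, the request would instead be 16 times the payload. The RequestCost type and both functions are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>

struct RequestCost {
    std::size_t request_bytes;  // bytes sent to the server
    std::size_t payload_bytes;  // bytes sent back
};

// Vector read: one (offset, length) entry per element.
// entry_bytes = 16 assumes a 64-bit offset plus a 64-bit length.
RequestCost readv_cost(std::size_t n_elems, std::size_t elem_bytes,
                       std::size_t entry_bytes = 16) {
    return {n_elems * entry_bytes, n_elems * elem_bytes};
}

// eRead: one short string, independent of how many elements it selects.
RequestCost eread_cost(long offset, int sx, int sy, int sz,
                       std::size_t n_elems, std::size_t elem_bytes) {
    char request[256];
    int len = std::snprintf(request, sizeof request, "downsample %ld %d %d %d",
                            offset, sx, sy, sz);
    return {static_cast<std::size_t>(len), n_elems * elem_bytes};
}
```

For a 512^3 grid subsampled by 2 (256^3 doubles, 128 MiB of payload), the readv request is 256 MiB, while the eRead request "downsample 0 2 2 2" is 18 bytes.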

Cheers, Andre.

-- 
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky at cs.vu.nl       |
| De Boelelaan 1083a                | www:  http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+