[saga-rg] proposal for extended file IO

John Shalf jshalf at lbl.gov
Tue Jun 14 13:42:34 CDT 2005


On Jun 14, 2005, at 10:12 AM, Andrei Hutanu wrote:
> I support this model, in the more generic way I proposed.
> It's a more accurate model because you explicitly
> submit a job when you are doing (potentially) complex operations,
> and it allows for better performance.
>
> I think John is right that eRead is tricky and hard to use.
> I also agree with Andre that a simpler model is even more limiting
> than eRead.

I agree with that statement completely.  I think pread/readv is a bit 
*too* simple.  However, we should look at some more elemental 
interfaces for describing patterned read/write operations.  I'm quite 
interested in following up on some of the leads that Thorsten gave us 
for instance.
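
To make "patterned read/write operations" concrete: with a plain
read(offset, length) interface, a 2D sub-region costs one call per
contiguous byte run. The Python sketch below is only an illustration; the
function name is hypothetical, and it assumes a header-less, row-major 2D
field of fixed-size elements. It computes those runs and merges runs that
happen to be adjacent on disk:

```python
from typing import List, Tuple


def subarray_runs(rows: int, cols: int, elem_size: int,
                  r0: int, c0: int, h: int, w: int) -> List[Tuple[int, int]]:
    """Return the (offset, length) byte runs covering the h x w block at
    (r0, c0) of a row-major rows x cols field of elem_size-byte elements.

    Hypothetical helper: assumes no file header and a row-major layout."""
    if not (0 <= r0 and r0 + h <= rows and 0 <= c0 and c0 + w <= cols):
        raise ValueError("sub-region lies outside the field")
    runs: List[Tuple[int, int]] = []
    for r in range(r0, r0 + h):
        offset = (r * cols + c0) * elem_size
        length = w * elem_size
        # Merge with the previous run if both are contiguous in the file
        if runs and runs[-1][0] + runs[-1][1] == offset:
            prev_off, prev_len = runs[-1]
            runs[-1] = (prev_off, prev_len + length)
        else:
            runs.append((offset, length))
    return runs
```

For a 3 x 4 block of 8-byte values in a 1000 x 1000 field this yields
three separate runs, i.e. three remote round trips with plain read();
only full-width blocks collapse into a single run. A server that accepts
the whole pattern in one request could serve it in one round trip.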

> Any other opinions?
>
> Andrei
>
>> Plan E (I already wrote this on the gat-devel-list):
>>
>> You could submit a job to your archive that extracts the data for
>> you and registers the result as a new logical file. On the client
>> side you could wrap this in a library that hides the job submission
>> details, and on the server/archive side you would provide
>> executables for your tasks:
>> extract_hyperslab_from_hdf5 <logical_file_name> <hyperslab> <new_logical_file_name>
>> compress_file <logical_file_name> <compression_method> <new_logical_file_name>
>> ...
>>
>> It shouldn't be that hard to prevent users from executing other 
>> executables on the server.
>>
>> This method is asynchronous, and you can use the job interface to
>> check the status of your conversion job.
>>
>>
>>
>>
>>> I think there is a 4th possibility. If each of the I/O operations
>>> can be requested asynchronously, then you can get the same net effect
>>> as the ERET/ESTO functionality of GridFTP. The advantage of simply
>>> embedding that functionality into the higher-level concept of
>>> asynchronous calls is that if the underlying library does *not*
>>> support the async operations (or some subset of the operations cannot
>>> be performed asynchronously), you can always perform the operations
>>> synchronously and still be able to present
>>>
>>> I do not like Plan A or B for the reasons you state. I do not like
>>> Plan C because it is too tightly tied to a specific data transfer
>>> system implementation. I would propose a Plan D that simply augments
>>> the Task interface of SAGA. For example, you could allow the user to
>>> fire off a number of async read operations:
>>> 	Task handle1 = channel.read();
>>> 	Task handle2 = channel.read();
>>> 	container.addTask(handle1);
>>> 	container.addTask(handle2);
>>> 	container.waitAll();
>>>
>>> The read operations in this example can be submitted as a single
>>> eRead operation, run in separate threads, or simply executed
>>> synchronously when you call waitAll(). (This is in fact how some of
>>> the async MPI I/O was done on the first SGI Origin machines: it was
>>> meant to look asynchronous, but the calls did not actually start
>>> until you did a "Wait" on them.)
>>>
>>> Anyway, using the task interface provides more degrees of freedom
>>> for implementing async I/O than simply supporting the GridFTP way of
>>> doing things, and it meshes gracefully with I/O implementations that
>>> do *not* offer an underlying async execution model.
>>>
>>> The only modification that would be useful to add to the tasking
>>> interface is a notion of readFrom() and writeTo(), which let you
>>> specify the file offset together with the read or write. Otherwise,
>>> the statefulness of read() (its result depends on the current file
>>> position, which concurrently pending tasks would share) would make
>>> the entire task interface useless with respect to file I/O.
>>>
>>> -john
>>>
>>> On Jun 12, 2005, at 11:02 AM, Andre Merzky wrote:
>>>
>>>> Hi again,
>>>>
>>>> consider the following use case for remote I/O. Given a large
>>>> binary 2D field on a remote host, the client wants to access a 2D
>>>> sub-region of that field. Depending on the remote file layout, that
>>>> usually requires more than one read operation, since the standard
>>>> read(offset, length) is agnostic to the 2D layout.
>>>>
>>>> For more complex operations (subsampling, extracting a piece of a
>>>> JPEG file), the number of remote operations grows very quickly.
>>>> Latency then strongly discourages that type of remote I/O.
>>>>
>>>> For that reason, I think that remote file I/O as currently specified
>>>> in SAGA's Strawman will only be usable for a limited and trivial set
>>>> of remote I/O use cases.
>>>>
>>>> There are three (basic) approaches:
>>>>
>>>>  A) get the whole thing, and do ops locally
>>>>     Pro: - one remote op,
>>>>          - simple logic
>>>>          - remote side doesn't need to know about file
>>>>            structure
>>>>          - easily implementable at the application level
>>>>     Con: - getting the header info of a 1GB data file comes
>>>>            with, well, some overhead ;-)
>>>>
>>>>  B) clustering of calls: do many reads, but send them as a
>>>>     single request.
>>>>     Pro: - transparent to application
>>>>          - efficient
>>>>     Con: - need to know about dependencies between reads
>>>>            (e.g. a header read needed to determine the size of
>>>>            the field), or include explicit 'flushes'
>>>>          - need a protocol to support that
>>>>          - the remote side needs to support that
>>>>
>>>>  C) data specific remote ops: send a high level command,
>>>>     and get exactly what you want.
>>>>     Pro: - most efficient
>>>>     Con: - need a protocol to support that
>>>>          - the remote side needs to support that _specific_
>>>>            command
>>>>
>>>> The last approach (C) is the one I have had the best experience
>>>> with. It is also what GridFTP, as a common file access protocol,
>>>> supports via its ERET/ESTO operations.
>>>>
>>>> I want to propose including a C-style extension (in the sense of
>>>> approach C above) in the File API of the Strawman. It maps well to
>>>> GridFTP, but should also map to other implementations of C.
>>>>
>>>> That extension would look like:
>>>>
>>>>      void lsEModes   (out array<string,1> emodes   );
>>>>      void eWrite      (in  string          emode,
>>>>                        in  string          spec,
>>>>                        in  string          buffer,
>>>>                        out long            len_out  );
>>>>      void eRead       (in  string          emode,
>>>>                        in  string          spec,
>>>>                        out string          buffer,
>>>>                        out long            len_out  );
>>>>
>>>>      - hooks for GridFTP-like opaque ERET/ESTO features
>>>>      - spec:  string describing the pattern, as in GridFTP's ESTO/ERET
>>>>      - emode: string identifying the mode, as in GridFTP's ESTO/ERET
>>>>
>>>> EMode:        a specific remote I/O command supported
>>>> lsEModes:     list the EModes available in this implementation
>>>> eRead/eWrite: read/write data according to the emode spec
>>>>
>>>> Example (in perl for brevity):
>>>>
>>>>  my $file   = SAGA::File->new ("http://www.google.com/intl/en/images/logo.gif");
>>>>  my @emodes = $file->lsEModes ();
>>>>
>>>>  if ( grep (/^jpeg_block$/, @emodes) )
>>>>  {
>>>>    my ($buff, $len) = $file->eRead ("jpeg_block", "22x4+7+8");
>>>>  }
>>>>
>>>> I would discourage support for B, since I do not know of any
>>>> protocol supporting that approach efficiently, and it also needs
>>>> roughly the same infrastructure setup as C.
>>>>
>>>> As A is easily implementable at the application level, or within
>>>> any SAGA implementation, there is no need for support at the API
>>>> level. However, A is insufficient for all but some trivial cases.
>>>>
>>>> Comments welcome :-))
>>>>
>>>> Cheers, Andre.
>>>>
>>>>
>>>> --
>>>> +-----------------------------------------------------------------+
>>>> | Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
>>>> | Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
>>>> | Dept. of Computer Science         | mail: merzky at cs.vu.nl       |
>>>> | De Boelelaan 1083a                | www:  http://www.merzky.net |
>>>> | 1081 HV Amsterdam, Netherlands    |                             |
>>>> +-----------------------------------------------------------------+
>>>>
>>
>>
>

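Plan E in the thread relies on the archive refusing anything except a
fixed set of prepared executables. A minimal Python sketch of that
server-side check, assuming a hypothetical whitelist keyed by the example
executable names from the thread (the arities are taken from their
argument lists; nothing here is an existing GAT or SAGA API):

```python
from typing import Dict, List

# Hypothetical whitelist: executable name -> expected number of arguments.
# Any executable not listed here is rejected.
ALLOWED: Dict[str, int] = {
    # <logical_file_name> <hyperslab> <new_logical_file_name>
    "extract_hyperslab_from_hdf5": 3,
    # <logical_file_name> <compression_method> <new_logical_file_name>
    "compress_file": 3,
}


def build_command(name: str, args: List[str]) -> List[str]:
    """Validate a requested conversion job and return its argv list."""
    if name not in ALLOWED:
        raise PermissionError(f"executable not allowed: {name}")
    if len(args) != ALLOWED[name]:
        raise ValueError(
            f"{name} expects {ALLOWED[name]} arguments, got {len(args)}")
    return [name, *args]
```

Returning an argv list (rather than a command string) means it can be
handed to Python's subprocess.run() without shell=True, so shell
metacharacters in user-supplied arguments are not interpreted.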

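John's Plan D (async reads collected in a task container, plus an
offset-taking readFrom()) can be sketched in a few lines of Python. This
is an illustration under stated assumptions, not SAGA API: Task, Channel,
TaskContainer and read_from() are hypothetical names loosely mirroring
the pseudocode in the thread, and the "remote" file is an in-memory byte
string. The same container works whether tasks run in a thread pool or
are deferred until wait_all(), the SGI Origin-style behaviour described
in the thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional


class Task:
    """Hypothetical handle for one I/O operation.

    With a pool the operation starts immediately in a worker thread;
    without one it is deferred and only runs when wait() is called."""

    def __init__(self, fn: Callable[[], bytes],
                 pool: Optional[ThreadPoolExecutor]) -> None:
        self._fn = fn
        self._future: Optional[Future] = (
            pool.submit(fn) if pool is not None else None)
        self._result: Optional[bytes] = None

    def wait(self) -> bytes:
        if self._future is not None:
            return self._future.result()
        if self._result is None:
            self._result = self._fn()  # deferred: the call starts here
        return self._result


class Channel:
    """Hypothetical channel whose read_from() takes an explicit offset,
    so pending tasks never share or disturb a file position."""

    def __init__(self, data: bytes,
                 pool: Optional[ThreadPoolExecutor] = None) -> None:
        self._data = data
        self._pool = pool

    def read_from(self, offset: int, length: int) -> Task:
        return Task(lambda: self._data[offset:offset + length], self._pool)


class TaskContainer:
    def __init__(self) -> None:
        self._tasks: List[Task] = []

    def add_task(self, task: Task) -> None:
        self._tasks.append(task)

    def wait_all(self) -> List[bytes]:
        return [t.wait() for t in self._tasks]
```

Because each task carries its own offset, the results do not depend on
the order in which tasks complete, which is exactly why a stateful read()
would not fit this model.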

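Andre's lsEModes()/eRead() proposal boils down to a registry of named
remote operations plus a dispatcher. In the proposal the emode would be
executed on the remote side (e.g. via GridFTP ERET); the Python sketch
below runs it locally purely for illustration. EFile, ls_emodes(),
e_read() and the "block2d" emode are hypothetical, and reading the spec
string as "WxH+X+Y" (width, height, column offset, row offset) is an
assumption, since the proposal leaves the spec syntax to each emode.

```python
import re
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str], bytes]


class EFile:
    """Hypothetical file object offering named eRead modes."""

    def __init__(self, field: List[bytes]) -> None:
        self._field = field  # 2D field stored as equal-length rows of bytes
        self._emodes: Dict[str, Handler] = {"block2d": self._block2d}

    def ls_emodes(self) -> List[str]:
        return sorted(self._emodes)

    def e_read(self, emode: str, spec: str) -> Tuple[bytes, int]:
        if emode not in self._emodes:
            raise ValueError(f"unsupported emode: {emode}")
        buf = self._emodes[emode](spec)
        return buf, len(buf)

    def _block2d(self, spec: str) -> bytes:
        # Assumed spec syntax "WxH+X+Y": width, height, column, row
        m = re.fullmatch(r"(\d+)x(\d+)\+(\d+)\+(\d+)", spec)
        if m is None:
            raise ValueError(f"bad spec: {spec}")
        w, h, x, y = map(int, m.groups())
        rows = self._field[y:y + h]
        if len(rows) != h or any(x + w > len(r) for r in rows):
            raise ValueError("block lies outside the field")
        return b"".join(r[x:x + w] for r in rows)
```

Client code then mirrors the Perl example: check that the mode appears in
ls_emodes() before calling e_read(), and fall back to plain reads (or
approach A) otherwise.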