[saga-rg] proposal for extended file IO

Mon Jun 13 15:28:36 CDT 2005

Quoting [Jon MacLaren] (Jun 13 2005):
> 
> It's not a question about functionality.  More a comment about  
> language design, and semantics.  You are potentially hiding a large  
> amount of processing behind a file read.  

Ok, I see - that is true.  And it's intentional.  If you do a
job submit, you are hiding a lot of stuff as well - even
more: the information service gets queried for resources, a
broker makes intelligent (ahem) decisions, files get staged,
the job gets queued, runs, gets migrated, dies, files get
staged back.  All you see on API level is an (admittedly
complex) submit, and a simple job status.

> I don't find that  
> intuitive.  Should I put code around all eReads to allow for this?
> 
> With the explicit prepare, I might send a message to a service to do  
> the prepare, then start/queue a batch job once the processing was  
> complete.  If I am sitting on a file read for an hour on a  
> supercomputer, it's expensive.  That's why I think the decoupling is  
> better.
> 
> But I suppose that I could implement the decoupled prepare/read  
> outside of the SAGA API, which is maybe where it belongs.  

Dunno really ;-)
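
For what it's worth, a rough sketch of how such a decoupled
prepare/read could look on application level, outside of the API.
This is plain C++, not SAGA calls: prepare, start_prepare and
read_prepared are made-up names, and the 'preparation' is a trivial
subsample standing in for whatever the real service would do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expensive preparation step:
// keeps every 'stride'-th byte of the raw data.
std::vector<char> prepare (const std::vector<char> & raw, std::size_t stride)
{
  std::vector<char> out;
  if ( stride == 0 ) return out;

  for ( std::size_t i = 0; i < raw.size (); i += stride )
    out.push_back (raw[i]);

  return out;
}

// Kick off the preparation asynchronously.  The caller only holds a
// handle, and can wait for it wherever waiting is cheap.
std::future<std::vector<char> > start_prepare (std::vector<char> raw,
                                               std::size_t       stride)
{
  return std::async (std::launch::async, prepare, std::move (raw), stride);
}

// Simple read on the already prepared data: copies up to 'len' bytes,
// starting at 'offset', into 'buffer'.  Returns the number of bytes read.
std::size_t read_prepared (const std::vector<char> & data,
                           std::size_t               offset,
                           std::size_t               len,
                           char                    * buffer)
{
  assert (buffer != nullptr);
  if ( offset >= data.size () ) return 0;

  std::size_t n = std::min (len, data.size () - offset);
  std::copy (data.begin () + offset, data.begin () + offset + n, buffer);

  return n;
}
```

The idea would be to call start_prepare early, and to call get() on
the returned future only right before the first read_prepared, so
that the read itself never sits on the preparation.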

Well, the design constraints of SAGA are:

  - simple, simple, simple.  Make simple things easy, make
    difficult things possible, leave out the rest.

  - only put into saga what comes up in GOOD use cases

> And the API you have is certainly fine for smaller files.

Some of our use cases include large data access, for example
the remote viz ones.  So, handling only small files is not good
enough :-(

Cheers, Andre.

> Perhaps that is what you are suggesting at the end of your reply....
> 
> ><snip>
> >If the first preparation takes an hour...?
> >
> >Then again, middleware like data cutter can benefit from
> >preprocessed data (do indexing before, or create octree
> >structure before) - that could be done by creating a task
> >beforehand, which prepares the data, and then do the read
> >afterwards.  Would that do what you need?
> >
> >  // warning: Pseudo Pseudo Code...
> >  Job  job  ("host_A",
> >             "/bin/subsample /data/hige_file_A /tmp/small_file_B");
> >
> >  // wait for job completion
> >  // read prepared data
> >  File file ("gridftp://host_A//tmp/small_file_B");
> >  file.read (100, buffer, &out);
> 
> I guess we are agreeing...
> 
> Jon.

-- 
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky at cs.vu.nl       |
| De Boelelaan 1083a                | www:  http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+