[saga-rg] Re: SAGA question (fwd)

Andre Merzky andre at merzky.net
Wed Jul 6 23:51:07 CDT 2005


Dear Mark, 

Thanks for reading the API spec! :-)

My two cents on your questions below.


Quoting [Shantenu Jha] (Jul 07 2005):
> 
> ---------- Forwarded message ----------
> Date: Fri, 10 Jun 2005 14:12:40 +0100 (BST)
> From: Mark McKeown <zzalsmm3 at nessie.mcc.ac.uk>
> Subject: SAGA question
> 
> I have a quick SAGA question about the strawman
> API  - page 41:
> 
> .....
>     JobService myjs  = SomeJobServiceFactory (...);
>     Job        myjob = new Job ();
> 
>     myjs.submitJob (jobdef, myjob);
> 
>     while ( something )
>     {
>       JobState myjobstate;
>       myjob.getJobState (myjobstate);
> 
>       if ( myjobstate == Running )
> etc...
> ......
> 
> 
> The two issues I am concerned with are latency and partial
> failure (see "A Note on Distributed Computing",
> http://research.sun.com/techrep/1994/abstract-29.html).
> 
> ->Latency
> 
> Since the job is running remotely is it sensible for a client
> to make a decision based on if it is running - by the time
> the client has got the status message the state of the job may
> have changed.

You have the same race condition locally. Because the time
windows are shorter, such races are less likely to occur,
but they can: you do a ps, and then a kill, and voila, the
job is already gone.

Since the kill (or whatever you do) will deliver a good
error description ("Job does not exist", "Job is already
stopped", etc.), your application should be able to handle
that situation gracefully (just as in the local case, where
errno or the shell return value gives similar information).
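
A minimal sketch of the local ps-then-kill case, assuming a
POSIX system and Python. The helper names is_running and
stop_job are hypothetical, chosen for illustration only. The
point it shows: the earlier check is only a hint, and the
error from the action itself is what the application reacts
to.

```python
import os
import signal
import subprocess
import sys


def is_running(pid: int) -> bool:
    """The 'ps' step: signal 0 only checks that the process exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_job(pid: int) -> str:
    """The 'kill' step: act, then interpret the error that comes back."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:  # errno ESRCH
        return "job does not exist"
    except PermissionError:     # errno EPERM
        return "not allowed to signal job"
    return "signal delivered"


if __name__ == "__main__":
    # A short-lived child stands in for the job.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    seen = is_running(child.pid)  # the check says the job is there
    child.wait()                  # ...but it finishes before we act
    print(seen, stop_job(child.pid))
```

The result of is_running is stale as soon as it returns, so
stop_job does not depend on it and reports what happened.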


> ->Partial Failure
> What happens if there is a network failure when I do
> 
>   myjob.getJobState (myjobstate);
> 
> how will this error be handled by the API?

You will get a different, explicit error message ("could not
contact resource/resource manager" or similar) that describes
the problem, not just a generic failure.
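
As a sketch of what that distinction could look like to a
client: the exception classes, FakeJob, and describe below
are all hypothetical and are not taken from the SAGA
strawman, which does not define its error codes yet. The
idea is only that "could not contact the resource manager"
and "job does not exist" arrive as different errors, so the
client can treat the first as "state unknown" and not as a
job failure.

```python
from enum import Enum


class JobState(Enum):
    RUNNING = "Running"
    DONE = "Done"


class JobError(Exception):
    """Base class for the errors in this sketch (hypothetical)."""


class JobDoesNotExist(JobError):
    """The resource manager answered: the job is gone."""


class ResourceUnreachable(JobError):
    """No answer at all: the job state is unknown, not failed."""


class FakeJob:
    """Stand-in for a remote job handle; network_up simulates the link."""

    def __init__(self, state):
        self._state = state
        self.network_up = True

    def get_job_state(self):
        if not self.network_up:
            raise ResourceUnreachable("could not contact resource manager")
        if self._state is None:
            raise JobDoesNotExist("job does not exist")
        return self._state


def describe(job):
    """Map each outcome to a distinct client-side decision."""
    try:
        state = job.get_job_state()
    except ResourceUnreachable as err:
        return f"unknown ({err}); retry later"
    except JobDoesNotExist as err:
        return f"gone ({err}); stop polling"
    return f"state is {state.value}"
```

A polling loop like the one on page 41 could then retry on
the first error and stop on the second.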

However, we need to define the possible error codes more
specifically in the specification; you certainly have a
valid point.  We are working on that (Tom volunteered today
to take a closer look at the error system).

Thanks for your feedback, 

  Cheers, Andre.


> cheers
> MArk


-- 
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky at cs.vu.nl       |
| De Boelelaan 1083a                | www:  http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+




