[SAGA-RG] SAGA Job destruction

Andre Merzky andre at merzky.net
Thu Mar 5 02:26:28 CST 2009


Dear Malcolm, 

Quoting [Malcolm Illingworth] (Mar 04 2009):
> 
> Hi Andre,
> 
> I'm reviewing our SAGA implementation now that we are starting to approach
> the end of the project, and realise that there are a couple of things about the
> job submission that I haven't understood despite all this time. 
> 
> I can't seem to find a specific method defined on Job or JobService to
> destroy a job on a remote execution site, and destroy its resources. In our
> implementation we got round this by adding a non-standard cleanup method,
> which will try to gracefully stage out any output files from a job then
> release the resources.

saga::job::job inherits from saga::task, which has a
cancel() method - that is what you are looking for, I think.


> Similarly I'd like to do a more forceful version of the above, which does
> not attempt to stage out any output files and simply destroys the job (eg in
> the case of a failed or cancelled job).

There is only that one way (cancel()) to forcefully destroy
your job.  Its semantics are:

  - for resource deallocation semantics, see 
    Section 2. 
  - if cancel() fails to cancel the task 
    immediately, and tries to continue to cancel 
    the task in the background, the task state 
    remains ’Running’ until the cancel operation 
    succeeded. The state then changes to 
    ’Canceled’. 
  - if the task is in a final state, the call has 
    no effect, and, in particular, does NOT change 
    the state from ’Done’ to ’Canceled’, or from 
    ’Failed’ to ’Canceled’. This is to 
    avoid race conditions. 
  - if the task is in ’New’ state, an 
    ’IncorrectState’ exception is thrown. 
  - a ’NoSuccess’ exception indicates 
    that the backend was not able to initiate the 
    cancelation for the task. 
  - for timeout semantics, see Section 2. 
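
The state rules above can be sketched as a toy model.  This is
plain Python, not the SAGA API: ToyTask, run(), finish() and the
backend_can_cancel flag are made-up names for illustration, and
the exception classes are only stand-ins for the SAGA ones.  The
background-cancel case (state stays 'Running' until the cancel
succeeds) and the timeout are not modelled.

```python
from enum import Enum


class State(Enum):
    NEW = "New"
    RUNNING = "Running"
    DONE = "Done"
    FAILED = "Failed"
    CANCELED = "Canceled"


class IncorrectState(Exception):
    """Stand-in for the SAGA 'IncorrectState' exception."""


class NoSuccess(Exception):
    """Stand-in for the SAGA 'NoSuccess' exception."""


FINAL_STATES = {State.DONE, State.FAILED, State.CANCELED}


class ToyTask:
    """Hypothetical task that only tracks its state."""

    def __init__(self, backend_can_cancel=True):
        self.state = State.NEW
        # Assumption: a flag that simulates a backend unable to cancel.
        self.backend_can_cancel = backend_can_cancel

    def run(self):
        if self.state is not State.NEW:
            raise IncorrectState("task was already started")
        self.state = State.RUNNING

    def finish(self, ok=True):
        # Helper for the sketch: the task ends on its own.
        if self.state is State.RUNNING:
            self.state = State.DONE if ok else State.FAILED

    def cancel(self):
        if self.state is State.NEW:
            raise IncorrectState("cannot cancel a task in 'New' state")
        if self.state in FINAL_STATES:
            return  # no effect: 'Done' and 'Failed' stay as they are
        if not self.backend_can_cancel:
            raise NoSuccess("backend could not initiate the cancelation")
        self.state = State.CANCELED
```

For example, calling cancel() on a ToyTask that has already
reached 'Done' leaves it in 'Done', whereas calling it on a
running one moves it to 'Canceled'.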

So, whether your implementation simply kills the job or
attempts to stage out whatever the job has produced by then
is up to you.

Well, actually, it is not the only way, really:  
job.signal (KILL) or job.signal (TERM) would do the trick,
too, but most likely w/o any postprocessing, as to the job
manager it will look like the job simply failed.  No idea if
your applications could handle job.signal (USR) sensibly.


> Can you see a SAGA-compliant way to define these operations?

So, I'd say you could implement job.cancel () to include
stage-out, and job.signal (KILL) to not include stage-out.
Would that make sense?
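
A minimal sketch of that split, again as a toy Python model
rather than the SAGA API: ToyJob, its attributes and the string
signal names are all hypothetical, and the 'Failed' end state
after KILL/TERM just mirrors how the job would look to the job
manager.

```python
class ToyJob:
    """Hypothetical job that tracks outputs and resource release."""

    HARD_SIGNALS = ("KILL", "TERM")  # assumed names, for illustration

    def __init__(self, outputs):
        self.outputs = list(outputs)   # files produced on the remote site
        self.staged_out = []           # files copied back so far
        self.resources_released = False
        self.state = "Running"

    def _release_resources(self):
        self.resources_released = True

    def cancel(self):
        """Graceful variant: stage out, then release resources."""
        if self.state != "Running":
            return  # final state: no effect
        self.staged_out = list(self.outputs)
        self._release_resources()
        self.state = "Canceled"

    def signal(self, name):
        """Forceful variant: KILL/TERM skip the stage-out."""
        if self.state != "Running":
            return
        if name in self.HARD_SIGNALS:
            self._release_resources()
            self.state = "Failed"  # looks like a plain failure
        # Other signals (e.g. USR) are left to the application.
```

With this, cancel() on a running ToyJob ends in 'Canceled' with
its outputs staged out, while signal("KILL") ends in 'Failed'
with nothing staged out; both release the resources.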


> Unfortunately our travel budget does not extend to Catania :(

:-(  

> Thanks and best regards,
> Malcolm. 

Cheers, Andre.


