[SAGA-RG] SAGA Job destruction

Andre Merzky andre at merzky.net
Thu Mar 5 09:26:52 CST 2009


Ah, I think I understand now - thanks for the clarification!
So, your job finishes normally, but its data (and metadata)
linger on, and you wonder how to control that.

Indeed, SAGA does not allow you to do that.  The Cleanup
attribute in the job description won't help you much, I
assume: it only lets you flag a job for immediate cleanup
(or not), but it gives you no way to trigger that cleanup
explicitly at a later time.
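
For illustration, this is roughly how that attribute is used
at submission time.  A minimal sketch against the C++
bindings; I am assuming the attribute is exposed as
saga::job::attributes::description_cleanup with the values
"True", "False" and "Default" as in the spec, and the
fork://localhost URL is just an example endpoint:

  #include <saga/saga.hpp>

  int main ()
  {
    saga::job::description jd;
    jd.set_attribute (saga::job::attributes::description_executable, "/bin/date");
    jd.set_attribute (saga::job::attributes::description_cleanup,    "True");

    saga::job::service js (saga::url ("fork://localhost"));
    saga::job::job     j  = js.create_job (jd);

    j.run  ();
    j.wait ();  // once the job reaches a final state, the backend
                // may remove its sandbox - but there is no call to
                // trigger that removal at a later point in time.

    return 0;
  }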

I don't have a good answer for you at the moment.  Yes, one
option would be to add such a method to the job service - but
at this point that is a feature request rather than an
erratum, so from the specification point of view I'd rather
defer it to the next version of SAGA.  From the
implementation point of view, I'd say go ahead and add it,
but be prepared to move it elsewhere if the spec process
comes up with something different.

Other options that come to mind spontaneously would be:

  - add a lifetime value to the cleanup attribute
  - put that feature into a future resource management package
  - add cleanup() as a method to saga::job
  - as you suggested: add a cleanup() to the job::service

There are probably others.
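
To make the last option a bit more concrete, here is a rough
sketch of what such a call could look like.  The name,
signature and placement are purely illustrative - none of
this is in the current spec:

  // hypothetical extension, for discussion only
  namespace saga { namespace job {

    class service
    {
     public:
      // ... existing methods: create_job(), get_job(), list(), ...

      // Remove all traces of a job which has reached a final
      // state (Done, Failed, Canceled): its entry in the
      // backend's job list, its working directory, and any
      // remaining output files.  A job which is still running
      // would cause an IncorrectState exception.
      void cleanup (std::string job_id);
    };

  }}

  // Usage: the application decides when the data may go away.
  //
  //   saga::job::job j = js.get_job (id);
  //   j.wait ();               // job reaches a final state
  //   // ... parse or stage the output files from the workdir ...
  //   js.cleanup (id);         // only now release the sandbox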

While we are at it, I'll open a new tracker for SAGA feature
requests and other items for 2.0.

Thanks, Andre.



Quoting [Malcolm Illingworth] (Mar 05 2009):
> 
> Hi Andre,
> 
> Thanks for the reply. I think my use of the word "destroy" had some unfortunate ambiguities! I was explicitly thinking of how to free resources for a completed/failed job rather than stopping an executing job. I'll try and explain my problem a bit more clearly.
> 
> When we submit a job, the job can end up in a number of final states, i.e. Canceled, Failed, or Done. Currently we don't automatically delete the output files from a job, and allow the user to parse the working directory of the job for the output files during and after job execution. This means we effectively don't support the "Cleanup" job attribute, as we don't allow jobs to automatically clean up. (We do of course also support staging out of files once a job has completed, as per the SAGA spec.) This means that the job remains in the list of jobs in the resource manager, and the output files remain until explicitly deleted by the user.
> 
> The definition of cancel doesn't seem appropriate in this case. Instead, we want the user to be able to delete the job so that it no longer appears in a list of jobs, and the job's working directory and output files are destroyed. It's this operation that I can't seem to find a good match for in the SAGA API - for example, something like JobService.removeJob(String jobID).
> 
> We could try supporting the Cleanup attribute so that the working directory is automatically destroyed on completion, but we would still need a method to explicitly remove the job reference from the remote job manager.
> 
> Cheers,
> Malcolm. 
> 
>  
> 
> -----Original Message-----
> From: Andre Merzky [mailto:andremerzky at gmail.com] On Behalf Of Andre Merzky
> Sent: 05 March 2009 08:26
> To: Malcolm Illingworth
> Cc: 'Andre Merzky'; SAGA RG
> Subject: Re: SAGA Job destruction
> 
> Dear Malcolm, 
> 
> Quoting [Malcolm Illingworth] (Mar 04 2009):
> > 
> > Hi Andre,
> > 
> > I'm reviewing our SAGA implementation now that we are starting to 
> > approach the end of the project, and realise that there are a couple of 
> > things about the job submission that I haven't understood despite all this time.
> > 
> > I can't seem to find a specific method defined on Job or JobService to 
> > destroy a job on a remote execution site, and destroy its resources. 
> > In our implementation we got round this by adding a non-standard 
> > cleanup method, which will try to gracefully stage out any output 
> > files from a job then release the resources.
> 
> saga::job::job inherits from saga::task, which has a
> cancel() method - that is what you are looking for, I think.
> 
> 
> > Similarly I'd like to do a more forceful version of the above, which 
> > does not attempt to stage out any output files and simply destroys the 
> > job (eg in the case of a failed or cancelled job).
> 
> There is only that one way (cancel()) to destroy your job forcefully.  Its semantics are:
> 
>   - for resource deallocation semantics, see 
>     Section 2. 
>   - if cancel() fails to cancel the task 
>     immediately, and tries to continue to cancel 
>     the task in the background, the task state 
>     remains "Running" until the cancel operation 
>     succeeded. The state then changes to 
>     "Canceled". 
>   - if the task is in a final state, the call has 
>     no effect, and, in particular, does NOT change 
>     the state from "Done" to "Canceled", or from 
>     "Failed" to "Canceled". This is to 
>     avoid race conditions. 
>   - if the task is in "New" state, an 
>     "IncorrectState" exception is thrown. 
>   - a "NoSuccess" exception indicates 
>     that the backend was not able to initiate the 
>     cancelation for the task. 
>   - for timeout semantics, see Section 2. 
> 
> So, whether your implementation simply kills the job, or attempts to stage out whatever the job has produced by then, is up to you.  
> 
> Well, actually, it is not the only way:  
> job.signal (KILL) or job.signal (TERM) would do the trick, too, but most likely without any post-processing, as to the job manager it will look like the job simply failed.  No idea if your applications could handle job.signal (USR) sensibly.
> 
> 
> > Can you see a SAGA-compliant way to define these operations?
> 
> So, I'd say you could implement job.cancel () to include stage-out, and job.signal (KILL) to not include stage-out.  Would that make sense?
> 
> 
> > Unfortunately our travel budget does not extend to Catania :(
> 
> :-(  
> 
> > Thanks and best regards,
> > Malcolm. 
> 
> Cheers, Andre.
> 



