[saga-rg] Job States
Andre Merzky
andre at merzky.net
Thu Aug 4 02:03:48 CDT 2005
Quoting [Christopher Smith] (Aug 04 2005):
>
> On 29/7/05 10:40, "Andre Merzky" <andre at merzky.net> wrote:
>
> > SAGA Jobs currently have the following states:
> >
> [chop]
> >
> > I got the comment from colleagues that PreStaging and
> > PostStaging are missing. Indeed these stages seem not to
> > fit into any of the above ones. Running would be a
> > candidate, but since the remote resource is not necessarily
> > used anymore, that might be confusing. Should these stages
> > be added? However, they also do not appear in the DRMAA
> > specification AFAIK.
> >
> > Any thoughts?
> >
> These states can be added.
>
> Also, there is a more complicated state model for "activities" emerging from
> the OGSA-BES work, that also includes sub-states for file staging, etc, etc.
> We can perhaps incorporate some of that as well, although I'm happy with
> general Pre-execution and Post-execution states to cover all of this.
Pre-Execution/Post-Execution sounds good to me. I guess we
don't want too complex a state model, and these two
can incorporate whatever SAGA or the backend deems necessary
to do before/after the job is actually running...
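To illustrate the two coarse states, here is a minimal sketch in Python.
Only Running and DoneFail are state names taken from this thread; the enum
itself, the PreExecution/PostExecution spellings, and the is_active helper
are assumptions made for illustration, not an agreed SAGA interface. The
states chopped from the quote above are deliberately left out.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical enum; only states named in this thread are listed.
    PRE_EXECUTION = "PreExecution"    # e.g. staging files in before the job starts
    RUNNING = "Running"
    POST_EXECUTION = "PostExecution"  # e.g. staging files out after the job ends
    DONE_FAIL = "DoneFail"


def is_active(state: JobState) -> bool:
    """True while SAGA or the backend is still working on the job."""
    return state in (
        JobState.PRE_EXECUTION,
        JobState.RUNNING,
        JobState.POST_EXECUTION,
    )
```

With this shape, a backend that has finer-grained staging sub-states could
simply map all of them onto PRE_EXECUTION or POST_EXECUTION.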
> Perhaps we can discuss on the call tomorrow.
Great.
> > Another question: Assume I check a job status and find it
> > 'DoneFail' - how can I determine the reason for the failure? It
> > would be useful to know the state the job was in before it
> > failed (e.g. if it was prestaging, I know then that staging
> > failed, and the job never really started). Also it would be
> > nice to be able to query for any error message.
> >
>
> There is the getJobExitStatus method on the Job interface so that you can
> get things like the exit code and the signal number that caused termination.
>
> As for querying the state which preceded the failure, it sounds like a good
> idea (LSF does this by keeping a history log for jobs that can be queried
> via a "bhist" command). Perhaps adding an optional string to the
> JobExitStatus class would be sufficient for this kind of extended
> information? The problem is that this stuff is not particularly standardized
> across resource managers I think.
I think a (potentially) extensive error message on the exit
status object is the simplest solution - if a job failed,
look there to find some information about the reason, if
available. Nice.
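A minimal sketch of that idea, in Python. JobExitStatus is the class name
mentioned above; the field names and the describe_failure helper are
assumptions for illustration only, since the thread does not define them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobExitStatus:
    # Field names are hypothetical; the thread only mentions an exit code,
    # a terminating signal number, and an optional error string.
    exit_code: Optional[int] = None
    signal: Optional[int] = None
    error_message: Optional[str] = None  # free-form, backend-specific text


def describe_failure(status: JobExitStatus) -> str:
    """Build a readable failure summary from whatever fields are set."""
    parts = []
    if status.exit_code is not None:
        parts.append(f"exit code {status.exit_code}")
    if status.signal is not None:
        parts.append(f"signal {status.signal}")
    if status.error_message:
        parts.append(status.error_message)
    return "; ".join(parts) if parts else "no failure details available"
```

Keeping the message an optional free-form string sidesteps the fact that
this information is not standardized across resource managers: a backend
that has nothing to report just leaves it empty.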
Cheers, Andre.
> > I think that the error query is distinct from the exception
> > mechanism we will have: a job entering DoneFail should NOT
> > throw an exception in my opinion - but that leads to the above
> > question: how can I query the error leading to the DoneFail
> > state?
>
> I agree.
>
> -- Chris
--
+-----------------------------------------------------------------+
| Andre Merzky | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science | mail: merzky at cs.vu.nl |
| De Boelelaan 1083a | www: http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands | |
+-----------------------------------------------------------------+