[DRMAA-WG] Conference call -Feb 3rd - 17:00 UTC
Daniel Templeton
Dan.Templeton at Sun.COM
Mon Feb 2 16:13:32 CST 2009
Since I won't make the meeting, here's my feedback.
Peter Tröger wrote:
> 2. Voting about "UNDETERMINED" job state
> - keep it as own job state ?
>
Yes. Undetermined stays as a state, but it is redefined to mean
permanently undetermined. Trying again later will yield the same result.
> - Means permanent or temporary problem ?
>
To represent the temporarily undetermined state, we expand the
TryAgainLaterException to apply to drmaa_job_ps() as well.
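To make the distinction concrete, here is a minimal sketch of how a client might treat the two cases if this proposal were adopted. All names are hypothetical: `JobState`, `TryAgainLater`, and `poll_state` are illustrations only, and `query` stands in for whatever call wraps drmaa_job_ps() in a given binding.

```python
import time
from enum import Enum


class JobState(Enum):
    RUNNING = "running"
    DONE = "done"
    UNDETERMINED = "undetermined"  # proposed meaning: permanently unknown


class TryAgainLater(Exception):
    """Hypothetical stand-in for TryAgainLaterException."""


def poll_state(query, job_id, retries=3, delay=0.0):
    """Call query(job_id), retrying only while the problem is temporary."""
    for _ in range(retries):
        try:
            # An UNDETERMINED result is returned as-is: asking again
            # would yield the same answer, so there is no retry.
            return query(job_id)
        except TryAgainLater:
            time.sleep(delay)  # temporary problem: worth asking again
    raise TryAgainLater(f"state of job {job_id} still unavailable")
```

The point of the split is that the client never has to guess: an exception means "ask again", a returned UNDETERMINED means "stop asking".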
> 3. Voting about separate "TERMINATED" vs. "FAILED" state
> - Semantics
>
A job that exits via the terminated state has the potential to succeed
if resubmitted. It entered the terminated state due to an action taken
by the job owner, an administrator, or the DRM system itself, possibly
on behalf of the terminated job. A job that exits via the failed state
is unlikely to succeed if resubmitted. It entered the failed state due
to an error in the job or a misconfiguration of the machine on which it ran.
There is a problem with my clean could-succeed/won't-succeed division.
What if a job failed because the machine it ran on was wonky? That is
clearly a failure, not a termination, but if the job were resubmitted
and landed on any other machine, it would succeed. In that case, do we
actually care if there was a difference between failure and termination?
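As a rough illustration of the division and of where it breaks down, a client-side resubmission check might look like the sketch below. The names are hypothetical, and `bad_host_suspected` in particular is an assumption: nothing in DRMAA tells the client that the execution host was at fault, so the client would have to learn that some other way.

```python
from enum import Enum


class ExitState(Enum):
    DONE = "done"
    TERMINATED = "terminated"  # stopped by owner, admin, or the DRM system
    FAILED = "failed"          # job error or misconfigured execution host


def worth_resubmitting(state, bad_host_suspected=False):
    """Return True if a resubmission has a reasonable chance of succeeding."""
    if state is ExitState.TERMINATED:
        return True
    if state is ExitState.FAILED:
        # The bad-machine case: a failure that another host might not repeat.
        return bad_host_suspected
    return False
```

If the client cannot know whether the host was the problem, the FAILED branch collapses to a guess, which is the concern raised above.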
> - Resulting new job state transitions
>
There's one more thing we may want to consider. In SGE, a job can exit
one of four ways. It can succeed. It can fail, which includes
termination. It can request to be rescheduled. And it can be set into
error state. The first two are handled fine by drmaa_wait(). The third
can be recognized by drmaa_job_ps(), but it's not ideal. The fourth is
completely unknowable from DRMAA. To the DRMAA client, it will look
like the job was requeued to be rescheduled, but it is never actually
scheduled to run again. We might want to consider supporting some
additional states, such as rescheduled or error, or maybe those states
are something that the state/substate model would enable.
I vote for making the substate as generic as possible. I think forcing
it to be an integer is unnecessarily limiting. Taking some Java APIs as
examples, sometimes the substates are really just text messages that
explain what's going on. I think that's valid and something we should
allow.
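A minimal sketch of what a generic substate could look like, assuming a state/substate pair is what gets reported. The type and field names are hypothetical and not taken from any DRMAA draft; the only point is that the substate may be a number, a text message, or absent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    FAILED = "failed"


@dataclass(frozen=True)
class JobStatus:
    state: State
    # Implementation-defined detail: a numeric code, a message, or nothing.
    substate: Optional[Union[int, str]] = None

    def describe(self):
        """Render the state, with the substate in parentheses if present."""
        if self.substate is None:
            return self.state.value
        return f"{self.state.value} ({self.substate})"
```

With something like this, an SGE job waiting to be rescheduled could be reported as `JobStatus(State.QUEUED, "rescheduled")`, while another DRM system could keep using integer codes.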
> 4. Further DRMAA2 discussion
>
See the attached email from a few weeks ago.
Daniel
-------------- next part --------------
An embedded message was scrubbed...
From: Daniel Templeton <Dan.Templeton at Sun.COM>
Subject: DRMAA v2
Date: Tue, 20 Jan 2009 08:46:24 -0800
Size: 1905
Url: http://www.ogf.org/pipermail/drmaa-wg/attachments/20090202/4cb60fc6/attachment.mht