[DRMAA-WG] DRMAA2: TERMINATED vs. FAILED state
Peter Tröger
peter at troeger.eu
Mon Mar 2 03:37:25 CST 2009
Dear all,
this thread is intended to finalize the discussion about job states
after the end of execution in DRMAA2.
In DRMAA1, there is only the FAILED state, expressing that the job was
running but did not finish successfully for some reason. Piotr proposed
a separation between FAILED and TERMINATED jobs:
http://www.ogf.org/pipermail/drmaa-wg/2009-January/000985.html
In the meantime, several proposals have been made regarding this idea:
Option 1)
TERMINATED state = resubmission might help,
FAILED state = resubmission unlikely to help (machine problem,
misconfiguration)
Option 2)
TERMINATED state = triggered by an external entity,
FAILED state = job terminated by itself
Option 3)
FAILED state = job command line could not be executed
TERMINATED state = something else happened
Option 4)
Stick with FAILED only, and express special circumstances via the new
job sub-state information
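
To make the four options easier to compare, here is a minimal Python
sketch. It is only an illustration under stated assumptions: the Outcome
fields, the three example outcomes, and all function names are
hypothetical and not part of any DRMAA binding. In particular, whether
resubmitting a user-cancelled job "might help" is an assumption made for
the example.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    FAILED = "FAILED"
    TERMINATED = "TERMINATED"


@dataclass(frozen=True)
class Outcome:
    """Hypothetical description of an unsuccessful job end (not DRMAA API)."""
    started: bool            # the job command line could be executed
    killed_externally: bool  # ended by a user, admin, or the DRM system
    retry_may_help: bool     # resubmission has a chance of succeeding


def option1(o: Outcome) -> State:
    # TERMINATED = resubmission might help, FAILED = unlikely to help
    return State.TERMINATED if o.retry_may_help else State.FAILED


def option2(o: Outcome) -> State:
    # TERMINATED = triggered by an external entity, FAILED = job ended itself
    return State.TERMINATED if o.killed_externally else State.FAILED


def option3(o: Outcome) -> State:
    # FAILED = command line could not be executed, TERMINATED = anything else
    return State.FAILED if not o.started else State.TERMINATED


def option4(o: Outcome) -> State:
    # FAILED only; the circumstances would be reported via the sub-state
    return State.FAILED


def classify(o: Outcome) -> dict:
    """Return the state each option would report for the same outcome."""
    return {f.__name__: f(o) for f in (option1, option2, option3, option4)}


# Assumed example outcomes, chosen only to show where the options differ.
USER_CANCEL = Outcome(started=True, killed_externally=True, retry_may_help=True)
SELF_CRASH = Outcome(started=True, killed_externally=False, retry_may_help=False)
BAD_COMMAND = Outcome(started=False, killed_externally=False, retry_may_help=False)

if __name__ == "__main__":
    for name, outcome in [("user cancel", USER_CANCEL),
                          ("self crash", SELF_CRASH),
                          ("bad command", BAD_COMMAND)]:
        states = {k: v.value for k, v in classify(outcome).items()}
        print(name, states)
```

Under these assumptions, the options already disagree on a job that
crashes on its own: Options 1 and 2 would report FAILED, while Option 3
would report TERMINATED.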
Issue #5875 (originally from the PBS experience report) criticizes that
FAILED currently expresses both user-requested termination and job
failure. How does this issue relate to the problem?
Another open question is how these states relate to the wif_* functions.
Please contribute your opinion.
Thanks,
Peter.