[DRMAA-WG] DRMAA2: TERMINATED vs. FAILED state

Peter Tröger peter at troeger.eu
Mon Mar 2 03:37:25 CST 2009


Dear all,

this thread is intended to finalize the discussion about job states 
after the end of job execution in DRMAA2.
In DRMAA1, there is only the FAILED state, expressing that the job was 
running but did not finish successfully for some reason. Piotr proposed 
a separation between FAILED and TERMINATED jobs:

http://www.ogf.org/pipermail/drmaa-wg/2009-January/000985.html

Since then, we have had several different proposals regarding this idea:

Option 1)
TERMINATED state = resubmission might help,
FAILED state = resubmission unlikely to help (machine problem, 
misconfiguration)

Option 2)
TERMINATED state = triggered by an external entity,
FAILED state = job terminated by itself

Option 3)
FAILED state = job command line could not be executed
TERMINATED state = something else happened

Option 4)
Stick with FAILED only, and express special circumstances via the new 
job sub-state information
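To make the differences between the options easier to compare, here is a 
minimal sketch of how a client could map the same job outcome to an end 
state under Options 2, 3 and 4. It is an illustration under assumptions, 
not a DRMAA API: JobEnd, its fields, the option* functions and the 
sub-state strings are all hypothetical. Option 1 is left out because it 
depends on a DRM-specific judgement of whether resubmission might help.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class JobState(Enum):
    DONE = "DONE"
    FAILED = "FAILED"
    TERMINATED = "TERMINATED"


@dataclass
class JobEnd:
    """Hypothetical summary of how a job ended (not a DRMAA type)."""
    command_started: bool     # could the job command line be executed at all?
    killed_by_request: bool   # did an external entity (user, admin, DRM) end it?
    exit_code: Optional[int]  # None if the job never produced an exit code


def option2(end: JobEnd) -> JobState:
    # TERMINATED = triggered by an external entity, FAILED = job ended by itself
    if end.killed_by_request:
        return JobState.TERMINATED
    if end.command_started and end.exit_code == 0:
        return JobState.DONE
    return JobState.FAILED


def option3(end: JobEnd) -> JobState:
    # FAILED = command line could not be executed, TERMINATED = anything else abnormal
    if not end.command_started:
        return JobState.FAILED
    if end.exit_code == 0 and not end.killed_by_request:
        return JobState.DONE
    return JobState.TERMINATED


def option4(end: JobEnd) -> Tuple[JobState, Optional[str]]:
    # FAILED only; the reason is carried in a hypothetical sub-state string
    if end.command_started and end.exit_code == 0 and not end.killed_by_request:
        return JobState.DONE, None
    if end.killed_by_request:
        return JobState.FAILED, "terminated_by_request"
    if not end.command_started:
        return JobState.FAILED, "command_not_executed"
    return JobState.FAILED, "nonzero_exit"
```

In this sketch, a job that exits with a non-zero code on its own ends up 
FAILED under Option 2 but TERMINATED under Option 3, while Option 4 keeps 
FAILED for both and pushes the distinction into the sub-state.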

Issue #5875 (originally from the PBS experience report) criticizes that 
FAILED currently expresses both user-requested termination and job 
failure. How does this issue relate to the problem above?

Another open question is how this relates to the wif_* functions.

Please share your opinion.

Thanks,
Peter.

