[DRMAA-WG] Some thoughts on drmaa_wifaborted vs. DRMAA spec
Peter Tröger
peter at troeger.eu
Mon Aug 11 02:51:32 CDT 2008
I checked the IDL recommendation, which is the latest revision of DRMAA
semantics. The argumentation is basically the same. A job can only be
"aborted" if it was not running so far. During the "running" state, a
job can stop working intentionally (leading to wif_exited=true), without
a known reason (wif_exited=false), or due to a signal
(wif_signalled=true). wif_aborted should be "false" in all these cases.
I did a quick check of the Condor DRMAA sources. This implementation
returns "wif_aborted=true" only if the job was rejected during
submission. It therefore seems to work as expected.
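To make the mapping above concrete, here is a minimal sketch in Python.
It is not taken from any DRMAA binding or implementation: the names
Outcome, WaitFlags and wait_flags are hypothetical and exist only for
this illustration. It models nothing more than the flag semantics
described above, assuming a job ends in exactly one of four ways.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    """Hypothetical job outcomes, named for this sketch only."""
    NEVER_RAN = auto()    # job ended before reaching the running state
    EXITED = auto()       # job ran and stopped intentionally
    SIGNALLED = auto()    # job ran and was ended by a signal
    UNKNOWN_END = auto()  # job ran and ended without a known reason


@dataclass(frozen=True)
class WaitFlags:
    """The three flags discussed in this thread (illustrative container)."""
    wif_exited: bool
    wif_signalled: bool
    wif_aborted: bool


def wait_flags(outcome: Outcome) -> WaitFlags:
    """Map an outcome to the flags, following the reading given above."""
    return WaitFlags(
        wif_exited=outcome is Outcome.EXITED,
        wif_signalled=outcome is Outcome.SIGNALLED,
        # Only a job that never entered the running state counts as aborted.
        wif_aborted=outcome is Outcome.NEVER_RAN,
    )
```

With this reading, wif_aborted is true for exactly one outcome, and a job
that ran but ended for an unknown reason has all three flags false.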
Regards,
Peter.
Piotr Domagalski wrote:
> Hi Roger,
>
> On Thu, Aug 7, 2008 at 3:51 PM, Roger Brobst <rogerb at cadence.com> wrote:
>
>> Without commenting on any specific implementation,
>> I feel comfortable that the section of the DRMAA spec
>> pertaining to drmaa_wif{exited,signalled,aborted}
>> is written as intended.
>>
>> In particular, once a process is started, it should
>> eventually end by exiting or by being signalled.
>> In the former case, the exit value should be accessible.
>> In the latter case, the signal should be accessible.
>>
>> Once a process has been started by the DRM (and enters
>> the 'running state') wifaborted should never be true
>> for the job.
>>
>
> OK. I totally agree -- that does make sense (besides the fact that
> there's no equivalent of wifaborted in POSIX). I'm curious about
> others' opinions, especially Andreas', as he will probably know the
> details of SGE implementation.
>
> Once we reach consensus, I'll have to look into our LSF and PBS
> implementations to have this checked/fixed. As far as I remember,
> they're behaving like SGE, i.e. aborted = true for all terminated jobs
> whatever their state was.
>
> P.S. I thought it'd be appropriate to move this discussion back to
> drmaa-wg at ogf.org
>
>