[drmaa-wg] reason for job abort : Tracker 309

Andreas Haas Andreas.Haas at Sun.COM
Tue Feb 8 11:18:09 CST 2005


On Tue, 8 Feb 2005, Roger Brobst wrote:

>
> As described in
>    http://forge.gridforum.org/tracker/?aid=309
> it would be useful for drmaa to provide a mechanism
> that an application can use to determine why a job
> was aborted.
>
> The existing proposal is to add a string output argument
> to drmaa_wait() that provides text information in
> case drmaa_wifaborted()==true
>
> My counter-proposal is to introduce a new function
> that can be called after drmaa_wifaborted()==true
> to obtain the reason the job was aborted.
> (Similar to how drmaa_wexitstatus() can be called
>  after drmaa_wifexited()==true to obtain the
>  actual exit status)
>
> int drmaa_wabortreason(
>     char* abort_reason,         /*OUT*/
>     size_t abort_reason_len,    /*IN*/
>     int stat,                   /*IN*/
>     char* error_diagnosis,      /*OUT*/
>     size_t error_diag_len       /*IN*/
>     )
>
> We should discuss what the function should output
> if the drmaa implementation does not have any
> information about why the job was aborted.

I agree this would fix the #309 problem. However, I doubt
it is possible to extract a 'char* abort_reason' from
an 'int stat'...

I think we need to introduce a new opaque datatype that is
to be returned by drmaa_wait(). The new datatype would have
to serve as a wrapper for stat, rusage and abort_reason, and
the drmaa_wif* functions would simply operate on the new
datatype instead of 'stat'.
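
To make the idea concrete, here is a rough sketch of what such a wrapper
might look like. None of these names exist in DRMAA: job_info_t, the
job_info_* functions and the JOB_INFO_* return codes are all hypothetical,
rusage is left out for brevity, and returning a distinct "no reason" code
with an empty string is only one possible answer to the open question of
what to output when the implementation has no information.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical return codes, not taken from the DRMAA specification. */
#define JOB_INFO_OK        0
#define JOB_INFO_NO_REASON 1   /* job not aborted, or reason unknown */
#define JOB_INFO_BAD_ARG   2

/* Callers would only ever see this opaque typedef. */
typedef struct job_info job_info_t;

/* In a real library this definition would stay private to the implementation. */
struct job_info {
    int   stat;           /* the plain status value used today */
    int   aborted;        /* non-zero if the job was aborted */
    char *abort_reason;   /* NULL when the DRM system gave no reason */
};

/* Stand-in for what a wait call would fill in (assumed, for illustration). */
job_info_t *job_info_create(int stat, int aborted, const char *reason)
{
    job_info_t *info = calloc(1, sizeof *info);
    if (info == NULL)
        return NULL;
    info->stat = stat;
    info->aborted = aborted;
    if (reason != NULL) {
        size_t n = strlen(reason) + 1;
        info->abort_reason = malloc(n);
        if (info->abort_reason == NULL) {
            free(info);
            return NULL;
        }
        memcpy(info->abort_reason, reason, n);
    }
    return info;
}

void job_info_free(job_info_t *info)
{
    if (info != NULL) {
        free(info->abort_reason);
        free(info);
    }
}

/* Counterpart of a wif* query, operating on the wrapper instead of 'int stat'. */
int job_info_wifaborted(int *aborted, const job_info_t *info)
{
    if (aborted == NULL || info == NULL)
        return JOB_INFO_BAD_ARG;
    *aborted = info->aborted;
    return JOB_INFO_OK;
}

/* Copies the abort reason into buf, truncating to fit; buf is always terminated. */
int job_info_wabortreason(char *buf, size_t buf_len, const job_info_t *info)
{
    if (buf == NULL || buf_len == 0 || info == NULL)
        return JOB_INFO_BAD_ARG;
    buf[0] = '\0';
    if (!info->aborted || info->abort_reason == NULL)
        return JOB_INFO_NO_REASON;
    strncpy(buf, info->abort_reason, buf_len - 1);
    buf[buf_len - 1] = '\0';
    return JOB_INFO_OK;
}
```

With something along these lines, an application would check
job_info_wifaborted() first and then call job_info_wabortreason() with its
own buffer, much as drmaa_wexitstatus() follows drmaa_wifexited() today.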

Regards,
Andreas




