[DRMAA-WG] wifexited and wifsignalled confusion continues

Wed Nov 5 16:49:05 CST 2008

Hi Roger,

On Wed, Nov 5, 2008 at 10:41 PM, Roger Brobst <rogerb at cadence.com> wrote:
> However if WIFSIGNALED output zero (false),
>    calling WTERMSIG is undefined.
> [...]
> However, If WIFEXITED output zero (false),
>    calling WEXITSTATUS is undefined.

Yes, I totally agree with that. I just oversimplified this a bit in my listings.
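
For reference, here is a minimal sketch of the guarded decoding for a plain
POSIX process (not a DRMAA job). The struct wait_result type and the
decode_wait_status() and run_child() helpers are hypothetical names made up
for this illustration; only fork(), kill(), waitpid() and the W* macros come
from POSIX.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result type, not part of POSIX or DRMAA. */
struct wait_result {
    int exited;      /* non-zero if the child terminated via exit()/_exit() */
    int exit_code;   /* meaningful only when exited is non-zero, else -1 */
    int signaled;    /* non-zero if the child was terminated by a signal */
    int term_signal; /* meaningful only when signaled is non-zero, else -1 */
};

/* Decode a waitpid() status, evaluating each accessor macro only after
 * its guard macro has reported true. */
static void decode_wait_status(int status, struct wait_result *out)
{
    assert(out != NULL);
    out->exited = WIFEXITED(status) ? 1 : 0;
    out->exit_code = out->exited ? WEXITSTATUS(status) : -1;
    out->signaled = WIFSIGNALED(status) ? 1 : 0;
    out->term_signal = out->signaled ? WTERMSIG(status) : -1;
}

/* Hypothetical helper: fork a child that kills itself with `sig` when
 * sig != 0 and otherwise exits with `code`. Stores the raw wait status
 * in *status and returns 0 on success, -1 on failure. */
static int run_child(int code, int sig, int *status)
{
    pid_t pid, waited;

    assert(status != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (sig != 0)
            kill(getpid(), sig);
        _exit(code);
    }
    do {
        waited = waitpid(pid, status, 0);
    } while (waited < 0 && errno == EINTR);
    return waited == pid ? 0 : -1;
}
```

When the child is started directly like this, without a shell in between,
a child that calls _exit(137) and a child killed by signal 9 give different
results: the first sets only exited, the second sets only signaled.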

> Yes, if a DRM uses a shell to start the client-specified program
> and the shell uses the convention of conveying that the child
> was terminated by exiting with 128+sigNum, then the DRM may not
> be able to distinguish between a child exit(137) and being
> terminated by sigNum=9.
> This is a DRM implementation issue.

I was actually hoping for some discussion of what a DRMAA
implementation should look like in this case. And also (that's mainly
for Peter) what the test suite should look like. For example, it
currently tests exit statuses 0..255, which would obviously fail if we
wanted to assume that drmaa_wifexited is true only for 0..128 and use
the remaining values for signal numbers.
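
To make the ambiguity concrete, here is a rough sketch of what an
implementation would have to do if all it sees is the exit code reported by
an intermediate shell. The struct shell_guess type and the
interpret_shell_exit() function are hypothetical, and the rule "codes above
128 mean a signal" is an assumption taken from the 128+sigNum convention
discussed above, not something the DRMAA spec mandates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this illustration only. */
struct shell_guess {
    int looks_signaled; /* non-zero if the code is read as 128+sigNum */
    int signal_number;  /* guessed signal number, or -1 */
    int exit_code;      /* exit code if read as a normal exit, or -1 */
};

/* Interpret an exit code reported by a wrapping shell, ASSUMING the
 * 128+sigNum convention: codes 129..255 are read as "killed by signal
 * code-128", everything else as a normal exit. */
static void interpret_shell_exit(int code, struct shell_guess *out)
{
    assert(out != NULL);
    if (code > 128 && code <= 255) {
        out->looks_signaled = 1;
        out->signal_number = code - 128;
        out->exit_code = -1;
    } else {
        out->looks_signaled = 0;
        out->signal_number = -1;
        out->exit_code = code;
    }
}
```

With this rule a job that really called exit(137) is reported as killed by
signal 9, which is exactly why a test that expects wifexited to be true for
every exit status in 0..255 cannot pass on such an implementation.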

> Yes, since a given process cannot exit itself and be terminated,
> WIFEXITED and WIFSIGNALED should never both be true (non-zero).

Are you talking about a DRMAA job here or just a general Unix process?
Because in the former case there seems to be a different assumption,
probably because of the "failed after running" case in wifexited.

The thing is that the current test suite (again, Peter?) tests whether a
signalled DRMAA job was both wifsignaled and wifexited. That kind of
puzzled me.

> Historically, a zero exit status from a Unix process meant
> "exited successfully".
> I believe the "failed after running" clause in the below
> excerpt is intended to mean exited with a non-zero value
>
>> "Evaluates into 'exited' a non-zero value if stat was returned
>>  for a job that either failed after running or finished after
>>  running"

The problem is that, as far as I understand Peter's intentions in the
test suite, this "failed after running" clause is interpreted
differently there.

-- 
Piotr Domagalski