[drmaa-wg] drmaa_wait() Clarification

Mon Jan 24 09:35:58 CST 2005

Still no change in the answer... We could revisit the issue tomorrow
on the DRMAA call.

The spec says that data may be reaped only once:

A DRMAA implementation SHALL collect remote run usage data (rusage
variable) after the remote job run and job finish information (stat
variable). The user MAY reap this data only once. The implementation is
free to "garbage collect" the reaped data at a convenient time. Only the
data from the current session's job Id MUST be available. Reaping data
from other session job Id's MAY be supported in a DRMAA implementation.
----------------------------

    -Hrabri
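The reap-once rule quoted above can be sketched in C as follows. This is a minimal single-threaded illustration, not the DRMAA C binding: `record_finish()`, `reap()`, `struct job_record`, and the `SKETCH_*` codes are hypothetical names, and `cpu_seconds` stands in for the rusage data. It discards the data as soon as it is reaped, which is one of the choices the "garbage collect at a convenient time" wording allows.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for DRMAA_ERRNO_SUCCESS / DRMAA_ERRNO_INVALID_JOB. */
enum { SKETCH_SUCCESS = 0, SKETCH_INVALID_JOB = 1 };

#define MAX_JOBS 16

/* Finish data for one ended job that has not been reaped yet.
 * cpu_seconds is a hypothetical stand-in for the rusage data. */
struct job_record {
    char id[32];
    int stat;
    long cpu_seconds;
    int in_use;
};

static struct job_record table[MAX_JOBS];

/* Store a job's finish data when the job ends; returns -1 if the table is full. */
int record_finish(const char *id, int stat, long cpu_seconds) {
    assert(id != NULL);
    for (int i = 0; i < MAX_JOBS; i++) {
        if (!table[i].in_use) {
            strncpy(table[i].id, id, sizeof table[i].id - 1);
            table[i].id[sizeof table[i].id - 1] = '\0';
            table[i].stat = stat;
            table[i].cpu_seconds = cpu_seconds;
            table[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Reap the finish data for `id`.  The first call copies it out and frees
 * the slot; every later call for the same id reports an invalid job. */
int reap(const char *id, int *stat, long *cpu_seconds) {
    assert(id != NULL && stat != NULL && cpu_seconds != NULL);
    for (int i = 0; i < MAX_JOBS; i++) {
        if (table[i].in_use && strcmp(table[i].id, id) == 0) {
            *stat = table[i].stat;
            *cpu_seconds = table[i].cpu_seconds;
            table[i].in_use = 0;  /* data is discarded once reaped */
            return SKETCH_SUCCESS;
        }
    }
    return SKETCH_INVALID_JOB;
}
```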

-----Original Message-----
From: owner-drmaa-wg at ggf.org [mailto:owner-drmaa-wg at ggf.org] On Behalf Of Daniel Templeton
Sent: Monday, January 24, 2005 9:19 AM
Cc: DRMAA Working Group
Subject: Re: [drmaa-wg] drmaa_wait() Clarification

I'm not talking about skittish grey areas.  I mean:

1) Thread 1 does drmaa_wait ("1", -1)
2) Thread 2 does drmaa_wait ("1", -1)
3) Job 1 ends

Who gets what in that case?  The spec does not address concurrent 
access.  It says that if you swap steps 2 and 3, then the second thread 
gets an error.

Daniel
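One way to give the three-step scenario above a defined answer is the policy Andreas proposes further down the thread: exactly one waiter reaps the data and the other gets an invalid-job error. The sketch below assumes a single job guarded by a POSIX mutex and condition variable; `job_finished()`, `wait_job()`, `waiter()`, and the `SKETCH_*` codes are hypothetical names, not DRMAA calls. Which thread wins is still up to the scheduler, but the outcome is always one success and one `SKETCH_INVALID_JOB`.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for DRMAA_ERRNO_SUCCESS / DRMAA_ERRNO_INVALID_JOB. */
enum { SKETCH_SUCCESS = 0, SKETCH_INVALID_JOB = 1 };

/* State for a single job id; a real library would keep one per job. */
struct job_state {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;
    int finished;  /* the job has ended */
    int reaped;    /* the exit data has already been handed out */
    int stat;      /* stand-in for exit status and rusage */
};

static struct job_state job = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .done_cv = PTHREAD_COND_INITIALIZER,
};

/* Called by a (hypothetical) job monitor when the job ends. */
void job_finished(int stat) {
    pthread_mutex_lock(&job.lock);
    job.stat = stat;
    job.finished = 1;
    pthread_cond_broadcast(&job.done_cv);  /* wake every blocked waiter */
    pthread_mutex_unlock(&job.lock);
}

/* Blocking wait with no timeout.  Whichever waiter reacquires the lock
 * first reaps the data; every other waiter, concurrent or late, gets
 * SKETCH_INVALID_JOB. */
int wait_job(int *stat) {
    int rc;
    assert(stat != NULL);
    pthread_mutex_lock(&job.lock);
    while (!job.finished)
        pthread_cond_wait(&job.done_cv, &job.lock);
    if (job.reaped) {
        rc = SKETCH_INVALID_JOB;
    } else {
        *stat = job.stat;
        job.reaped = 1;
        rc = SKETCH_SUCCESS;
    }
    pthread_mutex_unlock(&job.lock);
    return rc;
}

/* Thread entry that stores wait_job()'s return code in *out. */
void *waiter(void *out) {
    int stat;
    *(int *)out = wait_job(&stat);
    return NULL;
}
```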

Rajic, Hrabri wrote:

> The spec says there is no reaped data for the latecomer.
> 
> A quality implementation could try to provide both threads with
> everything if the second request arrives while the first request is
> still being processed.  This is a grey area, and our policy so far has
> been not to over-specify things.
> 
>     -Hrabri
>  
> 
> -----Original Message-----
> From: owner-drmaa-wg at ggf.org [mailto:owner-drmaa-wg at ggf.org] On Behalf Of Daniel Templeton
> Sent: Monday, January 24, 2005 8:35 AM
> Cc: DRMAA Working Group
> Subject: Re: [drmaa-wg] drmaa_wait() Clarification
> 
> Andreas Haas wrote:
> 
>>On Mon, 24 Jan 2005, Daniel Templeton wrote:
>>
>>
>>
>>>How is an implementation supposed to handle the case where two threads
>>>call drmaa_wait() on the same job id?  The choices are:
>>>
>>>a) Both get notified when the job ends, and both get copies of the job
>>>exit and resource usage information.
>>>b) Both get notified when the job ends.  One gets the job exit and
>>>resource information, and the other gets a DRMAA_ERRNO_NO_RUSAGE.
>>>c) Both get notified when the job ends.  Which one gets a copy of the
>>>job exit and resource information and which one gets a
>>>DRMAA_ERRNO_NO_RUSAGE depends on which thread runs when.
>>>d) That's not allowed.
>>>
>>>b and c are race conditions, and there's no error code to represent d,
>>>so that leaves us with a.  This conclusion, however, needs to be clearly
>>>stated in the spec.  I believe the current SGE implementation
>>>implements c.
>>
>>It is not possible to prevent a race condition except by not using
>>drmaa_wait() the way you describe it.
>>
>>I believe reasonable behaviour would be that one thread gets the job
>>exit and resource information, while the other gets
>>DRMAA_ERRNO_INVALID_JOB, very much as if its drmaa_wait() had been
>>issued after the first one had reaped the job.
> 
> 
> How is there a race condition in choice a?  All threads waiting for a
> job get copies of the exit status and resource usage when the job
> exits, and then the info is disposed of.  Everyone is happy.  Latecomers
> get a DRMAA_ERRNO_INVALID_JOB.
> Being the one who has to implement this stuff, I realize that this is a
> lot harder than it sounds, but it is decidedly possible to implement.
> However, if we decide that my use case is not valid, that needs to be
> explicitly stated in the spec.
> 
> Daniel
>
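Choice a, as Daniel describes it, can be sketched with the same kind of POSIX mutex and condition variable. Assumptions, all hypothetical: one job, no timeout, `stat` standing in for both exit status and rusage, and a job that has no waiters when it ends keeping a single copy for the first later caller, so the reap-once rule still holds. Every thread blocked in `wait_job()` when the job ends gets its own copy; the data is marked disposed once the last of them has taken it, and any later caller gets `SKETCH_INVALID_JOB`. None of these names are DRMAA calls.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for DRMAA_ERRNO_SUCCESS / DRMAA_ERRNO_INVALID_JOB. */
enum { SKETCH_SUCCESS = 0, SKETCH_INVALID_JOB = 1 };

/* State for a single job id; a real library would keep one per job. */
struct job_state {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;        /* signalled when the job ends */
    pthread_cond_t registered_cv;  /* signalled when a waiter registers */
    int finished;     /* the job has ended */
    int waiters;      /* threads that started waiting before the job ended */
    int copies_left;  /* copies still owed after the job ended */
    int late_copy;    /* 1 if the single copy is reserved for a later caller */
    int disposed;     /* the exit data has been discarded */
    int stat;         /* stand-in for exit status and rusage */
};

static struct job_state job = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .done_cv = PTHREAD_COND_INITIALIZER,
    .registered_cv = PTHREAD_COND_INITIALIZER,
};

/* Hand out one copy (lock held); discard the data once nobody is owed one. */
static void take_copy(int *stat) {
    *stat = job.stat;
    if (--job.copies_left == 0)
        job.disposed = 1;  /* a real implementation would free rusage here */
}

/* Called by a (hypothetical) job monitor when the job ends. */
void job_finished(int stat) {
    pthread_mutex_lock(&job.lock);
    job.stat = stat;
    job.finished = 1;
    /* Everyone already waiting is owed a copy.  If nobody is waiting,
     * keep one copy for the first later caller (the reap-once rule). */
    job.late_copy = (job.waiters == 0);
    job.copies_left = job.late_copy ? 1 : job.waiters;
    pthread_cond_broadcast(&job.done_cv);
    pthread_mutex_unlock(&job.lock);
}

/* Blocking wait with no timeout, following choice a. */
int wait_job(int *stat) {
    int rc = SKETCH_SUCCESS;
    assert(stat != NULL);
    pthread_mutex_lock(&job.lock);
    if (!job.finished) {
        /* Arrived before the job ended: register, then block. */
        job.waiters++;
        pthread_cond_broadcast(&job.registered_cv);
        while (!job.finished)
            pthread_cond_wait(&job.done_cv, &job.lock);
        job.waiters--;
        take_copy(stat);
    } else if (job.late_copy) {
        /* Nobody was waiting at job end: the first later caller reaps. */
        job.late_copy = 0;
        take_copy(stat);
    } else {
        /* Latecomer: the copies were allocated to the earlier waiters. */
        rc = SKETCH_INVALID_JOB;
    }
    pthread_mutex_unlock(&job.lock);
    return rc;
}

/* Test helper: block until at least n threads are registered waiters. */
void wait_until_waiting(int n) {
    pthread_mutex_lock(&job.lock);
    while (job.waiters < n)
        pthread_cond_wait(&job.registered_cv, &job.lock);
    pthread_mutex_unlock(&job.lock);
}

/* Result slot filled in by a waiter thread. */
struct wait_result { int rc; int stat; };

/* Thread entry that records wait_job()'s return code and status. */
void *waiter(void *arg) {
    struct wait_result *r = arg;
    r->rc = wait_job(&r->stat);
    return NULL;
}
```

The extra state (`waiters`, `copies_left`, `late_copy`) is the kind of bookkeeping Daniel has in mind when he says this is harder than it sounds.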