[drmaa-wg] drmaa_wait() Clarification

Daniel Templeton Dan.Templeton at Sun.COM
Mon Jan 24 09:18:52 CST 2005


I'm not talking about skittish grey areas.  I mean:

1) Thread 1 does drmaa_wait ("1", -1)
2) Thread 2 does drmaa_wait ("1", -1)
3) Job 1 ends

Who gets what in that case?  The spec does not address concurrent
access.  It only says that if you swap steps 2 and 3, the second thread
gets an error.
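
To make the three steps concrete, here is a minimal Python sketch of one
possible answer: every thread already blocked on the job gets its own copy
of the exit data, and a call made after the data has been handed out gets
an invalid-job error.  It is a toy model under stated assumptions, not a
DRMAA binding.  Session, submit(), finish(), wait(), waiters() and
InvalidJob are hypothetical names invented for illustration; InvalidJob
stands in for DRMAA_ERRNO_INVALID_JOB.

```python
import copy
import threading
import time


class InvalidJob(Exception):
    """Hypothetical stand-in for DRMAA_ERRNO_INVALID_JOB."""


class Session:
    """Toy job table (an assumption for illustration, not a DRMAA binding)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._jobs = {}  # job id -> {"waiters": int, "info": dict or None}

    def submit(self, job_id):
        with self._cond:
            self._jobs[job_id] = {"waiters": 0, "info": None}

    def waiters(self, job_id):
        # Number of threads currently registered in wait() for this job.
        with self._cond:
            job = self._jobs.get(job_id)
            return job["waiters"] if job else 0

    def finish(self, job_id, exit_status, rusage):
        # Record the exit data and wake every thread blocked in wait().
        with self._cond:
            self._jobs[job_id]["info"] = {
                "exit_status": exit_status,
                "rusage": rusage,
            }
            self._cond.notify_all()

    def wait(self, job_id):
        with self._cond:
            job = self._jobs.get(job_id)
            if job is None:
                # Unknown job, or its data has already been handed out.
                raise InvalidJob(job_id)
            job["waiters"] += 1
            while job["info"] is None:
                self._cond.wait()
            info = copy.deepcopy(job["info"])  # each waiter gets its own copy
            job["waiters"] -= 1
            if job["waiters"] == 0:
                # The last registered waiter disposes of the job data.
                del self._jobs[job_id]
            return info


def demo():
    session = Session()
    session.submit("1")
    results = {}

    def worker(name):
        results[name] = session.wait("1")

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
    for t in threads:
        t.start()
    # Steps 1 and 2: make sure both threads are waiting before the job ends.
    while session.waiters("1") < 2:
        time.sleep(0.01)
    # Step 3: the job ends.
    session.finish("1", 0, {"cpu": "1.5"})
    for t in threads:
        t.join()
    # A latecomer arrives after the data has been disposed of.
    try:
        session.wait("1")
        late = "got info"
    except InvalidJob:
        late = "invalid job"
    return results, late


if __name__ == "__main__":
    print(demo())
```

In this sketch both waiting threads come back with equal but separate
copies of the exit status and usage data, and the later call raises
InvalidJob.  One design choice worth noting: a thread that arrives after
the job ends but before the last waiter has returned still gets a copy,
which is exactly the sort of edge a spec would have to pin down.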

Daniel

Rajic, Hrabri wrote:

> The spec says there is no data left to reap for the latecomer.
> 
> A quality implementation could try to provide both threads with
> everything if the second request comes during the first request
> processing.  This is a grey area, and our policy so far has been not to
> over-specify things.
> 
>     -Hrabri
>  
> 
> -----Original Message-----
> From: owner-drmaa-wg at ggf.org [mailto:owner-drmaa-wg at ggf.org] On Behalf Of Daniel Templeton
> Sent: Monday, January 24, 2005 8:35 AM
> Cc: DRMAA Working Group
> Subject: Re: [drmaa-wg] drmaa_wait() Clarification
> 
> Andreas Haas wrote:
> 
>>On Mon, 24 Jan 2005, Daniel Templeton wrote:
>>
>>
>>
>>>How is an implementation supposed to handle the case where two threads
>>>call drmaa_wait() on the same job id?  The choices are:
>>>
>>>a) Both get notified when the job ends and both get copies of the job
>>>exit and resource usage information
>>>b) Both get notified when the job ends.  One gets the job exit and
>>>resource information and the other gets a DRMAA_ERRNO_NO_RUSAGE.
>>>c) Both get notified when the job ends.  Which gets a copy of the job
>>>exit and resource information and which gets a DRMAA_ERRNO_NO_RUSAGE
>>>depends on which thread runs when.
>>>d) That's not allowed
>>>
>>>b and c are race conditions and there's no error code to represent d, so
>>>that leaves us with a.  This conclusion, however, needs to be clearly
>>>stated in the spec.  I believe the current SGE implementation implements c.
>>
>>It is not possible to prevent a race condition except by not using
>>drmaa_wait() the way you describe it.
>>
>>I believe reasonable behaviour would be that one gets the job exit and
>>resource information. The other gets DRMAA_ERRNO_INVALID_JOB, very
>>much as if drmaa_wait() had been issued after the first one had
>>reaped the job.
> 
> 
> How is there a race condition in choice a?  All threads waiting for a
> job get copies of the exit status and resource usage when the job
> exits, then the info is disposed of.  Everyone is happy.  Latecomers
> get a DRMAA_ERRNO_INVALID_JOB.
> Being the one who has to implement this stuff, I realize that this is a
> lot harder than it sounds, but it is decidedly possible to implement.
> However, if we decide that my use case is not valid, that
> needs to be explicitly stated in the spec.
> 
> Daniel
> 




