[drmaa-wg] drmaa_wait() Clarification
Daniel Templeton
Dan.Templeton at Sun.COM
Mon Jan 24 09:18:52 CST 2005
I'm not talking about skittish grey areas. I mean:
1) Thread 1 does drmaa_wait ("1", -1)
2) Thread 2 does drmaa_wait ("1", -1)
3) Job 1 ends
Who gets what in that case? The spec does not address concurrent
access. It says that if you swap steps 2 and 3, then the second thread
gets an error.
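
To make the scenario concrete, here is a minimal Python sketch of one possible answer: every thread that is already waiting when the job ends gets its own copy of the exit information, and anyone who calls afterwards gets an error. It is only a toy model under my own assumptions, not DRMAA and not the SGE code. JobTable, InvalidJobError, submit, job_finished and waiter_count are hypothetical names invented for the illustration; InvalidJobError stands in for DRMAA_ERRNO_INVALID_JOB, and the timeout argument is left out (every wait blocks forever, like -1).

```python
import threading
import time


class InvalidJobError(Exception):
    """Hypothetical stand-in for DRMAA_ERRNO_INVALID_JOB."""


class JobTable:
    """Toy model: all concurrent waiters get copies, latecomers get an error."""

    def __init__(self):
        self._cond = threading.Condition()
        self._jobs = {}  # job_id -> {"waiters": int, "info": dict or None}

    def submit(self, job_id):
        with self._cond:
            self._jobs[job_id] = {"waiters": 0, "info": None}

    def waiter_count(self, job_id):
        with self._cond:
            job = self._jobs.get(job_id)
            return job["waiters"] if job else 0

    def wait(self, job_id):
        with self._cond:
            job = self._jobs.get(job_id)
            if job is None:
                # Unknown job, or already reaped by earlier waiters.
                raise InvalidJobError(job_id)
            if job["info"] is not None:
                # Job ended while nobody was waiting: first caller reaps it.
                del self._jobs[job_id]
                return dict(job["info"])
            job["waiters"] += 1
            while job["info"] is None:
                self._cond.wait()
            # Each waiter returns its own copy of the shared record.
            return dict(job["info"])

    def job_finished(self, job_id, exit_status, rusage):
        with self._cond:
            job = self._jobs[job_id]
            job["info"] = {"exit_status": exit_status, "rusage": dict(rusage)}
            if job["waiters"] > 0:
                # Current waiters still hold a reference to the record;
                # removing it from the table makes later callers fail.
                del self._jobs[job_id]
                self._cond.notify_all()


def demo():
    table = JobTable()
    table.submit("1")
    results = []

    def worker():
        results.append(table.wait("1"))

    # Steps 1 and 2: two threads wait on job "1".
    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    while table.waiter_count("1") < 2:
        time.sleep(0.01)

    # Step 3: job 1 ends.
    table.job_finished("1", 0, {"cpu": "1.5"})
    for t in threads:
        t.join()

    # A latecomer calling after the job has been reaped.
    try:
        table.wait("1")
        late = "got info"
    except InvalidJobError:
        late = "invalid job"
    return results, late
```

Calling demo() runs the three steps above and then one late wait. In this model, swapping steps 2 and 3 makes the second thread fail, which matches what the spec already says for that ordering.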
Daniel
Rajic, Hrabri wrote:
> The spec says there is no data left to reap for the latecomer.
>
> A quality implementation could try to provide both threads with
> everything if the second request comes during the first request
> processing. This is a grey area and our policy so far has been not to
> have things over-specified.
>
> -Hrabri
>
>
> -----Original Message-----
> From: owner-drmaa-wg at ggf.org [mailto:owner-drmaa-wg at ggf.org] On Behalf
> Of Daniel Templeton
> Sent: Monday, January 24, 2005 8:35 AM
> Cc: DRMAA Working Group
> Subject: Re: [drmaa-wg] drmaa_wait() Clarification
>
> Andreas Haas wrote:
>
>>On Mon, 24 Jan 2005, Daniel Templeton wrote:
>>
>>
>>
>>>How is an implementation supposed to handle the case where two threads
>>>call drmaa_wait() on the same job id? The choices are:
>>>
>>>a) Both get notified when the job ends and both get copies of the job
>>>exit and resource usage information
>>>b) Both get notified when the job ends. One gets the job exit and
>>>resource information and the other gets a DRMAA_ERRNO_NO_RUSAGE.
>>>c) Both get notified when the job ends. Which gets a copy of the job
>>>exit and resource information and which gets a DRMAA_ERRNO_NO_RUSAGE
>>>depends on which thread runs when.
>>>d) That's not allowed
>>>
>>>b and c are race conditions and there's no error code to represent d, so
>>>that leaves us with a. This conclusion, however, needs to be clearly
>>>stated in the spec. I believe the current SGE implementation implements c.
>
>>
>>It is not possible to prevent a race condition except by not using
>>drmaa_wait() the way you describe it.
>>
>>I believe reasonable behaviour would be that one gets the job exit and
>>resource information. The other gets DRMAA_ERRNO_INVALID_JOB, very
>>much as if drmaa_wait() had been issued after the first one had
>>reaped the job.
>
>
> How is there a race condition in choice a? All threads waiting for a
> job get copies of the exit status and resource usage when the job
> exits, then the info is disposed of. Everyone is happy. Latecomers
> get a DRMAA_ERRNO_INVALID_JOB.
> Being the one who has to implement this stuff, I realize that this is a
> lot harder than it sounds, but it is decidedly possible to implement.
> However, if we decide that my use case is not valid, that
> needs to be explicitly stated in the spec.
>
> Daniel
>