[drmaa-wg] Simultaneous waits on the same job id?
Ed Baskerville
lists at edbaskerville.com
Fri Jun 23 19:25:51 CDT 2006
OK, I'll do that. But one more question: how do these issues apply to
synchronize? Consider the following sequence:
thread 1: wait(job id 5)
thread 2: synchronize(job ids 5,6,7)
[job id 5 finishes]
¿thread 2 synchronize call fails?
[job id 7 finishes]
[job id 6 finishes]
¿thread 2 synchronize call succeeds?
Should synchronize fail with INVALID_JOB as soon as any of the ids
it's waiting on are reaped? Or should it eventually succeed?
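
To make the two readings concrete, here is a toy replay of the sequence above. It is only a sketch and does not use the DRMAA API: `replay_synchronize`, `InvalidJob`, and the ("finish"/"reap", job id) event tuples are names made up for illustration, and I'm assuming thread 1's wait reaps job 5 right after it finishes.

```python
# Toy model only: not the DRMAA API. All names here are hypothetical.

class InvalidJob(Exception):
    """Stands in for an INVALID_JOB error return."""


def replay_synchronize(watched, events, fail_on_reap):
    """Replay a timeline against one pending synchronize call.

    watched      -- job ids the synchronize call is waiting on
    events       -- ordered list of ("finish", job_id) or ("reap", job_id)
    fail_on_reap -- True: fail as soon as a watched job is reaped by
                    another thread; False: treat a reaped job as done
    Returns the index of the event at which synchronize returns.
    """
    pending = set(watched)
    for index, (kind, job_id) in enumerate(events):
        if job_id not in watched:
            continue
        if kind == "reap" and fail_on_reap:
            raise InvalidJob(f"job {job_id} was reaped by another waiter")
        # A finished or reaped job no longer blocks synchronize.
        pending.discard(job_id)
        if not pending:
            return index
    raise RuntimeError("timeline ended before all jobs finished")


# Assumed timeline: thread 1's wait reaps job 5 once it finishes.
timeline = [("finish", 5), ("reap", 5), ("finish", 7), ("finish", 6)]

# Reading 1: synchronize eventually succeeds, at the last event.
lenient_result = replay_synchronize({5, 6, 7}, timeline, fail_on_reap=False)

# Reading 2: synchronize fails as soon as job 5 is reaped.
try:
    replay_synchronize({5, 6, 7}, timeline, fail_on_reap=True)
    strict_result = "succeeded"
except InvalidJob:
    strict_result = "INVALID_JOB"
```

Under the first reading the call returns only when job 6 finishes; under the second it fails at the reap of job 5, before jobs 6 and 7 are done.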
--Ed
On Jun 23, 2006, at 4:22 PM, Daniel Templeton wrote:
> Ed,
>
> I'm with you 100%, but this was discussed at length, and the group
> (minus me) decided that it was important to follow the POSIX wait
> (4) semantics, in which only one thread will succeed in waiting and
> the others will fail when the job is reaped.
>
> At some point I had found another POSIX wait variant which allowed
> all waiting threads to receive the exit status information, but I
> have since forgotten what it was, and Sun's new email retention
> policy has deleted the email.
>
> Daniel
>
> Ed Baskerville wrote:
>> This implies that the second call should fail when interpreted in
>> the context of a multithreaded application, but it doesn't really
>> seem to be written with a multithreaded application in mind. There's no
>> error code that makes sense here: INVALID_JOB implies that the job
>> data has already been reaped, but that's not necessarily true,
>> because you could have something like this:
>>
>> thread 1: wait(jobId)
>> thread 2: wait(jobId), immediately returns INVALID_JOB_ID because
>> there's already a wait in progress
>> thread 1: wait times out
>> thread 2: wait(jobId)...completes successfully
>>
>> So thread 2 is first told that the job data has already been
>> reaped, then told that the job is valid (because thread 1 happened
>> to time out). That's just weird.
>>
>> Another option is to simply not return INVALID_JOB_ID until the
>> data *has* been reaped (or not), but that seems weird too--why
>> make subsequent threads wait if they're probably just going to get
>> an error message?
>>
>> If this hasn't been decided, I would propose that a provision be
>> added saying that multiple threads are allowed to wait
>> simultaneously, and *all of them* get back the job data. It's not
>> too hard to implement, at least for Xgrid, and the semantics seem
>> cleaner.
>>
>> --Ed
>>
>>
>> On Jun 23, 2006, at 1:52 PM, Rajic, Hrabri wrote:
>>
>>> Good question. There is no such provision in the spec. One thread
>>> would need to be the first ...
>>>
>>> Hrabri
>>>
>>>
>>>> -----Original Message-----
>>>> From: owner-drmaa-wg at ggf.org [mailto:owner-drmaa-wg at ggf.org] On Behalf Of
>>>> Ed Baskerville
>>>> Sent: Friday, June 23, 2006 3:32 PM
>>>> To: DRMAA Working Group
>>>> Subject: [drmaa-wg] Simultaneous waits on the same job id?
>>>>
>>>> With all the discussion of wait in multithreaded contexts, I thought
>>>> I'd throw out another related question...
>>>>
>>>> Are multiple threads allowed to wait simultaneously on the same job
>>>> id and get back results, or is it required that one of them gets back
>>>> DRMAA_ERRNO_INVALID_JOB? That is, is it possible for the data to be
>>>> reaped simultaneously for multiple waiting threads, or must only one
>>>> of them be lucky enough to get the data back?
>>>>
>>>> For Xgrid, either way is straightforward to implement; obviously
>>>> having the option of returning data to multiple simultaneous calls
>>>> would be nice, but I want to get the semantics right.
>>>>
>>>> --Ed
>>>
>>
>