[drmaa-wg] Simultaneous waits on the same job id?

Ed Baskerville lists at edbaskerville.com
Fri Jun 23 19:25:51 CDT 2006


OK, I'll do that. But one more question: how do these issues apply to  
synchronize? Consider the following sequence:

thread 1: wait(job id 5)
thread 2: synchronize(job ids 5,6,7)
[job id 5 finishes]
¿thread 2 synchronize call fails?
[job id 7 finishes]
[job id 6 finishes]
¿thread 2 synchronize call succeeds?

Should synchronize fail with INVALID_JOB as soon as any of the ids
it's waiting on is reaped? Or should it eventually succeed once all
of the jobs have finished?
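
To make the two candidate behaviors concrete, here is a minimal Python sketch of the sequence above. It is a toy model, not a DRMAA binding: ToySession, InvalidJob, fail_fast and run_scenario are names made up for illustration, InvalidJob merely stands in for DRMAA_ERRNO_INVALID_JOB, and timeouts and disposal by synchronize are not modeled. The sketch assumes single-reaper wait semantics and only switches how synchronize reacts to an id that another thread has already reaped.

```python
import threading


class InvalidJob(Exception):
    """Stands in for DRMAA_ERRNO_INVALID_JOB in this toy model."""


class ToySession:
    """Hypothetical in-memory model of a session; not a real DRMAA API."""

    def __init__(self, job_ids, fail_fast):
        self._cond = threading.Condition()
        self._running = set(job_ids)  # jobs that have not finished yet
        self._finished = set()        # finished, exit data not yet reaped
        self._reaped = set()          # exit data already handed to a waiter
        self._fail_fast = fail_fast   # policy switch for synchronize()

    def finish(self, job_id):
        """Mark a job as finished and wake every blocked caller."""
        with self._cond:
            self._running.discard(job_id)
            self._finished.add(job_id)
            self._cond.notify_all()

    def wait(self, job_id):
        """Block until job_id finishes, then reap it (one reaper only)."""
        with self._cond:
            while job_id in self._running:
                self._cond.wait()
            if job_id not in self._finished:
                # Unknown id, or some other caller reaped it first.
                raise InvalidJob(job_id)
            self._finished.remove(job_id)
            self._reaped.add(job_id)
            self._cond.notify_all()  # let synchronize() notice the reap
            return job_id

    def synchronize(self, job_ids):
        """Block until every listed job has finished; reaps nothing."""
        with self._cond:
            while True:
                if self._fail_fast:
                    for job_id in job_ids:
                        if job_id in self._reaped:
                            raise InvalidJob(job_id)
                if not (set(job_ids) & self._running):
                    return
                self._cond.wait()


def run_scenario(fail_fast):
    """Replay the sequence from this message under one of the policies."""
    session = ToySession([5, 6, 7], fail_fast)
    outcome = {}

    def do_synchronize():
        try:
            session.synchronize([5, 6, 7])
            outcome["synchronize"] = "ok"
        except InvalidJob:
            outcome["synchronize"] = "INVALID_JOB"

    t1 = threading.Thread(target=session.wait, args=(5,))
    t2 = threading.Thread(target=do_synchronize)
    t1.start()
    t2.start()
    session.finish(5)
    t1.join()  # thread 1 has reaped job 5 by this point
    session.finish(7)
    session.finish(6)
    t2.join()
    return outcome["synchronize"]
```

In this model, run_scenario(True) corresponds to the first question mark in the sequence (synchronize fails once job 5 has been reaped) and run_scenario(False) to the second (synchronize succeeds after jobs 7 and 6 finish). Which of the two the spec intends is exactly what I'm asking.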

--Ed

On Jun 23, 2006, at 4:22 PM, Daniel Templeton wrote:

> Ed,
>
> I'm with you 100%, but this was discussed at length, and the group
> (minus me) decided that it was important to follow the POSIX wait(4)
> semantics, in which only one thread will succeed in waiting and
> the others will fail when the job is reaped.
>
> At some point I had found another POSIX wait variant which allowed  
> all waiting threads to receive the exit status information, but I  
> have since forgotten what it was, and Sun's new email retention  
> policy has deleted the email.
>
> Daniel
>
> Ed Baskerville wrote:
>> This implies that the second call should fail when interpreted in
>> the context of a multithreaded application, but it doesn't really
>> seem to be written with a multithreaded application in mind. There's no
>> error code that makes sense here: INVALID_JOB implies that the job  
>> data has already been reaped, but that's not necessarily true,  
>> because you could have something like this:
>>
>> thread 1: wait(jobId)
>> thread 2: wait(jobId), immediately returns INVALID_JOB because
>> there's already a wait in progress
>> thread 1: wait times out
>> thread 2: wait(jobId)...completes successfully
>>
>> So thread 2 is first told that the job data has already been  
>> reaped, then told that the job is valid (because thread 1 happened  
>> to time out). That's just weird.
>>
>> Another option is to make later callers block and return INVALID_JOB
>> only once the data *has* actually been reaped, but that seems weird
>> too--why make subsequent threads wait if they're probably just going
>> to get an error message?
>>
>> If this hasn't been decided, I would propose that a provision be  
>> added saying that multiple threads are allowed to wait  
>> simultaneously, and *all of them* get back the job data. It's not  
>> too hard to implement, at least for Xgrid, and the semantics seem  
>> cleaner.
>>
>> --Ed
>>
>>
>> On Jun 23, 2006, at 1:52 PM, Rajic, Hrabri wrote:
>>
>>> Good question.  There is no such provision in the spec.  One thread
>>> would need to be the first ...
>>>
>>> Hrabri
>>>
>>>
>>>> -----Original Message-----
>>>> From: owner-drmaa-wg at ggf.org [mailto:owner-drmaa-wg at ggf.org] On
>>>> Behalf Of Ed Baskerville
>>>> Sent: Friday, June 23, 2006 3:32 PM
>>>> To: DRMAA Working Group
>>>> Subject: [drmaa-wg] Simultaneous waits on the same job id?
>>>>
>>>> With all the discussion of wait in multithreaded contexts, I thought
>>>> I'd throw out another related question...
>>>>
>>>> Are multiple threads allowed to wait simultaneously on the same job
>>>> id and get back results, or is it required that one of them gets back
>>>> DRMAA_ERRNO_INVALID_JOB? That is, is it possible for the data to be
>>>> reaped simultaneously for multiple waiting threads, or must only one
>>>> of them be lucky enough to get the data back?
>>>>
>>>> For Xgrid, either way is straightforward to implement; obviously
>>>> having the option of returning data to multiple simultaneous calls
>>>> would be nice, but I want to get the semantics right.
>>>>
>>>> --Ed
>>>
>>
>




