[drmaa-wg] DRMAA test suite

Peter Troeger peter.troeger at hpi.uni-potsdam.de
Wed Jun 28 06:32:22 CDT 2006


>
> I know we discussed this at length already, but I remember the  
> discussion being about synchronize().  Sorry to beat dead horses,  
> but this one seems like a real handicap to me.  Basically, you're  
> saying that wait(), a very important element of the API, isn't  
> useful in an MT environment.  I've never seen a DRMAA app, other  
> than the most trivial example code, that doesn't call wait().  The  
> logical conclusion, then, is that DRMAA isn't useful in an MT  
> environment.

Full stop. Reading my own text again, I realize that I mixed up the ANY
and ALL semantics of wait() and synchronize(). We are talking about
wait(ANY), which returns one result, and synchronize(ALL), which depends
on all jobs in the session. My argument was that the session can
change during the wait/sync operation, which is bad for the
synchronize(ALL) operation but not a huge problem for the wait(ANY)
case. Sorry for wasting your time ...
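
To make the distinction concrete, here is a minimal sketch in Python. It assumes a toy in-memory session, not a real DRMAA binding: ToySession, SESSION_ALL, wait_any() and synchronize() are hypothetical names chosen for illustration only. The idea is that wait_any() looks at the live session, while synchronize() expands SESSION_ALL once, at call time.

```python
import threading

SESSION_ALL = object()  # hypothetical marker, standing in for DRMAA's "all jobs" constant


class ToySession:
    """Toy in-memory session; NOT a real DRMAA binding."""

    def __init__(self):
        self._cond = threading.Condition()
        self._submitted = set()  # ids of all jobs submitted so far
        self._done = set()       # ids of all jobs that have finished
        self._unreaped = []      # finished ids not yet returned by wait_any()

    def submit(self, job_id):
        with self._cond:
            self._submitted.add(job_id)

    def finish(self, job_id):
        # Called by the simulated DRM when a job ends.
        with self._cond:
            self._done.add(job_id)
            self._unreaped.append(job_id)
            self._cond.notify_all()

    def wait_any(self, timeout=None):
        # Live context: whichever job finishes next is returned, even if it
        # was submitted after this call started. Returns None on timeout.
        with self._cond:
            if not self._cond.wait_for(lambda: self._unreaped, timeout):
                return None
            return self._unreaped.pop(0)

    def synchronize(self, job_ids=SESSION_ALL, timeout=None):
        # Call-time context: SESSION_ALL is expanded once, on entry, so jobs
        # submitted later cannot keep the call blocked. Returns False on timeout.
        with self._cond:
            if job_ids is SESSION_ALL:
                wanted = frozenset(self._submitted)
            else:
                wanted = frozenset(job_ids)
            return self._cond.wait_for(lambda: wanted <= self._done, timeout)
```

With this sketch, a job submitted while synchronize() is blocked does not extend the set being waited on, whereas wait_any() would happily return it once it finishes.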

> For synchronize(), keeping the call-time context makes perfect  
> sense.  I can see where one could strictly argue that if it's good  
> for synchronize(), it's good for wait(), but we need to be a little  
> pragmatic.  Again, I ask the question, "what problem

What I really wanted to ensure is that the operation with the
SESSION_ALL argument has call-time context. I understand that you
fully agree with this, so we are done.

> The up-side is that we're talking about the IDL spec, and not the  
> DRMAA 1.0 spec, so we have room to make changes still.  And, in  
> case it wasn't clear from my tirade, the SGE DRMAA implementations  
> do not limit the context of wait(), so the proposed semantics will  
> be a change (for the worse) for SGE users.

I fear that the misleading text in the IDL spec came from the
same mixup. I need a vacation ;-)

Sorry,
Peter.

P.S.: Who will hold the pencil for the upcoming work on the IDL
spec? Currently, the latest version is on my hard disk.



>
> Daniel
>
> Peter Troeger wrote:
>>> Looking through the IDL spec, it says that drmaa_wait(ANY) will only
>>> work on jobs submitted up to the time of the drmaa_wait() call.  I
>>> don't like that.  For drmaa_synchronize(ALL), it makes sense, because
>>> otherwise the call would block indefinitely in an active system.
>>> With drmaa_wait(), however, that change prevents a very useful use
>>> case.  Say I want to write a thread that waits for jobs to end and
>>> places their finish information in a data structure for other threads
>>> to read.  With that caveat applied, if I submit one very long-running
>>> job before drmaa_wait() gets called, the hundreds of really short jobs
>>> that I submit after the drmaa_wait() call have to wait for the
>>> long-running job to end so that the next call to drmaa_wait() can see
>>> them.  That's bad, and I don't see where it makes anything better.
>>> What problem does limiting drmaa_wait() to previously submitted jobs
>>> solve?
>>>
>>
>> We had so much discussion around the drmaa_wait semantics, I am not
>> sure what the exact reason was. For me, it seems like the same
>> argumentation as with drmaa_synchronize. The drmaa_wait() call relies
>> on some current state of all the jobs in the session. I know that I
>> submitted 3 jobs so far, and now I want to wait for all of them. If we
>> allow other threads to extend the session while drmaa_wait() is
>> running, you need to clarify the point of synchronization within the
>> running drmaa_wait() call. It's harder to implement.
>>
>> In your particular example, my expectation would be that the second
>> thread also calls drmaa_wait() in parallel. In this case, our modified
>> text from the latest DRMAA doc can be applied:
>>
>> -- snip
>>
>> In a multithreaded environment, only the active thread gets the
>> status of the finished or failed job in that case, while the rest of
>> the threads continue waiting. If there are no more running or
>> completed jobs the routine SHOULD return DRMAA_ERRNO_INVALID_JOB error.
>>
>> -- snip
>>
>> We can summarize that drmaa_wait(SESSION_ANY) is always a bad idea
>> when multiple threads are submitting jobs. In order to get a
>> consistent picture, it seems to be appropriate to define the function
>> call as a "synchronization point", where the session state "at this
>> time" acts as input to the method.
>>
>>
>> Peter.
>>
>>
>>
>>
>
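
For reference, the use case from the quoted message (one thread that collects finished jobs for other threads to read) can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the queue named finished stands in for the DRM, wait_any() is a hypothetical stand-in for drmaa_wait(SESSION_ANY) with live context, and STOP is an invented sentinel so the example terminates. None of these names come from a real DRMAA binding.

```python
import queue
import threading

finished = queue.Queue()       # hypothetical stand-in for the DRM: finished jobs arrive here
results = {}                   # job id -> exit status, read by other threads
results_lock = threading.Lock()
STOP = object()                # invented sentinel so the sketch can shut down


def wait_any():
    # Stand-in for drmaa_wait(SESSION_ANY) with live context:
    # blocks until any job finishes, whenever it was submitted.
    return finished.get()


def reaper():
    # Collect finish information until the sentinel arrives.
    while True:
        info = wait_any()
        if info is STOP:
            break
        job_id, exit_status = info
        with results_lock:
            results[job_id] = exit_status


t = threading.Thread(target=reaper)
t.start()

# A job called "long" is assumed to be still running, so it never shows up.
# Short jobs submitted after the reaper started waiting are reaped as they end.
for i in range(3):
    finished.put((f"short-{i}", 0))

finished.put(STOP)
t.join()
```

With call-time context on wait(ANY), the short jobs in this sketch would not be visible to the reaper until the long-running job had ended.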




