[drmaa-wg] Questions

Peter Troeger peter.troeger at hpi.uni-potsdam.de
Thu Apr 7 04:09:49 CDT 2005


>> The routine return code would need to indicate a compound error; BTW we
>> do not have such error code defined, and the detailed error message
>> would need to detail what happened.
>
>
> In other words, the spec completely fails to address this case. 
> Something to keep in mind for 1.1 or 2.0.

I added a tracker item for the issue.

>>> When doing a drmaa_control(DRMAA_JOB_IDS_SESSION_ALL), what is the
>>> contract on failure, i.e. in what state will the jobs be left?  In the
>>> case of a job failure, does that mean that all jobs will be left in the
>>> state that they were in before the call?  If so, that's going to cause
>>> serious implementation problems.  If not, that's going to cause serious
>>> usability problems.
>>
>> Transactional interface would be quite useful here ...
>> If a routine exits/fails during the call there is no good recourse.
>
> Exactly the point I'm making.  Without transactions, it's hard to use. 
> With transactions, it's hard to implement.

Demanding transactional behavior seems unrealistic to me. Most other
groups (e.g. OGSA) have similar problems; take, for example, the
SetResourceProperties operation in the WS-ResourceProperties specification
(chapter 7). The usual approach is to declare the behavior on partial
failure implementation-dependent.
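
To make the non-transactional option concrete, here is a minimal sketch
of a best-effort bulk operation. It does not use the DRMAA API; the job
table and the names mock_job, suspend_one and suspend_all are
hypothetical stand-ins. The idea is only that each job is attempted
independently, nothing is rolled back, and the caller receives a per-job
result in addition to a failure count.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not part of the DRMAA C binding. */
enum job_state { JOB_RUNNING, JOB_SUSPENDED, JOB_DONE };

struct mock_job {
    const char *id;
    enum job_state state;
};

/* Suspend a single job. Returns 0 on success, -1 if the job is not running. */
int suspend_one(struct mock_job *job)
{
    if (job->state != JOB_RUNNING)
        return -1;
    job->state = JOB_SUSPENDED;
    return 0;
}

/*
 * Best-effort bulk suspend without rollback. The outcome for jobs[i] is
 * written to results[i] (0 = ok, -1 = failed). The return value is the
 * number of jobs for which the operation failed.
 */
size_t suspend_all(struct mock_job *jobs, size_t n, int *results)
{
    size_t failed = 0;
    for (size_t i = 0; i < n; i++) {
        results[i] = suspend_one(&jobs[i]);
        if (results[i] != 0)
            failed++;
    }
    return failed;
}
```

With such a contract a failure leaves the jobs in mixed states, but the
caller can at least see which ones were affected. Whether the spec should
expose per-job results at all is exactly the open question.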

>>> (DRMAA_JOB_IDS_SESSION_ALL), but another thread "steals" the job exit
>>> info with a call to drmaa_wait()?  I would assume that the synchronize
>>> thread should just assume that the job finished, even though its job
>>> record is gone.  That is what the SGE implementation does.
>>
>>
>> Ha, races with job reaping info.  The developers would need to be
>> careful in multithreaded environments ... some guidelines would be
>> necessary, but preferably outside of the normative docs.
>
>
> The reason I bring it up is that this particular case is non-obvious. 
> It's clear that waiting for the same job twice is bad, but it's not so 
> clear when waiting for any or all.

The upshot seems to be that the spec needs more clarification on
multithreading issues. Is it worthwhile to open a tracker item for
this, in order to collect all the specific findings?
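
As one candidate finding, the "stolen exit info" case can be sketched as
follows. This is a single-threaded model of the interleaving, not real
DRMAA code and not the SGE implementation; the session table and the
names find_job, reap_job and all_finished are hypothetical. The point is
the rule discussed above: a synchronize over a snapshot of job ids
treats a job whose record has already been reaped as finished.

```c
#include <stddef.h>
#include <string.h>

#define MAX_JOBS 8

/* Hypothetical session job table; not part of the DRMAA C binding. */
struct job_record {
    char id[16];
    int finished;   /* 1 once the job has exited */
    int present;    /* 0 once the exit info has been reaped */
};

struct session {
    struct job_record jobs[MAX_JOBS];
    size_t count;
};

/* Look up a job record that has not been reaped yet; NULL if none. */
struct job_record *find_job(struct session *s, const char *id)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->jobs[i].present && strcmp(s->jobs[i].id, id) == 0)
            return &s->jobs[i];
    return NULL;
}

/* Models a wait call: reaps a finished job. 0 on success, -1 otherwise. */
int reap_job(struct session *s, const char *id)
{
    struct job_record *j = find_job(s, id);
    if (j == NULL || !j->finished)
        return -1;
    j->present = 0;     /* the exit info is gone for every other caller */
    return 0;
}

/*
 * Models one polling step of a synchronize over a snapshot of ids.
 * Returns 1 if every job in the snapshot is done. A missing record is
 * counted as finished, because someone else must have reaped it.
 */
int all_finished(struct session *s, const char **ids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct job_record *j = find_job(s, ids[i]);
        if (j != NULL && !j->finished)
            return 0;
    }
    return 1;
}
```

A real implementation would additionally need a lock around the table,
which this sketch leaves out.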

Regards,
Peter.


