[drmaa-wg] Questions

Rajic, Hrabri hrabri.rajic at intel.com
Wed Mar 30 10:16:12 CST 2005


>-----Original Message-----
>From: owner-drmaa-wg at ggf.org [mailto:owner-drmaa-wg at ggf.org] On Behalf Of
>Daniel Templeton
>Sent: Wednesday, March 30, 2005 3:34 AM
>To: DRMAA Working Group
>Subject: [drmaa-wg] Questions
>
>In working on a remote implementation of the Java binding, I have run
>into a couple of interesting questions.  What happens when, during a
>call to drmaa_control(DRMAA_JOB_IDS_SESSION_ALL), the implementation
>fails to perform the given action on more than one job, for different
>reasons?  For example, if I try to hold all jobs, but one job is
>already in a hold state, three jobs work ok, and the DRM goes down
>before acting on the last job, what is the return code?

The routine return code would need to indicate a compound error (BTW, we
do not have such an error code defined), and the detailed error message
would need to detail what happened.
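
To make the compound-error idea concrete, here is a minimal sketch in
Python. It is not part of any DRMAA binding: CompoundControlError,
AlreadyHeldError, DrmDownError, control_all() and hold() are all
hypothetical names invented for illustration. The bulk call keeps going
after a failure and reports one entry per failed job, using the hold
example from the question.

```python
class AlreadyHeldError(Exception):
    """Hypothetical: the job was already in a hold state."""


class DrmDownError(Exception):
    """Hypothetical: the DRM became unreachable."""


class CompoundControlError(Exception):
    """Hypothetical compound error carrying one entry per failed job."""

    def __init__(self, succeeded, failures):
        self.succeeded = list(succeeded)   # job ids the action worked on
        self.failures = dict(failures)     # job id -> exception raised
        total = len(self.succeeded) + len(self.failures)
        detail = "; ".join(
            f"{job}: {type(err).__name__}" for job, err in self.failures.items()
        )
        super().__init__(f"{len(self.failures)} of {total} jobs failed ({detail})")


def control_all(job_ids, action):
    """Apply action to every job, collecting failures instead of stopping."""
    succeeded, failures = [], {}
    for job_id in job_ids:
        try:
            action(job_id)
            succeeded.append(job_id)
        except Exception as err:
            failures[job_id] = err
    if failures:
        raise CompoundControlError(succeeded, failures)
    return succeeded


def hold(job_id):
    """Toy hold action: job1 is already held, the DRM dies before job5."""
    if job_id == "job1":
        raise AlreadyHeldError(job_id)
    if job_id == "job5":
        raise DrmDownError(job_id)


try:
    control_all(["job1", "job2", "job3", "job4", "job5"], hold)
except CompoundControlError as err:
    print(err)
```

Note that this sketch makes no attempt to roll anything back: the three
jobs that were held stay held, and the caller has to read the per-job
entries to find out which ones.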

>When doing a drmaa_control(DRMAA_JOB_IDS_SESSION_ALL), what is the
>contract on failure, i.e. in what state will the jobs be left?  In the
>case of a job failure, does that mean that all jobs will be left in the
>state that they were in before the call?  If so, that's going to cause
>serious implementation problems.  If not, that's going to cause serious
>usability problems.

A transactional interface would be quite useful here ...
If a routine exits/fails during the call there is no good recourse.

Job failure?  Is this a separate question?  
One analogy would be teaching a university course.  There would be
students dropping the course, but the rest goes ahead.  In case of
absences things also go ahead, and when the students reappear the regime
is known.

>What happens when a job ends after a thread has called
>drmaa_synchronize(DRMAA_JOB_IDS_SESSION_ALL), but another thread
>"steals" the job exit info with a call to drmaa_wait()?  I would assume
>that the synchronize thread should just assume that the job finished,
>even though its job record is gone.  That is what the SGE
>implementation does.

Ha, races with job reaping info.  The developers would need to be
careful in multithreaded environments ... some guidelines would be
necessary, but preferably outside of the normative docs.
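
As a rough illustration of the behavior Daniel describes, here is a toy
model in Python. ToySession and InvalidJobError are hypothetical and do
not correspond to any real binding or to the SGE code. The assumption
is simply that synchronize() only cares whether a job is still running,
so a record that another caller has already reaped counts as finished.

```python
import threading


class InvalidJobError(Exception):
    """Hypothetical: the session no longer has a record for this job."""


class ToySession:
    """Toy stand-in for a DRMAA session, for illustration only."""

    def __init__(self, job_ids):
        self._cond = threading.Condition()
        self._running = set(job_ids)
        self._exit_info = {}               # finished but not yet reaped

    def job_finished(self, job_id, status):
        """Called by the (simulated) DRM when a job ends."""
        with self._cond:
            self._running.discard(job_id)
            self._exit_info[job_id] = status
            self._cond.notify_all()

    def wait(self, job_id):
        """Block until job_id ends, then reap (remove) its exit info."""
        with self._cond:
            while job_id in self._running:
                self._cond.wait()
            if job_id not in self._exit_info:
                raise InvalidJobError(job_id)
            return self._exit_info.pop(job_id)

    def synchronize(self, job_ids):
        """Block until none of job_ids is running.

        A job whose record was already reaped is treated as finished.
        """
        wanted = set(job_ids)
        with self._cond:
            while self._running & wanted:
                self._cond.wait()


session = ToySession(["a", "b"])
syncer = threading.Thread(target=session.synchronize, args=(["a", "b"],))
syncer.start()

session.job_finished("a", 0)
stolen = session.wait("a")        # another caller reaps job "a" first
session.job_finished("b", 1)
syncer.join(timeout=5)            # synchronize returns despite the theft
```

A second wait("a") would raise InvalidJobError in this model, which is
the part application developers would need guidance on.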

	Hrabri


>
>Daniel
>
>--
>***************************************************
>*        Daniel Templeton   ERGB01 x60220         *
>*       Staff Engineer, Sun N1 Grid Engine        *
>***************************************************
>* "Roads? Where we're going we don't need roads." *
>*                    -Dr. Emmett Brown            *
>*                     Back to the Future (1985)   *
>***************************************************
>




