[drmaa-wg] Synchronizing Against Waited Jobs

Daniel Templeton Dan.Templeton at Sun.COM
Wed Oct 19 11:37:07 CDT 2005


We have found a bug in the SGE DRMAA implementation (I know! It's
shocking!), but Andreas and I can't agree on what the fix should be.  The
issue is that in the current implementation, synchronizing against jobs
that did not come from the current session returns DRMAA_ERRNO_SUCCESS. 
The part about which we disagree is what should happen when
synchronizing against jobs that are from the current session, but that
have already ended and have already had drmaa_wait() (or
drmaa_synchronize() with dispose=true) called against them.

My stance is that one can extrapolate from the drmaa_wait() function
that there is no difference between jobs which don't exist (at all or in
the current session) and jobs whose exit information has been disposed
(via drmaa_wait() or drmaa_synchronize()).  Therefore, calling
drmaa_synchronize() on jobs which have already had drmaa_wait() called
against them should return DRMAA_ERRNO_INVALID_JOB.

Andreas holds that, because the spec makes no such statement, one can
infer that drmaa_synchronize() handles such jobs
differently from drmaa_wait().  Because drmaa_synchronize() does not
need the jobs' exit information to succeed, it should be able to operate
on jobs whose exit information has already been disposed.  Therefore,
calling drmaa_synchronize() on jobs which have already had drmaa_wait()
called against them should return DRMAA_ERRNO_SUCCESS.
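
To make the two readings concrete, here is a toy Python model of a
session.  It is only a sketch: ToySession, run_job() and the
reaped_is_invalid flag are invented for illustration and are not part of
the DRMAA C binding or the SGE code.  Jobs are treated as finishing
instantly, the dispose argument is left out, and jobs from outside the
session are assumed to yield DRMAA_ERRNO_INVALID_JOB under both readings.

```python
# Toy model only: names here are illustrative, not the DRMAA C binding.
SUCCESS = "DRMAA_ERRNO_SUCCESS"
INVALID_JOB = "DRMAA_ERRNO_INVALID_JOB"


class ToySession:
    def __init__(self, reaped_is_invalid):
        # True  models the first position: a reaped job looks like an unknown job.
        # False models the second position: a reaped job is still a valid target.
        self.reaped_is_invalid = reaped_is_invalid
        self.active = set()   # submitted, exit information not yet disposed
        self.reaped = set()   # exit information already disposed
        self._next_id = 1

    def run_job(self):
        job_id = str(self._next_id)
        self._next_id += 1
        self.active.add(job_id)
        return job_id

    def wait(self, job_id):
        # Both positions agree here: waiting disposes the exit information,
        # so a second wait on the same job fails.
        if job_id not in self.active:
            return INVALID_JOB
        self.active.remove(job_id)
        self.reaped.add(job_id)
        return SUCCESS

    def synchronize(self, job_ids):
        for job_id in job_ids:
            if job_id in self.active:
                continue
            if job_id in self.reaped and not self.reaped_is_invalid:
                continue
            # Unknown job, or a reaped job under the first position.
            return INVALID_JOB
        return SUCCESS


def synchronize_after_wait(reaped_is_invalid):
    session = ToySession(reaped_is_invalid)
    job_id = session.run_job()
    session.wait(job_id)
    return session.synchronize([job_id])
```

Calling synchronize_after_wait(True) gives the INVALID_JOB result of the
first position, and synchronize_after_wait(False) gives the SUCCESS
result of the second.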

I can agree that Andreas' position makes theoretical sense, but I
believe it runs contrary to the stated goal of minimizing the
requirements on the implementing DRMS.  In order to implement a
drmaa_synchronize() that can distinguish between jobs that have been
disposed and jobs that never existed, the DRMAA implementation must keep
a list of the ids of every job that has ever been submitted in the
current session, and with every drmaa_synchronize() call, the list must
be searched to validate the synchronize id list.  And for what? 
DRMAA_JOB_IDS_SESSION_ALL covers every case I can think of where the behavior
Andreas described would be useful. To me, it sounds like a lot of extra
work for the DRMAA implementation with no tangible benefit.
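
The bookkeeping cost can be sketched with a small Python function.  This
is a hypothetical illustration, not the SGE implementation: run_cycles()
and remember_reaped are made-up names, and each job is submitted and
immediately waited on.

```python
# Hypothetical bookkeeping sketch; not taken from any real DRMAA implementation.
def run_cycles(n_jobs, remember_reaped):
    """Return how many job ids the session still holds after n_jobs
    submit-then-wait cycles."""
    active = set()   # jobs whose exit information has not been disposed
    reaped = set()   # only needed if reaped jobs must stay distinguishable
    for n in range(n_jobs):
        job_id = str(n)
        active.add(job_id)       # submit
        active.remove(job_id)    # wait: exit information disposed
        if remember_reaped:
            reaped.add(job_id)   # history that is never dropped in this model
    return len(active) + len(reaped)
```

With remember_reaped=False the session holds no ids once every job has
been waited on; with remember_reaped=True the count equals the number of
jobs ever submitted, and each synchronize call has to consult that
history.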

What Andreas and I can agree on is that if we decide he is right, we
will close the bug as "won't fix" because the fix will be worse than the
bug.  In any case, we should probably have a tracker item to make the
final decision explicit in the spec.

What say you, oh, wise ones?

Daniel

-- 
***************************************************
*        Daniel Templeton   ERGB01 x60220         *
*       Staff Engineer, Sun N1 Grid Engine        *
***************************************************
* "So let the sunshine in.  Face it with a grin.  *
*  Smilers never lose, and frowners never win."   *
*      -Let the Sunshine In, Pebbles Flintstone   *
***************************************************





