[drmaa-wg] [Fwd: Session problem with DRMAA bindings]

Daniel Templeton Dan.Templeton at Sun.COM
Tue May 17 03:45:34 CDT 2005


The following email is from an internal group at Sun that is working with
DRMAA.  Just thought you guys would like to see a neophyte's reaction to
our wait() semantics.  Clearly there's some misunderstanding about being
able to have multiple threads waiting simultaneously, but aside from
that, it's an interesting perspective.

Daniel

-------- Original Message --------

Hello Dan,

I think I've isolated the trouble we've been having with the DRMAA 
bindings regarding session.wait(jobId, timeout) always complaining that 
the jobId doesn't exist.  The following code works as expected:

SubmitJobAndWait:
	acquire and initialize a session.
	create a job template and submit it through the session.
	wait for job completion.
	close the session.

If we separate the job submission and the wait into two processes (as
happens in our code):

SubmitJob:    // process 1
	acquire and initialize a session.
	create a job template and submit it through the session.
	close the session.

Wait:  // process 2
	acquire and initialize a session.
	wait for job completion.
	close the session.

When we wait for job completion, we get an exception complaining that
there is no such jobId.  Even after process 2 ends with the exception,
I can do a 'qstat -f' and see the job from process 1 still running.
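
One way to picture the failure above is the toy model below. It is only a
sketch and does not use the real DRMAA bindings: ToySession, runJob and
waitFor are hypothetical names, and the model assumes (the thread does not
confirm this) that a session only recognizes job IDs that were submitted
through that same session.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only, NOT the DRMAA API. Assumption: a session remembers
// only the job IDs that were submitted through it.
class ToySession {
    private static int nextId = 1;
    private final Set<String> submitted = new HashSet<>();

    // Pretend to submit a job and record its ID in this session.
    String runJob() {
        String id = "job-" + nextId++;
        submitted.add(id);
        return id;
    }

    // Pretend to wait; reject IDs this session never submitted.
    void waitFor(String jobId) {
        if (!submitted.contains(jobId)) {
            throw new IllegalArgumentException("no such jobId: " + jobId);
        }
    }
}

class SessionScopeDemo {
    // Returns true if the wait is accepted, false if the ID is rejected.
    static boolean waitSucceeds(ToySession session, String jobId) {
        try {
            session.waitFor(jobId);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ToySession first = new ToySession();
        String jobId = first.runJob();
        // Same session that submitted the job: the wait is accepted.
        System.out.println(waitSucceeds(first, jobId));   // prints "true"

        // A fresh session (like process 2) has never seen this ID.
        ToySession second = new ToySession();
        System.out.println(waitSucceeds(second, jobId));  // prints "false"
    }
}
```

Under that assumption the job itself is unaffected, which would be
consistent with it still showing up in 'qstat -f'.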

We can even reproduce the problem in a single process with the following code:

SubmitJobAndWait:
	acquire and initialize a session.
	create a job template and submit it through the session.
	close the session.
	acquire and initialize a session.
	wait for job completion.
	close the session.

This fails in the same manner as the two-process version.  My
understanding of the bindings is that a session is not reentrant and,
therefore, we will need to use separate processes to monitor/wait for
more than one job.  Even if we tried to do some sort of time slicing by
setting a small timeout on the wait, we would still have problems with
other session calls, such as control and getJobStatus.
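
For contrast with the separate-process idea, here is a minimal sketch of
several threads waiting on different jobs through one shared session
object. It is again a toy model, not the DRMAA bindings: ToySharedSession,
runJob, finish and waitFor are hypothetical names, and the model simply
assumes that the session tolerates concurrent waits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only, NOT the DRMAA API. One latch per submitted job.
class ToySharedSession {
    private final Map<String, CountDownLatch> jobs = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(1);

    String runJob() {
        String id = "job-" + nextId.getAndIncrement();
        jobs.put(id, new CountDownLatch(1));
        return id;
    }

    // Mark a job as finished, releasing any thread waiting on it.
    void finish(String jobId) {
        jobs.get(jobId).countDown();
    }

    // Returns true if the job finished before the timeout elapsed.
    boolean waitFor(String jobId, long timeoutMillis) throws InterruptedException {
        CountDownLatch done = jobs.get(jobId);
        if (done == null) {
            throw new IllegalArgumentException("no such jobId: " + jobId);
        }
        return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}

class ConcurrentWaitDemo {
    // Starts one waiter thread per job and returns how many waits completed.
    static int waitForAll(ToySharedSession session, String[] ids) {
        AtomicInteger completed = new AtomicInteger();
        Thread[] waiters = new Thread[ids.length];
        for (int i = 0; i < ids.length; i++) {
            final String id = ids[i];
            waiters[i] = new Thread(() -> {
                try {
                    if (session.waitFor(id, 5000)) {
                        completed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            waiters[i].start();
        }
        // Simulate the jobs finishing while the waiters are blocked.
        for (String id : ids) {
            session.finish(id);
        }
        try {
            for (Thread t : waiters) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining waiters", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        ToySharedSession session = new ToySharedSession();
        String[] ids = { session.runJob(), session.runJob(), session.runJob() };
        System.out.println(waitForAll(session, ids));  // prints "3"
    }
}
```

All three jobs are submitted and waited on within a single session
lifetime, so no second process is involved.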

Any ideas about what is going wrong?

Thanks,
Mike



-- 
***************************************************
*        Daniel Templeton   ERGB01 x60220         *
*       Staff Engineer, Sun N1 Grid Engine        *
***************************************************
* "Roads? Where we're going we don't need roads." *
*                    -Dr. Emmett Brown            *
*                     Back to the Future (1985)   *
***************************************************





