[DRMAA-WG] Use Cases for Wait

Daniel Templeton Dan.Templeton at Sun.COM
Thu Aug 13 10:45:50 CDT 2009


In the last working group meeting, we spent a good bit of time talking 
about how to design the new wait call in the v2 spec.  In the end, what 
we decided is that we don't have a clear enough idea of what wait should 
actually do to be able to define it.  To that end, we vowed to generate 
some use cases.  This is my attempt at getting that thread going.

1) Job monitoring application
I have an application that submits jobs on behalf of the user and then 
displays the jobs' status as they go from pending to running to 
finished.  The application submits the jobs and then waits for any job 
to reach the running or finished state.  When a transition occurs, the 
application updates the UI with the new state information.  If a 
transition gets lost, then the UI will have stale data that could 
mislead the user.
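
A minimal sketch of what use case 1 seems to need, assuming every
transition is delivered to the application through an unbounded queue so
that none is dropped between calls to wait. The names Monitor, notify and
wait_any, and the state strings, are hypothetical and not taken from any
DRMAA spec.

```python
import queue


class Monitor:
    """Fan-in of state changes from many jobs (hypothetical helper)."""

    def __init__(self, interesting=("running", "finished")):
        self.interesting = set(interesting)
        self.events = queue.Queue()  # unbounded, so no transition is dropped
        self.ui = {}                 # job id -> last displayed state

    def notify(self, job_id, state):
        """Called by the (simulated) DRM side on every transition."""
        if state in self.interesting:
            self.events.put((job_id, state))

    def wait_any(self, timeout=None):
        """Block for the next queued transition and update the UI table."""
        job_id, state = self.events.get(timeout=timeout)
        self.ui[job_id] = state
        return job_id, state


monitor = Monitor()
monitor.notify("1", "running")
monitor.notify("2", "running")
monitor.notify("1", "finished")
monitor.notify("3", "pending")   # filtered out: not a state we wait for
seen = [monitor.wait_any(timeout=1) for _ in range(3)]
```

Because the queue buffers events, a slow UI update delays the display but
does not lose a transition.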

2) qsub
I want to reimplement qsub using DRMAAv2.  qsub has two interesting 
options.  -now tells qsub not to return until the job has started.
-sync tells qsub not to return until the job has completed.  Both can
be used in the same submission.  qsub will submit a job and then, based
on the options, it might call wait to wait for job start or job finish.
It cannot miss either of these transitions, because it stays blocked
until it sees them.
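
A rough sketch of the blocking behavior described in use case 2, assuming
the wait call checks every state the job has already passed through, so a
transition that happened before the call still counts. Job, wait_for, the
qsub function and the state names are illustrative assumptions, not the
real qsub or any DRMAA API.

```python
import threading

# Hypothetical job states, ordered by progress.
ORDER = ["pending", "running", "finished"]


class Job:
    """In-memory stand-in for a submitted job (illustration only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._seen = ["pending"]  # every state the job has been in

    def set_state(self, state):
        with self._cond:
            self._seen.append(state)
            self._cond.notify_all()

    def wait_for(self, state, timeout=None):
        """Block until the job has reached `state` or a later one."""
        target = ORDER.index(state)
        with self._cond:
            return self._cond.wait_for(
                lambda: any(ORDER.index(s) >= target for s in self._seen),
                timeout,
            )


def qsub(job, now=False, sync=False):
    """Wait as -now and -sync would; return the milestones waited on."""
    waited = []
    if now:
        job.wait_for("running")
        waited.append("running")
    if sync:
        job.wait_for("finished")
        waited.append("finished")
    return waited


# The job starts and finishes before qsub ever calls wait.
fast_job = Job()
fast_job.set_state("running")
fast_job.set_state("finished")
fast_result = qsub(fast_job, now=True, sync=True)

# The job finishes a little later, while qsub is blocked.
slow_job = Job()
threading.Timer(0.05, slow_job.set_state, args=("finished",)).start()
slow_result = qsub(slow_job, sync=True)
```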

3) suspend timeout
I have an application that submits tens of thousands of jobs and waits 
for them to complete.  The jobs are short, however, and if one gets 
suspended, it's better to submit a second copy and then keep the winner 
and kill the loser.  I want to give my jobs a 30-second suspension grace
period.  If any job stays suspended for more than 30 seconds, I want to
submit a duplicate.  The application would submit the jobs and
then wait for any job to enter or exit the suspended state.  When a 
transition happens, an in-memory time table gets updated and a timer 
gets set.  It then waits again for the next transition.  Because of the 
volume of jobs being submitted, the number of jobs being suspended or 
resumed at any moment in time could be very large.
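
One way the in-memory time table from use case 3 might look, as a sketch
only. SuspendTable and its methods are hypothetical names, and times are
passed in explicitly as plain seconds so the logic is easy to follow.

```python
GRACE_SECONDS = 30.0  # grace period from the use case


class SuspendTable:
    """Tracks when each job was suspended (hypothetical helper)."""

    def __init__(self, grace=GRACE_SECONDS):
        self.grace = grace
        self.suspended_at = {}  # job id -> time the suspension began

    def on_transition(self, job_id, suspended, now):
        """Record a job entering (True) or leaving (False) suspension."""
        if suspended:
            self.suspended_at.setdefault(job_id, now)
        else:
            self.suspended_at.pop(job_id, None)

    def overdue(self, now):
        """Job ids suspended for longer than the grace period."""
        return sorted(job_id for job_id, start in self.suspended_at.items()
                      if now - start > self.grace)

    def next_deadline(self):
        """Earliest time at which some job becomes overdue, or None."""
        if not self.suspended_at:
            return None
        return min(self.suspended_at.values()) + self.grace


table = SuspendTable()
table.on_transition("a", True, now=0.0)    # a suspended at t=0
table.on_transition("b", True, now=10.0)   # b suspended at t=10
table.on_transition("a", False, now=20.0)  # a resumed within the grace period
```

The value from next_deadline could serve as the timeout for the next wait
call, so the application wakes up either on a transition or when a
duplicate is due.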

4) state tracking
I want to write an application that submits a single job and then 
records the time that every state transition occurred.  It submits the 
job and then waits for that job to have any state transition.  When a 
state transition occurs, it writes the information to a file and then 
waits for the next transition.  Because writing to a file is slow, there 
could be a lag between calls to wait, but nonetheless, it cannot lose 
any transitions.
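
A sketch of one design that would satisfy use case 4, assuming the
implementation keeps a per-job history and the caller passes in a cursor
that says how much of it has already been consumed. TransitionLog, record
and wait are hypothetical names, not a proposal for the actual signature.

```python
import threading


class TransitionLog:
    """Append-only history of one job's transitions (illustration only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._events = []  # list of (state, timestamp) in arrival order

    def record(self, state, timestamp):
        with self._cond:
            self._events.append((state, timestamp))
            self._cond.notify_all()

    def wait(self, cursor, timeout=None):
        """Return (transitions after `cursor`, new cursor)."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > cursor, timeout)
            new = self._events[cursor:]
            return new, cursor + len(new)


log = TransitionLog()
log.record("pending", 1.0)
log.record("running", 2.0)       # two transitions before the first wait
first, cursor = log.wait(0)
log.record("finished", 3.0)      # arrives while the caller is writing a file
second, cursor = log.wait(cursor)
third, cursor = log.wait(cursor, timeout=0.01)  # nothing new: times out
```

With a cursor, a lag between calls to wait only means that several
transitions come back at once.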

Tag, you're it!
Daniel

