[drmaa-wg] Synchronizing Against Waited Jobs

Daniel Templeton Dan.Templeton at Sun.COM
Fri Oct 21 08:50:43 CDT 2005


So, the only person who hasn't weighed in is Roger.  Care to offer an
opinion?

Daniel

Peter Troeger wrote on 10/21/05 10:21:

>I support Hrabri's argument. DRMAA introduced "dispose=true" 
>in the interface, so resource consumption seems to be an issue. If a job 
>was subject to drmaa_wait(), and the data was disposed, nothing should 
>be left in memory about this job. IMHO the job becomes completely 
>unknown to the library after this point.
>
>
>BTW, this also holds for the current Condor DRMAA implementation. It 
>also follows from the behavior of the underlying Condor system. Once a job 
>has finished, only the log files can tell you what happened. The Condor 
>DRMAA library uses such a log file for each job, and if you execute 
>drmaa_wait(dispose=true), the log file and in-memory structures for the 
>job are removed. Calling drmaa_synchronize() after this results in 
>DRMAA_ERRNO_INVALID_JOB.
>
>Things might be clearer if we had an explicit drmaa_dispose_job() 
>function.
>
>Regards,
>Peter.
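
The behavior described in the message above can be sketched as a toy
model. This is only an illustration in Python, not the Condor DRMAA
library or any real binding: ToySession, run_job() and InvalidJobError
are hypothetical names, jobs are assumed to finish immediately, and the
dispose flag on wait() simply mirrors the wording of the message.

```python
class InvalidJobError(Exception):
    """Stands in for DRMAA_ERRNO_INVALID_JOB in this toy model."""


class ToySession:
    """Hypothetical in-memory model of a DRMAA session, not a real binding."""

    def __init__(self):
        self._next_id = 1
        self._jobs = {}  # job_id -> exit status, for jobs still known

    def run_job(self, exit_status=0):
        # Assumption: the job finishes immediately with this exit status.
        job_id = str(self._next_id)
        self._next_id += 1
        self._jobs[job_id] = exit_status
        return job_id

    def wait(self, job_id, dispose=True):
        if job_id not in self._jobs:
            raise InvalidJobError(job_id)
        status = self._jobs[job_id]
        if dispose:
            # Disposing removes every trace of the job from the session.
            del self._jobs[job_id]
        return status

    def synchronize(self, job_ids, dispose=False):
        # A disposed job cannot be told apart from one that never existed.
        for job_id in job_ids:
            if job_id not in self._jobs:
                raise InvalidJobError(job_id)
        if dispose:
            for job_id in job_ids:
                self._jobs.pop(job_id, None)


session = ToySession()
job = session.run_job(exit_status=3)
first_status = session.wait(job, dispose=True)

try:
    session.synchronize([job])
    sync_result = "success"
except InvalidJobError:
    sync_result = "invalid job"
```

In this sketch the session keeps nothing once the job is disposed, so the
later synchronize call has no way to recognize the id and reports it as
invalid.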
>
>
>
>Rajic, Hrabri wrote:
>
>>My wig is in dry cleaning.  Nevertheless, here is my short take on this.
>>
>>
>>If an implementation has job_ids handy, it could conveniently make a good
>>determination of which jobs are invalid (do not exist) and throw
>>DRMAA_ERRNO_INVALID_JOB.   IMHO, it is not a big deal if the routine
>>gives imprecise diagnostics if it is forced to do memory garbage
>>collection earlier.  The term "quality of implementation" comes to mind,
>>but that quality could come at the expense of being a memory hog, which
>>in turn could lead to paging - quite dubious. 
>>See, the implementations might handle jobs that did not come from the
>>current session differently, so we could not be precise here either.
>>The important thing for the user is to synchronize, i.e. block the
>>program from continuing if there are running remote jobs.
>>
>>Dispose = true helps get rid of the rusage info to free DRMAA
>>implementations of heavy memory requirements when it matters, so keeping
>>all the past job_ids for providing precise exit errors runs contrary to
>>the goal of lessening memory requirements in the same routine.
>>
>>My 2 pfennigs,
>>
>>Hrabri
>>
>>>-----Original Message-----
>>>From: owner-drmaa-wg at ggf.org [mailto:owner-drmaa-wg at ggf.org] On Behalf Of
>>>Daniel Templeton
>>>Sent: Wednesday, October 19, 2005 11:37 AM
>>>To: DRMAA Working Group
>>>Subject: [drmaa-wg] Synchronizing Against Waited Jobs
>>>
>>>We have found a bug in the SGE DRMAA implementation, (I know! It's
>>>shocking!) but Andreas and I can't agree on what the fix should be.  The
>>>issue is that in the current implementation, synchronizing against jobs
>>>that did not come from the current session returns DRMAA_ERRNO_SUCCESS.
>>>The part about which we disagree is what should happen when
>>>synchronizing against jobs that are from the current session, but that
>>>have already ended and have already had drmaa_wait() (or
>>>drmaa_synchronize() with dispose=true) called against them.
>>>
>>>My stance is that one can extrapolate from the drmaa_wait() function
>>>that there is no difference between jobs which don't exist (at all or in
>>>the current session) and jobs whose exit information has been disposed
>>>(via drmaa_wait() or drmaa_synchronize()).  Therefore, calling
>>>drmaa_synchronize() on jobs which have already had drmaa_wait() called
>>>against them should return DRMAA_ERRNO_INVALID_JOB.
>>>
>>>Andreas holds that it can be inferred from the lack of the above
>>>statement in the spec, that drmaa_synchronize() handles such jobs
>>>differently from drmaa_wait().  Because drmaa_synchronize() does not
>>>need the jobs' exit information to succeed, it should be able to operate
>>>on jobs whose exit information has already been disposed.  Therefore,
>>>calling drmaa_synchronize() on jobs which have already had drmaa_wait()
>>>called against them should return DRMAA_ERRNO_SUCCESS.
>>>
>>>I can agree that Andreas' position makes theoretical sense, but I
>>>believe it runs contrary to the stated goal of minimizing the
>>>requirements on the implementing DRMS.  In order to implement a
>>>drmaa_synchronize() that can distinguish between jobs that have been
>>>disposed and jobs that never existed, the DRMAA implementation must keep
>>>a list of the ids of every job that has ever been submitted in the
>>>current session, and with every drmaa_synchronize() call, the list must
>>>be searched to validate the synchronize id list.  And for what?
>>>DRMAA_JOB_IDS_ALL covers every case I can think of where the behavior
>>>Andreas described would be useful. To me, it sounds like a lot of extra
>>>work for the DRMAA implementation with no tangible benefit.
>>>
>>>What Andreas and I can agree on is that if we decide he is right, we
>>>will close the bug as "won't fix" because the fix will be worse than the
>>>bug.  In any case, we should probably have a tracker item to make the
>>>final decision explicit in the spec.
>>>
>>>What say you, oh, wise ones?
>>>
>>>Daniel
>>>
>>>--
>>>***************************************************
>>>*        Daniel Templeton   ERGB01 x60220         *
>>>*       Staff Engineer, Sun N1 Grid Engine        *
>>>***************************************************
>>>* "So let the sunshine in.  Face it with a grin.  *
>>>*  Smilers never lose, and frowners never win."   *
>>>*      -Let the Sunshine In, Pebbles Flintstone   *
>>>***************************************************
>
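
The two positions in the original message can be compared with a small
toy model. This is a sketch under stated assumptions, not the SGE
implementation: ForgetfulSession, RememberingSession, run() and
tracked_ids() are hypothetical names, jobs are assumed to have already
finished (so nothing ever blocks), and only the bookkeeping is modeled.

```python
SUCCESS = "DRMAA_ERRNO_SUCCESS"
INVALID_JOB = "DRMAA_ERRNO_INVALID_JOB"


class ForgetfulSession:
    """Toy model: a disposed job is treated like one that never existed."""

    def __init__(self):
        self.live = set()  # ids of jobs whose exit info is still held

    def submit(self, job_id):
        self.live.add(job_id)

    def wait(self, job_id):
        if job_id not in self.live:
            return INVALID_JOB
        self.live.discard(job_id)  # reap the exit information
        return SUCCESS

    def synchronize(self, job_ids):
        ok = all(j in self.live for j in job_ids)
        return SUCCESS if ok else INVALID_JOB

    def tracked_ids(self):
        return len(self.live)


class RememberingSession(ForgetfulSession):
    """Toy model: synchronize succeeds on jobs that were already reaped."""

    def __init__(self):
        super().__init__()
        self.ever_submitted = set()  # never shrinks during the session

    def submit(self, job_id):
        super().submit(job_id)
        self.ever_submitted.add(job_id)

    def synchronize(self, job_ids):
        ok = all(j in self.ever_submitted for j in job_ids)
        return SUCCESS if ok else INVALID_JOB

    def tracked_ids(self):
        return len(self.live) + len(self.ever_submitted)


def run(session, n_jobs):
    """Submit and reap n_jobs, then synchronize on reaped and unknown ids."""
    ids = [f"job-{i}" for i in range(n_jobs)]
    for job_id in ids:
        session.submit(job_id)
        session.wait(job_id)
    return (
        session.synchronize(ids),
        session.synchronize(["no-such-job"]),
        session.tracked_ids(),
    )


forgetful = run(ForgetfulSession(), 1000)
remembering = run(RememberingSession(), 1000)
```

In this model the forgetful session holds no ids after all jobs are
reaped but cannot tell reaped jobs from unknown ones, while the
remembering session can tell them apart at the cost of one stored id per
job ever submitted.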

-- 
***************************************************
*        Daniel Templeton   ERGB01 x60220         *
*       Staff Engineer, Sun N1 Grid Engine        *
***************************************************
* "So let the sunshine in.  Face it with a grin.  *
*  Smilers never lose, and frowners never win."   *
*      -Let the Sunshine In, Pebbles Flintstone   *
***************************************************





