[drmaa-wg] Synchronizing Against Waited Jobs

Daniel Templeton Dan.Templeton at Sun.COM
Mon Oct 24 10:23:45 CDT 2005


That sounds like a majority to me.  I will submit a tracker using
Roger's lovely summary of the correct behavior.
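
For reference, here is a minimal Python sketch of the behavior in Roger's
summary, assuming an implementation that forgets a job id as soon as the
job is reaped. ToySession and its methods are hypothetical names used only
for illustration; this is not the DRMAA C binding or any real DRMAA
library, and the two constants are plain strings named after the error
codes discussed in this thread. Jobs are assumed to have finished already,
so nothing in the sketch blocks.

```python
# Toy model only: ToySession is a hypothetical class, not a real DRMAA API.
# The constants are symbolic stand-ins for the DRMAA error codes.
DRMAA_ERRNO_SUCCESS = "DRMAA_ERRNO_SUCCESS"
DRMAA_ERRNO_INVALID_JOB = "DRMAA_ERRNO_INVALID_JOB"


class ToySession:
    def __init__(self):
        self._known = set()   # ids the library still remembers
        self._next_id = 1

    def run_job(self):
        """Register a new job id (assumed to finish immediately)."""
        job_id = str(self._next_id)
        self._next_id += 1
        self._known.add(job_id)
        return job_id

    def wait(self, job_id):
        """Reap one job and drop everything the library knows about it."""
        if job_id not in self._known:
            return DRMAA_ERRNO_INVALID_JOB
        self._known.discard(job_id)
        return DRMAA_ERRNO_SUCCESS

    def synchronize(self, job_ids, dispose=False):
        """Reaped and never-submitted ids look the same: unrecognized."""
        if any(job_id not in self._known for job_id in job_ids):
            return DRMAA_ERRNO_INVALID_JOB
        if dispose:
            self._known.difference_update(job_ids)
        return DRMAA_ERRNO_SUCCESS


session = ToySession()
first, second = session.run_job(), session.run_job()
print(session.synchronize([first, second]))  # both ids are still known
print(session.wait(first))                   # reap the first job
print(session.synchronize([first]))          # the reaped id is now unknown
```

The sketch takes the low-memory path. Roger's summary also allows an
implementation that can still validate a reaped id to return
DRMAA_ERRNO_SUCCESS, which would require keeping the ids of reaped jobs.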

Daniel

Roger Brobst wrote On 10/24/05 17:15,:

>My opinion ...
>
>Because a DRMAA implementation is not required to
>retain information about jobs which have been reaped,
>drmaa_synchronize should not be required to
>distinguish between nonexistent and reaped jobs.
>
>A drmaa_synchronize implementation should return
>DRMAA_ERRNO_INVALID_JOB if a provided jobID is
>unrecognized.
>
>If a drmaa_synchronize implementation successfully
>validates a jobID for a reaped job, it may
>return DRMAA_ERRNO_SUCCESS.
>
>-Roger
>
>
>In a previous e-mail, Daniel Templeton wrote:
>
>>So, the only person who hasn't weighed in is Roger.
>>Care to offer an opinion?
>>
>>Daniel
>>
>>Peter Troeger wrote On 10/21/05 10:21,:
>>
>>>I support Hrabri's argument. DRMAA introduced "dispose=true" in the
>>>interface, so resource consumption is evidently a concern. If a job
>>>was subject to drmaa_wait() and its data was disposed, nothing about
>>>the job should be left in memory. IMHO the job becomes completely
>>>unknown to the library after this point.
>>>
>>>BTW, this also holds for the current Condor DRMAA implementation, and
>>>it follows from the behavior of the underlying Condor system. Once a
>>>job has finished, only the log files can tell you what happened. The
>>>Condor DRMAA library uses one such log file per job, and if you execute
>>>drmaa_wait(dispose=true), the log file and in-memory structures for the
>>>job are removed. Calling drmaa_synchronize() after this results in
>>>DRMAA_ERRNO_INVALID_JOB.
>>>
>>>Things might be clearer if we had an explicit drmaa_dispose_job()
>>>function.
>>>
>>>Regards,
>>>Peter.
>>>
>>>Rajic, Hrabri wrote:
>>>
>>>>My wig is in dry cleaning.  Nevertheless, here is my short take on this.
>>>>
>>>>If an implementation has the job_ids handy, it can conveniently
>>>>determine which jobs are invalid (do not exist) and return
>>>>DRMAA_ERRNO_INVALID_JOB.  IMHO, it is not a big deal if the routine
>>>>gives imprecise diagnostics because it was forced to garbage-collect
>>>>job data earlier.  "Quality of implementation" comes to mind, but
>>>>that quality could come at the expense of being a memory hog, which in
>>>>turn could lead to paging - quite dubious.
>>>>See, implementations might handle jobs that did not come from the
>>>>current session differently, so we could not be precise here either.
>>>>
>>>>The important thing for the user is to synchronize, i.e. to block the
>>>>program from continuing if there are running remote jobs.
>>>>
>>>>Dispose = true helps get rid of the rusage info to free DRMAA
>>>>implementations of heavy memory requirements when it matters, so keeping
>>>>all the past job_ids for providing precise exit errors runs contrary to
>>>>the goal of lessening memory requirements in the same routine.
>>>>
>>>>My 2 pfennigs,
>>>>
>>>>Hrabri
>>>>
>>>>>-----Original Message-----
>>>>>From: owner-drmaa-wg at ggf.org [mailto:owner-drmaa-wg at ggf.org] On Behalf Of
>>>>>Daniel Templeton
>>>>>Sent: Wednesday, October 19, 2005 11:37 AM
>>>>>To: DRMAA Working Group
>>>>>Subject: [drmaa-wg] Synchronizing Against Waited Jobs
>>>>>
>>>>>We have found a bug in the SGE DRMAA implementation, (I know! It's
>>>>>shocking!) but Andreas and I can't agree on what the fix should be.
>>>>>The issue is that in the current implementation, synchronizing against
>>>>>jobs that did not come from the current session returns
>>>>>DRMAA_ERRNO_SUCCESS.  The part about which we disagree is what should
>>>>>happen when synchronizing against jobs that are from the current
>>>>>session, but that have already ended and have already had drmaa_wait()
>>>>>(or drmaa_synchronize() with dispose=true) called against them.
>>>>>
>>>>>My stance is that one can extrapolate from the drmaa_wait() function
>>>>>that there is no difference between jobs which don't exist (at all or
>>>>>in the current session) and jobs whose exit information has been
>>>>>disposed (via drmaa_wait() or drmaa_synchronize()).  Therefore, calling
>>>>>drmaa_synchronize() on jobs which have already had drmaa_wait() called
>>>>>against them should return DRMAA_ERRNO_INVALID_JOB.
>>>>>
>>>>>Andreas holds that it can be inferred from the lack of the above
>>>>>statement in the spec, that drmaa_synchronize() handles such jobs
>>>>>differently from drmaa_wait().  Because drmaa_synchronize() does not
>>>>>need the jobs' exit information to succeed, it should be able to
>>>>>operate on jobs whose exit information has already been disposed.
>>>>>Therefore, calling drmaa_synchronize() on jobs which have already had
>>>>>drmaa_wait() called against them should return DRMAA_ERRNO_SUCCESS.
>>>>>
>>>>>I can agree that Andreas' position makes theoretical sense, but I
>>>>>believe it runs contrary to the stated goal of minimizing the
>>>>>requirements on the implementing DRMS.  In order to implement a
>>>>>drmaa_synchronize() that can distinguish between jobs that have been
>>>>>disposed and jobs that never existed, the DRMAA implementation must
>>>>>keep a list of the ids of every job that has ever been submitted in the
>>>>>current session, and with every drmaa_synchronize() call, the list must
>>>>>be searched to validate the synchronize id list.  And for what?
>>>>>DRMAA_JOB_IDS_SESSION_ALL covers every case I can think of where the
>>>>>behavior Andreas described would be useful.  To me, it sounds like a
>>>>>lot of extra work for the DRMAA implementation with no tangible benefit.
>>>>>
>>>>>What Andreas and I can agree on is that if we decide he is right, we
>>>>>will close the bug as "won't fix" because the fix will be worse than
>>>>>the bug.  In any case, we should probably have a tracker item to make
>>>>>the final decision explicit in the spec.
>>>>>
>>>>>What say you, oh, wise ones?
>>>>>
>>>>>Daniel
>>>>>
>>>>>--
>>>>>***************************************************
>>>>>*        Daniel Templeton   ERGB01 x60220         *
>>>>>*       Staff Engineer, Sun N1 Grid Engine        *
>>>>>***************************************************
>>>>>* "So let the sunshine in.  Face it with a grin.  *
>>>>>*  Smilers never lose, and frowners never win."   *
>>>>>*      -Let the Sunshine In, Pebbles Flintstone   *
>>>>>***************************************************






