[DRMAA-WG] OGF 28 Report

Peter Tröger peter at troeger.eu
Tue Mar 23 11:51:08 CDT 2010


> I disagree with unrenaming jobCategory.  First, no one had a clue what
> it was supposed to be, and second, it conflicts with the name of a
> completely different concept related to scheduling jobs with similar
> requirements.

Everybody in the room was using the old term, and I forgot the  
original user-side perspective behind this decision. We can roll back the  
unrenaming, it is just a draft ;-)

> I also disagree with adding queue support.  Yes, all DRMs have them, but
> no, the definitions are not even remotely consistent.  My feeling is
> that by trying to support queues, we're going to introduce some odd (and
> wrong) assumptions.  I'm willing to be convinced otherwise, though.

See other thread ...

/Peter.

> Daniel
>
> On 03/22/10 05:48, Peter Tröger wrote:
>> Dear all,
>>
>> after returning from a productive OGF 28 event in Munich, here is  
>> the long list of decisions we made. We (Mariusz, Daniel, Peter) had  
>> three sessions with continuous participation from the SAGA group.  
>> Huge thanks must go to Thilo and Andre, who resolved all open  
>> SAGA-related issues with us. We got great feedback from Yves Caniou  
>> about user requirements in parallel job execution.
>>
>> I also had one hour of intense discussion with Thijs Metsch from  
>> the OCCI working group. OCCI defines a RESTful interface for  
>> controlling cloud IaaS resources - virtual machines, networks, and  
>> storage. They would like to add task control to the OCCI use case  
>> landscape, and DRMAA looks like a surprisingly good candidate. We  
>> bring the semantics, they bring the protocol. Once DRMAAv2 is  
>> finalized, Thijs and I intend to work out a DRMAA language binding  
>> spec for OCCI. This would bring us the long-demanded Remote-DRMAA  
>> variant.
>>
>> Best,
>> Peter.
>>
>> --- snip
>>
>> (Everything you see here should also be implemented in the Wiki)
>>
>> - categoryName --renamed-->  jobCategory (people used the old term  
>> all the time)
>> - startReservation --renamed-->  requestReservation
>> - Replaced global occurrences of the term "host" with "machine"
>> - New queue support
>> 	- Added support for queue name specification in JobTemplate
>> 	- Only one name supported - only LSF and SGE support  
>> multiple queue names; precedence rules would be unclear
>> 	- Three new monitoring attributes (drmQueueNames,  
>> maxWallclockTime, maxSlotsAllowed) on queue level
>> 	- New monitoring attributes require a notion of infinity ->  
>> NO_LIMIT constant
>> - Parallel job support
>> 	- Two classes: "spawns itself" vs "is spawned"
>> 		- First class: OpenMP, pthread, self-managed (shell script  
>> submitted)
>> 		- Second class: PVM / MPI jobs, categorization based on GFD.115
>> 	- General design approach: User defines the parallel application  
>> binary in cmdLine argument (in contrast to SGE thinking !)
>> 		- jobCategory attribute decides upon all infrastructure-relevant  
>> settings for parallel execution (libraries, paths, launch programs)
>> 		- led to a corresponding "drmJobCategoryNames" counterpart in  
>> MonitoringSession, in order to check DRMS capabilities
>> 		- supported job categories are site-specific, DRMAA web site  
>> offers standardized names
>> 		- Examples will follow soon on http://www.drmaa.org/jobcategories/
>> 		- DRMAA implementation most likely creates a shell script based  
>> on job category, and submits this one
>> 	- The application decides upon process spawning, but the scheduler  
>> still needs the information
>> 		- new job template attributes minSlots / maxSlots
>> 		- if minSlots > 1, you MUST define a jobCategory
>> 		- no need to have final slot count as placeholder macro (comes  
>> out of the parallel programming API anyway)
>> - MonitoringSession::machineLoad
>> 	- Removed coreNumber parameter, since the OS on the host migrates  
>> jobs between cores - a per-core load index makes no real sense
>> 	- Added a comment that this information should not be used for  
>> user-side scheduling decisions; just a gadget to implement qmon on  
>> top of DRMAA
>> - New job template attribute "accountingId", as in SAGA, JSDL, and  
>> the majority of systems
>> 	- not relevant in ReservationTemplate, since advance reservations  
>> do not count for job accounting
>> - File purging on execution host (demanded by SAGA) was rejected,  
>> since it is not supported across all DRM systems
>> - New job template attributes for resource requirements -  
>> minPhysMemory, machineOS, machineArch, candidateMachines
>> 	- candidateMachines semantics: use a subset or all of these hosts  
>> for execution; if not possible, reject the job
>> - Advance Reservation interfaces
>> 	- use case for ReservationTemplate::nativeOptions ->  SGE demands  
>> queue name in advance reservation
>> 	- state model for reservations rejected for DRMAA
>> - Introduced "AbsoluteTime" abstraction data type in IDL text
>> - Job state model
>> 	- New StagedIn / StagedOut / Re-Scheduled states in the job state  
>> model were rejected; the spec will hint at using sub-states for this
>> 	- Going from running to queued is a special case (only for PBS),  
>> no new state; if needed, emulate the intermediate step in the library
>> - JobInfo will not be merged into Job ->  information should be  
>> consistent, always get one "performance snapshot"
>> 	- JobInfo becomes a value type, in order to express this more  
>> clearly
>> - getting the list of valid contact strings was rejected (not  
>> implementable)
>> - Bulk index placeholder support (in the API) will not be extended
>> 	- Instead, a new placeholder allows inserting the DRM system's bulk  
>> index environment variable name into the template (BULK_TASK_ID_VARNAME)
>> 	- Idea is that applications can assign the variable name to their  
>> own environment variable, and perform an "eval" on it later
>> 	- Peter argued against the alternative idea: standardizing which  
>> environment variables a DRMAA library must define implicitly
>> - JobInfo: masterMachine and slaveMachines attributes are merged to  
>> an ordered string list (allocatedMachines)
>> 	- Implementation can assign some semantics to the ordering
>>
>>
>> --
>>   drmaa-wg mailing list
>>   drmaa-wg at ogf.org
>>   http://www.ogf.org/mailman/listinfo/drmaa-wg
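
The slot rule from the parallel job support list in the quoted report ("if minSlots > 1, you MUST define a jobCategory") can be sketched as a small check. This is only an illustration, not a DRMAA binding: the class JobTemplateSketch, its field names, and the function validate_slots are hypothetical, and the requirement that maxSlots is not smaller than minSlots is an assumption that the report does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobTemplateSketch:
    """Hypothetical stand-in for a job template; not a real DRMAA type."""
    cmd_line: str                       # the parallel application binary itself
    min_slots: int = 1
    max_slots: int = 1
    job_category: Optional[str] = None  # site-specific name, e.g. "mpi"


def validate_slots(jt: JobTemplateSketch) -> None:
    """Raise ValueError if the template violates the slot rules."""
    # Assumption (not in the report): the slot range must be well-formed.
    if jt.max_slots < jt.min_slots:
        raise ValueError("maxSlots must not be smaller than minSlots")
    # Rule from the report: more than one slot requires a job category.
    if jt.min_slots > 1 and not jt.job_category:
        raise ValueError("minSlots > 1 requires a jobCategory")


# A parallel job with a category passes; without one it would be rejected.
validate_slots(JobTemplateSketch("mpi_app", min_slots=4, max_slots=8,
                                 job_category="mpi"))
```

The check lives outside the template class so that the same rule could run wherever a template is submitted.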

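The BULK_TASK_ID_VARNAME idea from the quoted report can be illustrated with a minimal sketch of how a library might expand such a placeholder. The placeholder spelling "$BULK_TASK_ID_VARNAME$", the lookup table, and the function expand_bulk_varname are assumptions made for illustration; the report only names the placeholder and does not fix its syntax. SGE_TASK_ID and LSB_JOBINDEX are the task index variables set by Grid Engine and LSF for array jobs.

```python
# Hypothetical placeholder spelling; the report only gives the name.
PLACEHOLDER = "$BULK_TASK_ID_VARNAME$"

# Environment variable that each DRM system sets to the bulk (array) task index.
BULK_INDEX_VARS = {
    "sge": "SGE_TASK_ID",
    "lsf": "LSB_JOBINDEX",
}


def expand_bulk_varname(text: str, drm: str) -> str:
    """Replace the placeholder with the DRM-specific variable *name*."""
    return text.replace(PLACEHOLDER, BULK_INDEX_VARS[drm])


# The application stores the variable name in its own environment variable
# and resolves it later, e.g. in a shell with: eval "idx=\$$MY_INDEX_VAR"
env_entry = expand_bulk_varname("MY_INDEX_VAR=$BULK_TASK_ID_VARNAME$", "sge")
print(env_entry)
```

Only the variable name is substituted at submission time; the index value itself is still read from the DRM system's own variable on the execution machine.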

