[DRMAA-WG] OGF 28 Report
Peter Tröger
peter at troeger.eu
Tue Mar 23 11:51:08 CDT 2010
> I disagree with unrenaming jobCategory. First, no one had a clue what
> it was supposed to be, and second, it conflicts with the name of a
> completely different concept related to scheduling jobs with similar
> requirements.
Everybody in the room was using the old term, and I forgot the
original user-side perspective of this decision. We can roll back the
unrenaming, it is just a draft ;-)
> I also disagree with adding queue support. Yes, all DRMs have them,
> but no, the definitions are not even remotely consistent. My feeling
> is that by trying to support queues, we're going to introduce some
> odd (and wrong) assumptions. I'm willing to be convinced otherwise,
> though.
See other thread ...
/Peter.
> Daniel
>
> On 03/22/10 05:48, Peter Tröger wrote:
>> Dear all,
>>
>> now that I am back from a productive OGF 28 event in Munich, here is
>> the long list of decisions we made. We (Mariusz, Daniel, Peter) had
>> three sessions with continuous participation of the SAGA group.
>> Huge thanks must go to Thilo and Andre, who resolved all open SAGA-
>> related issues with us. We got great feedback from Yves Caniou
>> about user requirements in parallel job execution.
>>
>> I also had one hour of intense discussion with Thijs Metsch from
>> the OCCI working group. OCCI defines a RESTful interface for
>> controlling cloud IaaS resources - virtual machines, networks, and
>> storage. They would like to add task control to the OCCI use case
>> landscape, and DRMAA looks like a surprisingly good candidate. We
>> bring the semantics, they bring the protocol. When DRMAAv2 is
>> finalized, Thijs and I intend to work out a DRMAA language binding
>> spec for OCCI. This would give us the long-demanded Remote-DRMAA
>> variant.
>>
>> Best,
>> Peter.
>>
>> --- snip
>>
>> (Everything you see here should also be implemented in the Wiki)
>>
>> - categoryName --renamed--> jobCategory (people used the old term
>> all the time)
>> - startReservation --renamed--> requestReservation
>> - Replaced all occurrences of the term "host" with "machine"
>> - New queue support
>> - Added support for queue name specification in JobTemplate
>> - Only one name supported: only LSF and SGE support multiple
>> queue names, and precedence rules would be unclear
>> - Three new monitoring attributes (drmQueueNames,
>> maxWallclockTime, maxSlotsAllowed) at queue level
>> - The new monitoring attributes need a notion of infinity ->
>> NO_LIMIT constant
>> - Parallel job support
>> - Two classes: "spawns itself" vs "is spawned"
>> - First class: OpenMP, pthread, self-managed (shell script
>> submitted)
>> - Second class: PVM / MPI jobs, categorization based on GFD.115
>> - General design approach: the user defines the parallel
>> application binary in the cmdLine argument (in contrast to the SGE
>> approach!)
>> - jobCategory attribute decides upon all infrastructure-relevant
>> settings for parallel execution (libraries, paths, launch programs)
>> - led to a corresponding "drmJobCategoryNames" attribute in
>> MonitoringSession, in order to check DRMS capabilities
>> - supported job categories are site-specific, DRMAA web site
>> offers standardized names
>> - Examples will follow soon on http://www.drmaa.org/jobcategories/
>> - The DRMAA implementation will most likely create a shell script
>> based on the job category, and submit that script
>> - The application decides upon process spawning, but the scheduler
>> still needs the information
>> - new job template attributes minSlots / maxSlots
>> - if minSlots > 1, you MUST define a jobCategory
>> - no need for a placeholder macro holding the final slot count (it
>> comes out of the parallel programming API anyway)
>> - MonitoringSession::machineLoad
>> - Removed the coreNumber parameter, since the OS on the host
>> migrates jobs between cores; a per-core load index makes no real sense
>> - Added a comment that this information should not be used for
>> user-side scheduling decisions; it is just a gadget for implementing
>> qmon on top of DRMAA
>> - New job template attribute "accountingId", as in SAGA, JSDL, and
>> the majority of systems
>> - not relevant in ReservationTemplate, since advance reservations
>> do not count for job accounting
>> - File purging on the execution host (demanded by SAGA) was
>> rejected, since DRM systems do not generally support it
>> - New job template attributes for resource requirements -
>> minPhysMemory, machineOS, machineArch, candidateMachines
>> - candidateMachines semantics: use a subset or all of these hosts
>> for execution; if that is not possible, reject the job
>> - Advance Reservation interfaces
>> - use case for ReservationTemplate::nativeOptions -> SGE demands
>> queue name in advance reservation
>> - state model for reservations rejected for DRMAA
>> - Introduced "AbsoluteTime" abstraction data type in IDL text
>> - Job state model
>> - New StagedIn / StagedOut / Re-Scheduled states in the job state
>> model were rejected; the spec gives a hint to use sub-states for this
>> - Going from running to queued is a special case (only for PBS),
>> no new state; if needed, emulate the intermediate step in the library
>> - JobInfo will not be merged into Job -> information should be
>> consistent, always get one "performance snapshot"
>> - JobInfo becomes a value type, in order to express this more
>> clearly
>> - getting the list of valid contact strings was rejected (not
>> implementable)
>> - Bulk index placeholder support (in the API) will not be extended
>> - Instead, a new placeholder (BULK_TASK_ID_VARNAME) allows
>> inserting the name of the DRM system's bulk index environment
>> variable into the template
>> - The idea is that applications can assign the variable name to
>> their own environment variable, and perform an "eval" on it later
>> - Peter fought against the alternative idea: standardizing which
>> environment variables a DRMAA library must define implicitly
>> - JobInfo: masterMachine and slaveMachines attributes are merged to
>> an ordered string list (allocatedMachines)
>> - Implementation can assign some semantic to the ordering
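The parallel-job and queue decisions in the list can be illustrated with a small sketch. It is not the DRMAAv2 IDL or any official language binding. The attribute names jobCategory, minSlots, maxSlots, accountingId, candidateMachines and the NO_LIMIT constant come from the report; the queueName field, the helper names validate() and slots_fit_queue(), the value -1 for NO_LIMIT, and the "1 <= minSlots <= maxSlots" check are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical value for the NO_LIMIT constant used by queue-level
# monitoring attributes such as maxSlotsAllowed (the report names the
# constant but not its value; -1 is an assumption).
NO_LIMIT = -1


@dataclass
class JobTemplate:
    """Illustrative subset of the draft job template attributes."""
    cmdLine: str                          # parallel application binary, per the report
    jobCategory: Optional[str] = None     # site-specific category name
    minSlots: int = 1
    maxSlots: int = 1
    queueName: Optional[str] = None       # hypothetical name; only one queue allowed
    accountingId: Optional[str] = None
    candidateMachines: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Reject templates that break the slot rules (hypothetical helper)."""
        # Assumed sanity check; the report does not spell this out.
        if self.minSlots < 1 or self.maxSlots < self.minSlots:
            raise ValueError("require 1 <= minSlots <= maxSlots")
        # Rule from the report: if minSlots > 1, a jobCategory MUST be set.
        if self.minSlots > 1 and not self.jobCategory:
            raise ValueError("minSlots > 1 requires a jobCategory")


def slots_fit_queue(template: JobTemplate, max_slots_allowed: int) -> bool:
    """Compare a template with a queue's maxSlotsAllowed (hypothetical helper)."""
    if max_slots_allowed == NO_LIMIT:
        return True  # NO_LIMIT stands for "infinity"
    return template.minSlots <= max_slots_allowed


# Usage: a parallel job with a (hypothetical) site category name.
job = JobTemplate(cmdLine="./solver", jobCategory="OpenMPI", minSlots=4, maxSlots=8)
job.validate()
print(slots_fit_queue(job, NO_LIMIT), slots_fit_queue(job, 2))
```

Using a sentinel for NO_LIMIT keeps the monitoring attributes as plain integers; a real binding could just as well use a language-specific "unbounded" value.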
>>
>> --
>> drmaa-wg mailing list
>> drmaa-wg at ogf.org
>> http://www.ogf.org/mailman/listinfo/drmaa-wg