[DRMAA-WG] Meeting Minutes - Conference call - Apr 6th - 19:00 UTC

Mariusz Mamoński mamonski at man.poznan.pl
Wed Apr 13 10:59:02 CDT 2011


2011/4/6 Peter Tröger <peter at troeger.eu>:
> Participants: Daniel Gruber, Mariusz Mamonski, Andre Merzky, Peter Tröger.
> Organizational aspects:
> - Oracle bridge is no longer available for us
> - Skype conference call worked fine, we continue like this
> - Daniel will check for possibilities with Univa
> - If US participants are still missing next week, we will move to a more
> Europe-friendly time slot
> DRMAAv2 Draft 2:
> - Decision to remove last sentence in line 101
> - Boolean UNSET mapping should also be part of the language binding
> - Example from Andre: Struct might map to dictionary, which can just leave
> out keys in case of UNSET
> - Discussion about throwing out IRIX / TRU64, not accepted since this
> enumeration was already heavily discussed
> - Line 182, add CRAY: rejected, we are not aware of any relevant DRM system
> available on CRAY; it's also not an operating system
> - Line 198: Question about POWER, turned out that POWER is a subset of the
> PPC instruction set architecture, so the current solution is fine
> - Section 4.2: Discussion about adding GPU support
> - There are no good standards for GPU instruction set architectures, so
> having abstract GPU type definitions would be hard
> - Current DRM system support is also mostly based on targeting some Linux
> host with specialized resource demand formulations
> - This is solved way better with job categories
> - Line 246: Comparison of wall clock time definitions in several DRM systems
> - Weak agreement on defining it as time in RUNNING state plus time in
> SUSPENDED state (ok for Condor, Grid Engine)
> - Mariusz is still trying to find an example where SUSPENDED state is not
> included
Found! ;-) Platform LSF. I ran the following experiment:

1. Submitted a job with a wall clock time limit of 1 min:

$ bsub -W 00:01 sleep 600 # 10 min sleep
Job <114> is submitted to default queue <medium_priority>.
...
the job gets killed on reaching the wall clock time limit
...

$ bjobs -l 114
...
Wed Apr 13 14:56:55: Completed <exit>; TERM_RUNLIMIT: job killed after reaching
                      LSF run time limit.


2. Submitted a job with the same wall clock time limit of 1 min, then
suspended and later resumed it:

$ date
Wed Apr 13 14:32:52 BST 2011
$ bsub -W 00:01 sleep 600
Job <113> is submitted to default queue <medium_priority>.

$ bstop 113
Job <113> is being stopped

... after some time...

$ date
Wed Apr 13 14:55:16 BST 2011
$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
113     mpiuser USUSP medium_pri x7500       ex-9-0      sleep 600  Apr 13 14:33

$ bresume 113
Job <113> is being resumed

The job finished immediately after being resumed (sleep counts the time
during which the process was suspended).

$ bjobs -l 113
...
Wed Apr 13 14:33:09: Started on <ex-9-0>, Execution Home </home/mpiuser>, Execu
                     tion CWD </home/mpiuser>;
Wed Apr 13 14:55:35: Done successfully. The CPU time used is 0.0 seconds.

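As you can see, the job spent more than 22 min in the SUSPENDED + RUNNING
states (started 14:33:09, done 14:55:35), far beyond the wall clock time
limit of 1 min, and LSF still did not kill it.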
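
The comparison behind this conclusion can be sketched in a few lines of
Python, using only the two timestamps from the "bjobs -l 113" output. The
helper name wallclock_incl_suspended is a hypothetical illustration, not
part of LSF or DRMAA, and the year is assumed from the date of this mail.

```python
from datetime import datetime, timedelta

# Timestamps copied from the "bjobs -l 113" output (year assumed from the mail date).
started = datetime(2011, 4, 13, 14, 33, 9)    # "Started on <ex-9-0>"
finished = datetime(2011, 4, 13, 14, 55, 35)  # "Done successfully"

# Limit requested with "bsub -W 00:01".
run_limit = timedelta(minutes=1)


def wallclock_incl_suspended(start: datetime, end: datetime) -> timedelta:
    """Hypothetical helper: wall clock time defined as RUNNING + SUSPENDED,
    i.e. everything between job start and job end."""
    return end - start


elapsed = wallclock_incl_suspended(started, finished)
over_limit = elapsed > run_limit

print(elapsed)     # 0:22:26
print(over_limit)  # True
```

Under the RUNNING + SUSPENDED definition the job would have been far over
its limit, yet LSF let it finish, which suggests that the LSF run limit did
not count the suspended time in this experiment.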


> - Final decision next week, especially if inclusion of SUSPENDED is marked as
> "MAY" or "MUST"
> - Line 249, question by Daniel Katz: Yes, this is a standard feature, e.g.
> for advance reservation support. Add note in the rationale section.
> - Line 272: Remove first sentence, since this violates the "opaque concept"
> statement in the next sentence.
> - Line 277: New proposal by Mariusz - replace "maxWallclockTime" with a
> generic dictionary for queue attributes
> - Would allow reporting DRM-specific properties of a queue, in the same
> opaque sense as the queue name
> - Only helpful for the portal case, should not be the basis for programmatic
> decisions
> - No clear decision, deferred to next week
> The next conference call with Skype will take place in one week (Apr 13th,
> 19:00 UTC)
> Best regards,
> Peter.
>
> On 04.04.2011 at 00:28, Peter Tröger wrote:
>
> Dear all,
>
> the next DRMAA conf call is scheduled for Apr 6th, 19:00 UTC. The phone
> conference line is sponsored by Oracle:
>
> Phone number (toll-free from US): +001-866-545-5227
> Access code: 5988285
>
> The conference bridge MAY no longer work (Dan?); in that case, we will
> organize something based on Skype.
> Preliminary meeting agenda:
>
> 1. Meeting secretary for this meeting?
> 2. Latest updates from the participants
> 3. Solving the remaining issues in DRMAAv2 Draft 2 (see attachment)
>
> The attached draft update already incorporates the comments from Andre
> Merzky and Daniel S. Katz. Thanks for their input!
> Best regards,
> Peter.
>
> <drmaav2_draft2_annotated.pdf>
> --
>  drmaa-wg mailing list
>  drmaa-wg at ogf.org
>  http://www.ogf.org/mailman/listinfo/drmaa-wg
>



-- 
Mariusz

