[DRMAA-WG] Meeting Minutes - Conference call - Apr 6th - 19:00UTC

Thijs Metsch tmetsch at platform.com
Thu Apr 14 02:59:00 CDT 2011


I only read this with one eye, but this isn't a bug in LSF or anything... You should use something other than the sleep command to test this...

Depending on the implementation, sleep cannot really be suspended. Most implementations - my guess at least - will calculate a wakeup time and block (at OS level) until that time has passed - it's not really a while loop spinning :-)
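
To illustrate the guess above, here is a minimal sketch in Python (not the source of any real sleep binary) of a sleep that fixes its wakeup time once and then blocks until that time has passed. The now and wait parameters are assumptions added only so the behaviour can be tried with a simulated clock:

```python
import time


def deadline_sleep(seconds, now=time.monotonic, wait=time.sleep):
    """Sleep by blocking until a wakeup time that is computed once, up front.

    `now` and `wait` are hypothetical hooks so a simulated clock can be used.
    """
    wakeup = now() + seconds          # fixed when the sleep starts
    remaining = wakeup - now()
    while remaining > 0:
        wait(remaining)               # block; nothing is counted in a loop
        remaining = wakeup - now()    # time spent stopped has passed as well
```

If the process is stopped for longer than the requested time, the wakeup time is already in the past when it is resumed, so the sleep returns right away instead of running for the full duration again.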

-Thijs


-----Original Message-----
From: drmaa-wg-bounces at ogf.org on behalf of Daniel Gruber
Sent: Wed 13.04.2011 19:58
To: Mariusz Mamonski
Cc: drmaa-wg at ogf.org
Subject: Re: [DRMAA-WG] Meeting Minutes - Conference call - Apr 6th - 19:00UTC
 
Interesting case, Mariusz. It could be an LSF bug or an implementation 
difficulty (maybe they don't check suspended jobs for limits, because
they do not need resources). It would be clearer if you could construct
a case where the job has a runtime of N seconds. After starting, 
it should be suspended immediately, then after N seconds it should
be unsuspended. Now when the job resumes, the question is whether it 
runs for another N seconds or is killed immediately. 
Using the sleep binary itself could also be problematic, since AFAIK 
it sets a timer and suspends itself.
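
A test job along those lines could look roughly like the sketch below. It is only an assumption of how such a job might be written: the name run_for, the step size and the wait hook are all hypothetical. The point is that the N seconds are split into many short steps, so a stopped process makes no progress and at most one step is lost across a suspend:

```python
import time


def run_for(work_seconds, step=0.1, wait=time.sleep):
    """Run for `work_seconds` as a sequence of short steps.

    While the process is stopped the loop does not advance, so after a
    resume the remaining steps still have to be done. `wait` is a
    hypothetical hook for testing; a real job would keep time.sleep.
    """
    steps = round(work_seconds / step)
    for _ in range(steps):
        wait(step)                    # one short slice of "work"
    return steps
```

A script that ends with a call such as run_for(60) could then be submitted in place of sleep, suspended right after start and resumed after the limit has passed, to see whether the job gets its remaining run time or is killed.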

Cheers,

Daniel


Am 13.04.2011 um 17:59 schrieb Mariusz Mamonski:

> 2011/4/6 Peter Tröger <peter at troeger.eu>:
>> Participants: Daniel Gruber, Mariusz Mamonski, Andre Merzcy, Peter Tröger.
>> Organizational aspects:
>> - Oracle bridge is no longer available for us
>> - Skype conference call worked fine, we continue like this
>> - Daniel will check for possibilities with Univa
>> - If US participants are still missing next week, we will move to a more
>> Europe-friendly time slot
>> DRMAAv2 Draft 2:
>> - Decision to remove last sentence in line 101
>> - Boolean UNSET mapping should also be part of the language binding
>> - Example from Andre: Struct might map to dictionary, which can just leave
>> out keys in case of UNSET
>> - Discussion about throwing out IRIX / TRU64, not accepted since this
>> enumeration was already heavily discussed
>> - Line 182, add CRAY: rejected, we are not aware of any relevant DRM system
>> available on CRAY; it's also not an operating system
>> - Line 198: Question about POWER, turned out that POWER is a subset of the
>> PPC instruction set architecture, so the current solution is fine
>> - Section 4.2: Discussion about adding GPU support
>> - There are no good standards for GPU instruction set architectures, so
>> having abstract GPU type definitions would be hard
>> - Current DRM system support is also mostly based on targeting some Linux
>> host with specialized resource demand formulations
>> - This is solved way better with job categories
>> - Line 246: Comparison of wall clock time definitions in several DRM systems
>> - Weak agreement of defining it as time in RUNNING state plus time in
>> SUSPENDED state (ok for Condor, Grid Engine)
>> - Mariusz still tries to find an example where SUSPENDED state is not
>> included
> found! ;-) Platform LSF. I did the following experiment:
> 
> 1. submitted job with WALLCLOCK time limit 1 min:
> 
> $bsub -W 00:01 sleep 600 # 10 min sleep
> Job <114> is submitted to default queue <medium_priority>.
> ...
> job gets killed on reaching the wallclock time limit
> ...
> 
> $bjobs -l 114
> ...
> Wed Apr 13 14:56:55: Completed <exit>; TERM_RUNLIMIT: job killed after reaching
>                      LSF run time limit.
> 
> 
> 2. submitted job with WALLCLOCK time limit 1 min:
> 
> $ date
> Wed Apr 13 14:32:52 BST 2011
> $bsub -W 00:01 sleep 600
> Job <113> is submitted to default queue <medium_priority>.
> 
> $ bstop 113
> Job <113> is being stopped
> 
> ... after some time...
> 
> $ date
> Wed Apr 13 14:55:16 BST 2011
> $ bjobs
> JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
> 113     mpiuser USUSP medium_pri x7500       ex-9-0      sleep 600  Apr 13 14:33
> 
> $ bresume 113
> Job <113> is being resumed
> 
> job finished immediately (sleep counts the time during which the process was
> suspended)
> 
> $bjobs -l 113
> ...
> Wed Apr 13 14:33:09: Started on <ex-9-0>, Execution Home </home/mpiuser>, Execu
>                     tion CWD </home/mpiuser>;
> Wed Apr 13 14:55:35: Done successfully. The CPU time used is 0.0 seconds.
> 
> As you can see, the job spent > 12 min in SUSPENDED + RUNNING state, i.e. more
> than the wallclock time limit of 1 min.
> 
> 
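
For reference, the elapsed time can be worked out from the "Started" and "Done" timestamps in the bjobs -l output above. This is just the arithmetic as a small Python sketch; the variable names are mine and the 1 min limit is the one stated for the bsub call:

```python
from datetime import datetime, timedelta

# Timestamps copied from the bjobs -l output for job 113.
started = datetime(2011, 4, 13, 14, 33, 9)    # "Started on <ex-9-0>"
finished = datetime(2011, 4, 13, 14, 55, 35)  # "Done successfully"
limit = timedelta(minutes=1)                  # wallclock limit of the job

elapsed = finished - started
print(elapsed)          # 0:22:26
print(elapsed > limit)  # True
```

So the job existed for a bit more than 22 minutes between start and completion without being killed by the run time limit.
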
>> - Final decision next week, especially if inclusion of SUSPENDED is marked as
>> "MAY" or "MUST"
>> - Line 249, question by Daniel Katz: Yes, this is a standard feature, e.g.
>> for advance reservation support. Add note in the rationale section.
>> - Line 272: Remove first sentence, since this violates the "opaque concept"
>> statement in the next sentence.
>> - Line 277: New proposal by Mariusz - replace "maxWallclockTime" with a
>> generic dictionary for queue attributes
>> - Would allow to report DRM-specific properties of a queue, in the same
>> opaque sense as the queue name
>> - Only helpful for portal case, should not be the base for programmatic
>> decisions
>> - No clear decision, deferred to next week
>> The next conference call with Skype will take place in one week (Apr 13th,
>> 19:00 UTC)
>> Best regards,
>> Peter.
>> 
>> Am 04.04.2011 um 00:28 schrieb Peter Tröger:
>> 
>> Dear all,
>> 
>> the next DRMAA conf call is scheduled for Apr 6th, 19:00 UTC. The phone
>> conference line is sponsored by Oracle:
>> 
>> Phone number (toll-free from US): +001-866-545-5227
>> Access code: 5988285
>> 
>> The conference bridge MAY no longer work (Dan ?), in this case, we will
>> organize something based on Skype.
>> Preliminary meeting agenda:
>> 
>> 1. Meeting secretary for this meeting?
>> 2. Latest updates from the participants
>> 3. Solving the remaining issues in DRMAAv2 Draft 2 (see attachment)
>> 
>> The attachment draft update already incorporates the comments from Andre
>> Merzcy and Daniel S. Katz. Thanks for their input !
>> Best regards,
>> Peter.
>> 
>> <drmaav2_draft2_annotated.pdf>
>> --
>>  drmaa-wg mailing list
>>  drmaa-wg at ogf.org
>>  http://www.ogf.org/mailman/listinfo/drmaa-wg
>> 
> 
> 
> 
> -- 
> Mariusz



