[DRMAA-WG] Conference call - Jun 9th - 19:00 UTC

Mariusz Mamoński mamonski at man.poznan.pl
Wed Jun 9 15:53:21 CDT 2010


the telco minutes:

participants: Dan, Daniel, Roger, Peter, Mariusz

1. Meeting secretary: Mariusz
2. Support for core file fetching
- the location of core files is very OS specific => extremely
difficult to implement at the DRMAA level
- but we can provide a portable way of setting the core file size limit,
thus forcing core dumps on segfaults => why not also provide a way to
set the other resource limits (e.g. memory, CPU time)
- most DRMSs support resource limits

the decision:
- add to the JobTemplate a dictionary attribute for setting limits
(both soft and hard). The set of predefined resources that can be
limited must be defined based on what is supported by the DRMS. A
particular DRMAA implementation should support all of the predefined
limits or none of them.
- do not provide explicit support for core file fetching
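
A minimal sketch of how such a limits dictionary could be mapped onto
OS resource limits, using Python's standard resource module. The limit
names (core_file_size, cpu_time, data_size) and the helper functions are
assumptions made for illustration; the minutes do not define the
predefined set or any API.

```python
import resource

# Hypothetical names for the predefined limits; the minutes do not fix a list.
PREDEFINED_LIMITS = {
    "core_file_size": resource.RLIMIT_CORE,
    "cpu_time": resource.RLIMIT_CPU,
    "data_size": resource.RLIMIT_DATA,
}

UNLIMITED = resource.RLIM_INFINITY


def resolve_limits(limits):
    """Translate {name: (soft, hard)} into [(rlimit_constant, soft, hard)]."""
    resolved = []
    for name, (soft, hard) in limits.items():
        if name not in PREDEFINED_LIMITS:
            raise KeyError(f"unsupported limit: {name}")
        # A soft limit may never exceed its hard limit.
        if hard != UNLIMITED and (soft == UNLIMITED or soft > hard):
            raise ValueError(f"soft limit exceeds hard limit for {name}")
        resolved.append((PREDEFINED_LIMITS[name], soft, hard))
    return resolved


def apply_limits(limits):
    """Apply the limits to the calling process, e.g. in a child before exec."""
    for rlimit, soft, hard in resolve_limits(limits):
        resource.setrlimit(rlimit, (soft, hard))
```

A job that should be allowed to dump core could then carry something like
{"core_file_size": (UNLIMITED, UNLIMITED)} in its template, with
apply_limits() called in the job's process before the command is executed.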

3. Job execution time metrics
- Dan's understanding of the difference between "run duration limit"
and "wall clock limit":
   * wall clock -> real time (i.e. job_end_time - job_start_time)
   * run duration -> CPU time
- in general, exceeding the *soft* limit causes a signal to be sent that
can be caught, whereas exceeding the *hard* limit sends SIGKILL
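
To illustrate the distinction between the two metrics, here is a small
sketch that measures both for the same piece of work. The measure()
helper is a name made up for this example; it only uses the standard
time module.

```python
import time


def measure(job):
    """Run job() and return (wall_clock_seconds, cpu_seconds)."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    job()
    return time.monotonic() - wall_start, time.process_time() - cpu_start


# A job that mostly waits: wall clock time grows, CPU time barely moves.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall clock: {wall:.2f}s, cpu: {cpu:.2f}s")
```

A job that sleeps or waits on I/O can therefore hit a wall clock limit
long before it gets anywhere near a CPU time limit.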

the decision:
- move "run duration" to the resource limits dictionary attribute (RLIMIT_CPU?)
- keep the wall clock time limit (both soft and hard) as separate
JobTemplate attributes, as it is more a DRMS-specific feature than
an OS one.
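
A rough sketch of the soft-limit behaviour for CPU time on a POSIX
system, assuming RLIMIT_CPU is indeed the limit in question: the child
lowers its own soft limit, spins, and catches SIGXCPU when the limit
fires. The function name and the exit code 42 are arbitrary choices for
this example.

```python
import os
import resource
import signal


def _on_soft_limit(signum, frame):
    # Soft limit reached: SIGXCPU can be caught, so the job may clean up.
    os._exit(42)  # arbitrary exit code chosen for this sketch


def burn_cpu_under_soft_limit(soft_seconds):
    """Fork a child that spins until its soft CPU limit fires.

    Returns the child's exit code, or None if it was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:  # child: plays the role of the job
        try:
            signal.signal(signal.SIGXCPU, _on_soft_limit)
            # Keep the existing hard limit, lower only the soft one.
            _, hard = resource.getrlimit(resource.RLIMIT_CPU)
            resource.setrlimit(resource.RLIMIT_CPU, (soft_seconds, hard))
            while True:  # consume CPU time until SIGXCPU arrives
                pass
        finally:
            os._exit(1)  # reached only if setting up the limit failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
```

Calling burn_cpu_under_soft_limit(1) should return 42 after roughly one
second of CPU time. Without the handler the default action for SIGXCPU
terminates the process, and once the hard limit is reached the process
is killed with SIGKILL, which cannot be caught.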

4. Monitoring jobs not submitted by DRMAA
- strongly requested by many users
- security is not a concern (it is usually up to the DRMS policy whether
one user can monitor other users' jobs)
- not all information may be available for jobs submitted outside
of DRMAA (e.g. exit code)
- it should not be a big burden to provide just a list of job IDs in the
MonitoringSession

the decision:
to be decided (but more yes than no)
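
For discussion, a sketch of what listing all jobs in a MonitoringSession
might look like if this is accepted. Everything here (JobInfo, its
fields, get_all_job_ids) is hypothetical and not taken from any DRMAA
draft; the point is only that fields such as the exit code may be
missing for jobs submitted outside of DRMAA.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobInfo:
    job_id: str
    submitted_via_drmaa: bool
    exit_code: Optional[int] = None  # may be unknown for foreign jobs


class MonitoringSession:
    """Hypothetical session over a fixed snapshot of the jobs in the DRMS."""

    def __init__(self, jobs: List[JobInfo]):
        self._jobs = list(jobs)

    def get_all_job_ids(self) -> List[str]:
        # Lists every visible job, regardless of how it was submitted.
        return [job.job_id for job in self._jobs]
```

Which jobs are visible to a given user would still be decided by the
DRMS policy, not by the session itself.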


Cheers,


2010/6/9 Peter Tröger <peter at troeger.eu>:
> Dear all,
>
> the next DRMAA phone conference is scheduled on Jun 9th, at 19:00 UTC.
>
> The phone conference line is sponsored by Oracle. Please consult the
> following page for dial-in numbers from your country:
> http://www.intercall.com/oracle/access_numbers.htm
> The conference code is 6513037.  The security code is DRMAA (37622).
> Preliminary meeting agenda:
> 1. Meeting secretary ?
> 2. Support for core file fetching
> 3. Job execution time metrics
> 4. Monitoring jobs not submitted by DRMAA
> 5. Cleaning up the spreadsheet:
> http://spreadsheets.google.com/ccc?key=rrAIK9utkSoDQXF8kasYCLQ
>
> Best regards,
> Peter.
>
>
> --
>  drmaa-wg mailing list
>  drmaa-wg at ogf.org
>  http://www.ogf.org/mailman/listinfo/drmaa-wg
>



-- 
Mariusz

