[DRMAA-WG] Conference call - Apr 27th - 19:00 UTC

Peter Tröger peter at troeger.eu
Wed May 11 09:58:14 CDT 2011


Hi,

thanks for your huge contribution with this. Here are my comments. Once the OGF.ORG SVN works again, I will update the document sources on the server. The new PDF follows today.

My reactions fall into three kinds:

(1) Added. No discussion needed, I just made the corresponding change to the document.
(2) Ignored. I am pretty confident that this was debated and decided well enough in the group, so I am not willing to re-open the discussion. The group is free to disagree.
(3) Obsoleted. Recent document modifications have already established the proposal as a fact.

Best regards,
Peter.



> Hi,
> 
> I finally managed to read the current version of spec more carefully.
> Below are some comments (line numbering corresponds to the version annotated
> as "draft3"):
> 
> line 81: DRMAA1 -> DRMAA Version 1 [reference]
> 94,95: A Exec.. -> An Exec
> 159: advanced -> advance
> 296: "Machine structure" - should we include machine state here (e.g.
> down, administratively down, available, busy, ...) ?
> 316:  consistent... -> consistent among all Machine struct instances.

Added.

> Moreover any reported name should be a syntactically correct input for
> the candidateMachines attribute of the JobTemplate. ???

candidateMachines takes a MachineList as input.

> 361: any jobSubState - is there really any case where this would be a
> complex object? Why not just use a string here? (Yes, I know, in the spec
> there is a requirement that the language binding should define a conversion
> to String for every object, but this may be complex... ;-)

Ignored, this was already discussed and decided.

> 370: missing \n

Added.

> 377-383: running, buffered, purged -> I think this section needs to
> be more precise and verbose. In DRMAA 1.0 the wait call was
> responsible for reaping the jobs. This is important because some DRMS
> do not "buffer" jobs at all (or do it for a very short time) and the
> buffering has to be done in the DRMAA library (for the session's jobs
> only). This raises the question: how long to buffer the job
> information...

Added as ToDo.

> 395: exitStatus - should we state here that the valid exitStatus
> values are 0-125 ?

Ignored, this was already discussed and decided.

> 445: cpuTime - should we state here that it is cumulative time among
> all the job processes? i.e. cpu time can be greater than wall clock
> time for parallel jobs

Added, also for wall clock time.
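
As a rough illustration of the cpuTime point (not spec text): the sketch below assumes a parallel job with four processes and made-up timings. The helper name cumulative_cpu_time is hypothetical and not part of any DRMAA binding.

```python
def cumulative_cpu_time(per_process_cpu_seconds):
    """Sum the CPU seconds consumed by every process of one job.

    Hypothetical helper for illustration only; not a DRMAA API.
    """
    return sum(per_process_cpu_seconds)


# Assumed figures: four processes, each using 50 s of CPU, in a job
# that finished after 60 s of wall clock time.
cpu_time = cumulative_cpu_time([50.0, 50.0, 50.0, 50.0])
wall_clock_time = 60.0

# For a parallel job the cumulative CPU time may exceed the wall clock time.
print(cpu_time, wall_clock_time, cpu_time > wall_clock_time)
```

With these assumed numbers the cumulative CPU time is 200 seconds against 60 seconds of wall clock time.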

> 497: maybe we should add "Dictionary consumableResources;" @see Nadev
> e-mail. I also raised this during one of the last telcos...

See meeting minutes.

> 594: "execution host" -> "submission host" ???

Why this? inputPath and friends relate to files that are used by the running job on the execution host.

> 652: maxSlots should be optional (e.g. Torque does not support range values)

Added as ToDo.

> 657: SHOULD -> MAY - at least as long as we don't have predefined JobCategories ;-)

Ignored, this was already discussed and decided.

> 785: SessionManagementException - what is the added value of this
> exception? Can it be thrown from operations other than
> open/close/destroy Session? If not, then why don't we have
> WaitException, RunException? ;-)

Added as ToDo.

> 791: OutOfMemoryException - can we also throw this exception when the
> user supplied buffer was too small?

Added.

> 829: reservationSupported - maybe we can move it now to
> DrmaaReflective interface?

Obsoleted.

> 948: FAILED vs DONE - maybe we should be more precise for situations
> where the job was started but: e.g. exited with exitcode != 0 (I
> believe this should be DONE), was signalled, terminated via DRMAA,

Ignored, this was already discussed and decided.

> 967: REQUEUED, REQUEUED_HELD and BES states. The BES state model
> prohibits the transition from Running to Pending, so it should
> be the Running state. Also, the state names in brackets look like a
> specialization of one of the BES implementations (I will not say which
> implementation ;-) so they are definitely non-normative.

Added. And yes, this is why the table title contains "example".

> 1035: The largest valid value for endIndex MUST be defined by the
> language binding. - there may also be a DRMS constraint.

Added.
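
A minimal sketch of how the two limits on endIndex could combine, assuming the effective maximum is simply the smaller of the language binding limit and an optional DRMS limit. The function check_end_index and its parameters are hypothetical, and the draft does not prescribe this behaviour.

```python
def check_end_index(end_index, binding_max, drms_max=None):
    """Validate end_index against both limits (hypothetical helper).

    binding_max: largest value the language binding allows (assumed).
    drms_max:    optional extra limit imposed by the DRMS (assumed).
    """
    # The effective limit is the tighter of the two constraints.
    limit = binding_max if drms_max is None else min(binding_max, drms_max)
    if end_index > limit:
        raise ValueError(f"endIndex {end_index} exceeds limit {limit}")
    return end_index


# Assumed numbers: a 32-bit binding limit and a DRMS that caps bulk jobs at 1000.
print(check_end_index(100, 2**31 - 1, drms_max=1000))
```

An implementation could just as well report the DRMS limit through a different error, so this only shows that the tighter constraint wins.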

> 1047: "only one of the active thread..." - is this requirement really
> needed? i'm asking because i'm afraid this would increase complexity
> of the implementations (do you remember the "session any" and its
> coincidence with run job operations?). This may be related to comment
> 377-383.

Ignored, this was already discussed and decided.

> 1063: "DrmaaCallback Interface"....
> 
> I just wonder if the requirement "An implementation SHOULD also
> disallow any library calls while the callback function is running, to
> avoid recursion scenarios. It is RECOMMENDED to raise
> TryLaterException in this case." is really needed.  If we want to keep
> this requirement is the Job object useful at all as we can only read
> the jobId from it?

Added as ToDo.

> 
> 1109-1110: why do those methods return the Job objects?

Ignored, this was already discussed and decided.

> 
> 1262: footnote 30 (what about symmetry ;-) Also last decision was to
> have separate ReservationInfo struct:
> http://www.mail-archive.com/drmaa-wg@ogf.org/msg00250.html (when
> was it revoked?)

Obsoleted.

> 
> 1508: reservationInfoOpt, reservationInfoImpl - what if one wants to
> provide more information about the reservation? Also the symmetry
> rule ;-), relates to 1262

Obsoleted.

>         should we also move the drmsJobCategoryNames here (from
> MonitoringSession)?

No, since DrmaaReflective is only about introspection support for optional / implementation-specific attributes. Added a ToDo to clarify whether this should move to the new generic capability check.



