[DRMAA-WG] Conference call -Feb 17th - 17:00 UTC (incl. task list)
Roger Brobst
rogerb at cadence.com
Mon Feb 16 09:47:05 CST 2009
> We still have the following opinions under discussion:
>
> Dan: TERMINATED state = resubmission might help, FAILED state =
> resubmission unlikely to help (machine problem, misconfiguration)
> Andre: TERMINATED state = triggered by an external entity, FAILED
> state = job terminated by itself
> Roger: FAILED state = job command line could not be executed
> Drmaa1: FAILED state = job was running, but did not finish
> successfully for some reason
>
> How is the new TERMINATED state related to the wif_ functions?
>
> Issue #5875 (originally from the PBS experience report) criticizes
> that FAILED currently expresses both user-requested termination and
> job failure. How is this issue solved by the newly chosen approach?
My concern about being able to distinguish between a
started and failed job vs. a job which was not started
is addressed by the 'jobState == FAILED && wasAborted == TRUE'.
Upon review of the spec, I see that I had forgotten that
'wasAborted' was effectively a substate of FAILED.
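A minimal Python sketch of the check described above. JobState, JobInfo, was_aborted and describe_failure are hypothetical names chosen for illustration; the draft's actual types and language bindings were still under discussion.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    # Hypothetical subset of the terminal states discussed on the list.
    DONE = "DONE"
    FAILED = "FAILED"
    TERMINATED = "TERMINATED"


@dataclass
class JobInfo:
    # Hypothetical stand-in for the draft's job information record.
    job_state: JobState
    was_aborted: bool  # True if the job ended before it ever started running


def describe_failure(info: JobInfo) -> str:
    """Split FAILED into 'never started' vs. 'started, then failed'."""
    if info.job_state is not JobState.FAILED:
        return "not failed"
    if info.was_aborted:
        # jobState == FAILED && wasAborted == TRUE: the job never ran
        return "not started"
    return "started and failed"


print(describe_failure(JobInfo(JobState.FAILED, True)))   # not started
print(describe_failure(JobInfo(JobState.FAILED, False)))  # started and failed
```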
> Issue #5875 (originally from the PBS experience report)
> criticizes that FAILED currently expresses both
> user-requested termination and job failure.
> How is this issue solved by the newly chosen approach?
It would seem that, in general, terminatingSignal could be
used to distinguish between a crash and a user-requested
termination (this assumes that signals like SIGSEGV
are not sent for user-requested terminations).
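A minimal Python sketch of that heuristic, assuming the terminating signal is reported by its symbolic name (DRMAA1's drmaa_wtermsig reports POSIX signals this way). The CRASH_SIGNALS set and classify_termination are hypothetical illustrations, not part of any draft.

```python
from typing import Optional

# Assumption from the post: crash signals such as SIGSEGV are never used for
# user-requested terminations. The exact membership of this set is illustrative.
CRASH_SIGNALS = frozenset({"SIGSEGV", "SIGBUS", "SIGILL", "SIGFPE", "SIGABRT"})


def classify_termination(terminating_signal: Optional[str]) -> str:
    """Heuristically guess why a job ended from its terminating signal name."""
    if terminating_signal is None:
        # No signal reported: the job exited on its own.
        return "no signal"
    if terminating_signal in CRASH_SIGNALS:
        return "crash"
    # Anything else (e.g. SIGTERM, SIGKILL) is treated as an external request.
    return "user-requested termination"


print(classify_termination("SIGSEGV"))  # crash
print(classify_termination("SIGTERM"))  # user-requested termination
```

Note that this is only a heuristic: a SIGKILL sent by the operating system (for example, on memory exhaustion) would be misread as a user request.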
-Roger
----Original Message----
From: Peter Tröger <peter at troeger.eu>
Sender: drmaa-wg-bounces at ogf.org
To: drmaa-wg at ogf.org
Subject: [DRMAA-WG] Conference call -Feb 17th - 17:00 UTC (incl. task list)
Date: Mon, 16 Feb 2009 10:56:36 +0100
Dear all,
the bi-weekly DRMAA call is scheduled for Feb 17th, 2009.
The meeting starts at
17:00 UTC == 18:00 CET (Berlin/Poland time) == 9:00 PST (Vancouver time)
Phone conference line sponsored by Sun:
Phone number (toll-free from US): +001-866-545-5227
Access code: 5988285
Preliminary meeting agenda:
1. Meeting secretary for this meeting?
2. Acceptance of last meeting minutes
3. Voting: sub-state as string vs. sub-state as string to be parsed
vs. sub-state as integer / enum
4. Job state discussion
We still have the following opinions under discussion:
Dan: TERMINATED state = resubmission might help, FAILED state =
resubmission unlikely to help (machine problem, misconfiguration)
Andre: TERMINATED state = triggered by an external entity, FAILED
state = job terminated by itself
Roger: FAILED state = job command line could not be executed
Drmaa1: FAILED state = job was running, but did not finish
successfully for some reason
How is the new TERMINATED state related to the wif_ functions?
Issue #5875 (originally from the PBS experience report) criticizes
that FAILED currently expresses both user-requested termination and
job failure. How is this issue solved by the newly chosen approach?
*Please* give this some thought before tomorrow, so that we can come to
a final vote.
5. Replacement for partial time stamp functionality: please check whether
your DRM / language binding would support our candidate, ISO 8601.
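As one example of what a binding's standard library may already offer, a short Python (3.7+) illustration using only datetime.fromisoformat and time.fromisoformat. It parses the call time from this announcement and converts it to the fixed CET and PST offsets given above; which ISO 8601 profile DRMAA2 would actually require is not decided here.

```python
from datetime import datetime, time, timedelta, timezone

# Full ISO 8601 timestamp with an explicit UTC offset (the call time).
call = datetime.fromisoformat("2009-02-17T17:00:00+00:00")

# Fixed offsets for February (standard time): CET = UTC+1, PST = UTC-8.
cet = timezone(timedelta(hours=1), "CET")
pst = timezone(timedelta(hours=-8), "PST")
print(call.astimezone(cet).isoformat())  # 2009-02-17T18:00:00+01:00
print(call.astimezone(pst).isoformat())  # 2009-02-17T09:00:00-08:00

# A reduced-precision, time-only value, closer to the old partial time stamps.
t = time.fromisoformat("17:00+00:00")
print(t.isoformat())  # 17:00:00+00:00
```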
The latest DRMAA2 draft document is attached.
Best regards,
Peter.