[DRMAA-WG] Conference call -Feb 17th - 17:00 UTC (incl. task list)

Roger Brobst rogerb at cadence.com
Mon Feb 16 09:47:05 CST 2009


> We still have the following opinions under discussion:
> 
> Dan: TERMINATED state = resubmission might help, FAILED state =
> resubmission unlikely to help (machine problem, misconfiguration)
> Andre: TERMINATED state = triggered by an external entity, FAILED  
> state = job terminated by itself
> Roger: FAILED state = job command line could not be executed
> Drmaa1: FAILED state = job was running, but did not finish  
> successfully for some reason
> 
> How is the new TERMINATED state related to the wif_ functions ?
> 
> Issue #5875 (originally from the PBS experience report) criticizes  
> that FAILED currently expresses both user-requested termination and  
> job failure. How is this issue solved by the newly chosen approach ?

My concern about being able to distinguish between a job
that started and then failed and a job that was never started
is addressed by the condition 'jobState == FAILED && wasAborted == TRUE'.
On reviewing the spec, I see that I had forgotten that
'wasAborted' is effectively a substate of FAILED.
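
As a rough sketch of that check: the JobInfo class, its field names and
the string encoding of the state below are hypothetical and not taken
from the spec. It assumes that wasAborted is set when the job ended
before it ever ran.

```python
from dataclasses import dataclass


@dataclass
class JobInfo:
    # Hypothetical container; the field names are illustrative only.
    job_state: str     # e.g. "FAILED" or "DONE"
    was_aborted: bool  # assumed: True if the job never entered the running state


def failure_kind(info: JobInfo) -> str:
    """Classify a finished job using jobState plus the wasAborted substate."""
    if info.job_state != "FAILED":
        return "not failed"
    # wasAborted only refines FAILED, so it is consulted only in that state.
    return "never started" if info.was_aborted else "started, then failed"
```

For example, failure_kind(JobInfo("FAILED", True)) yields "never started".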

> Issue #5875 (originally from the PBS experience report) 
> criticizes that FAILED currently expresses both 
> user-requested termination and job failure.
> How is this issue solved by the newly chosen approach ?

In general, it would seem that the terminatingSignal could be
used to distinguish between a crash and a user-requested
termination (this assumes that signals like SIGSEGV 
are not sent for user-requested terminations).
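
A minimal sketch of that idea follows. The grouping of signal names into
"crash" and "user-requested" sets is my own assumption, not something the
spec defines, and it only holds under the caveat above. The function name
and the use of signal names as strings are likewise illustrative.

```python
from typing import Optional

# Assumed grouping: signals typically raised by a faulting program.
CRASH_SIGNALS = {"SIGSEGV", "SIGBUS", "SIGFPE", "SIGILL", "SIGABRT"}
# Assumed grouping: signals typically sent by a user or the DRM to stop a job.
USER_SIGNALS = {"SIGTERM", "SIGKILL", "SIGINT"}


def termination_cause(terminating_signal: Optional[str]) -> str:
    """Guess why a job ended from the name of its terminating signal."""
    if terminating_signal is None:
        return "no signal"
    if terminating_signal in CRASH_SIGNALS:
        return "crash"
    if terminating_signal in USER_SIGNALS:
        return "user-requested"
    # Anything else cannot be attributed without further information.
    return "unknown"
```

Any signal outside the two sets is deliberately reported as "unknown"
rather than guessed at.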

-Roger

----Original Message----
From: Peter Tröger <peter at troeger.eu>
Sender: drmaa-wg-bounces at ogf.org
To: drmaa-wg at ogf.org
Subject: [DRMAA-WG] Conference call -Feb 17th - 17:00 UTC (incl. task list)
Date: Mon, 16 Feb 2009 10:56:36 +0100

Dear all,

the bi-weekly DRMAA call is scheduled on Feb 17th, 2009. 
The meeting  starts at

17:00 UTC == 18:00 CET (Berlin/Poland time) == 9:00 PST (Vancouver time)

Phone conference line sponsored by Sun:

Phone number (toll-free from US): +001-866-545-5227
Access code: 5988285

Preliminary meeting agenda:

1. Meeting secretary for this meeting?
2. Acceptance of last meeting minutes
3. Voting: sub-state as string vs. sub-state as string to be parsed  
vs. sub-state as integer / enum
4. Job state discussion

We still have the following opinions under discussion:

Dan: TERMINATED state = resubmission might help, FAILED state =  
resubmission unlikely to help (machine problem, misconfiguration)
Andre: TERMINATED state = triggered by an external entity, FAILED  
state = job terminated by itself
Roger: FAILED state = job command line could not be executed
Drmaa1: FAILED state = job was running, but did not finish  
successfully for some reason

How is the new TERMINATED state related to the wif_ functions ?

Issue #5875 (originally from the PBS experience report) criticizes  
that FAILED currently expresses both user-requested termination and  
job failure. How is this issue solved by the newly chosen approach ?

*Please* give this some thought before tomorrow, so that we can come to  
a final vote here.

5. Replacement for partial time stamp functionality - please check if  
your DRM / language binding would support our candidate ISO 8601.

Latest DRMAA2 draft document is attached.

Best regards,
Peter.

