[DRMAA-WG] Conf call minutes - Jan 20th 2009

Piotr Domagalski piotr.domagalski at fedstage.com
Wed Jan 21 05:59:50 CST 2009


On Tue, Jan 20, 2009 at 7:50 PM, Daniel Gruber <D.Gruber at sun.com> wrote:
>   Undetermined - is it a valid job state?
>   -> Yes! Undetermined = Error
>   -> Condor: it is permanent some time
>   -> Need to clarify if this means "don't try again"
>      or "try it again"

But does that mean that the Undetermined state will go away and the
function will return an error instead?

>   Distinction between state "failed" and "terminated"
>   -> "Failed" := user can fix it (through changes on job template for
> example)
>   -> "Terminated" := error the user can't fix

I thought of Terminated as the state the job gets into if it was
killed via drmaa_control() or possibly deleted locally in the DRMS
(by admin or user), but the latter may be optional functionality.

>   Why should we support extensible state?
>   -> basically for reporting
>   -> problem: difficult to implement in C

It might be modelled similarly to BES, so that there are standard
states that one can additionally inherit from to get more detailed
states. In C it might be done in the following way (kind of OOP
programming in C):

typedef struct {
    int standard_state;
} drmaa_state_t;

That would be standardised.  But the implementation might want to
extend it and then it might actually return:

typedef struct {
    drmaa_state_t super;        /* must stay the first member */
    int my_own_specific_state;
} drmaa_sge_state_t;

If the "client" wants to use only standard states, it uses a pointer
to the first structure and thus doesn't see the detailed state (e.g.
a general hold state plus an implementation-specific user/admin hold).
But when the client knows it is using a specific DRMAA implementation,
it may cast the pointer to the general structure to a pointer to the
impl-specific one. Kind of a hack, but AFAIR it is C standards
compliant: a pointer to a structure, suitably converted, points to its
first member and vice versa, so the two pointers refer to the same
place in memory.
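
A minimal sketch of how this could look end to end. The EX_* state
codes and the helper names ex_get_state() and ex_as_sge() are made up
for illustration and are not part of any DRMAA binding; only the two
structures come from the proposal above.

```c
#include <assert.h>
#include <stddef.h>

/* Standardised part: every implementation exposes at least this. */
typedef struct {
    int standard_state;
} drmaa_state_t;

/* Hypothetical standard state codes (illustrative names only). */
enum { EX_STATE_RUNNING = 1, EX_STATE_HOLD = 2 };

/* Hypothetical implementation-specific detail codes. */
enum { EX_SGE_HOLD_USER = 10, EX_SGE_HOLD_ADMIN = 11 };

/* Implementation-specific extension; 'super' must be the first member. */
typedef struct {
    drmaa_state_t super;
    int my_own_specific_state;
} drmaa_sge_state_t;

/* What an implementation might hand out: a pointer to the generic part. */
const drmaa_state_t *ex_get_state(const drmaa_sge_state_t *s)
{
    return &s->super;
}

/*
 * Downcast to the detailed state. Only valid when the object really is
 * a drmaa_sge_state_t; the caller has to know which implementation it
 * is talking to.
 */
const drmaa_sge_state_t *ex_as_sge(const drmaa_state_t *s)
{
    /* The cast relies on 'super' sitting at offset 0. */
    assert(offsetof(drmaa_sge_state_t, super) == 0);
    return (const drmaa_sge_state_t *)s;
}
```

A portable client would only ever read standard_state through the
pointer returned by ex_get_state(). A client that knows it is running
against this particular implementation could call ex_as_sge() on the
same pointer and read my_own_specific_state as well. Nothing in the
sketch protects against casting a state object that came from a
different implementation, which is the weak spot of the approach.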

-- 
Piotr Domagalski
FedStage Systems Ltd.

