[DRMAA-WG] Conf call minutes - Jan 20th 2009

Daniel Templeton Dan.Templeton at Sun.COM
Thu Jan 22 09:33:31 CST 2009



Piotr Domagalski wrote:
> On Tue, Jan 20, 2009 at 7:50 PM, Daniel Gruber <D.Gruber at sun.com> wrote:
>   
>>   Undetermined - is it a valid job state?
>>   -> Yes! Undetermined = Error
>>   -> Condor: it is permanent some time
>>   -> Need to clarify if this means "don't try again"
>>      or "try it again"
>>     
>
> But does that mean that the undetermined state will go away and the
> function will return an error?
>   

It probably means that UNDETERMINED becomes one state, and an exception 
becomes the other.

>   
>>   Distinction between state "failed" and "terminated"
>>   -> "Failed" := user can fix it (through changes on job template for
>> example)
>>   -> "Terminated" := error the user can't fix
>>     
>
> I thought of Terminated as the state the job gets into if it was
> drmaa_control()'ed or possibly deleted locally in DRMS (by admin or
> user), but the latter may be optional functionality.
>   

Exactly.  A failed job needs to be fixed before being resubmitted.  A 
terminated job could succeed as-is if resubmitted.

>   
>>   Why should we support extensible state?
>>   -> basically for reporting
>>   -> problem: difficult to implement in C
>>     
>
> It might be modelled similarly to BES so that there are standard
> states that one can additionally inherit from to have more detailed
> states. In C it might be done in the following way (kind of OOP
> programming in C):
>
> typedef struct {
>     int standard_state;
> } drmaa_state_t;
>
> That would be standardised.  But the implementation might want to
> extend it and then it might actually return:
>
> typedef struct {
>    drmaa_state_t super;
>    int my_own_specific_state;
> } drmaa_sge_state_t;
>
> If the "client" wants to use only standard states, it uses a pointer
> to the first structure and thus doesn't see the detailed state (e.g.
> general hold state + user/admin hold implementation specific). But
> when he knows he's using a specific DRMAA implementation it may cast
> the general structure to the impl-specific one. Kind of a hack, but
> AFAIR it is C standards compliant. Pointers to these two structures
> should be interchangeable, because they point to the same place in
> memory.
>   

I was thinking about something more along the lines of:

typedef struct {
    int state;
    void *substate;
} drmaa_state_t;


There is then no confusion for the caller about what he gets back.  He 
only needs to check if the substate is non-null *if* he knows enough 
about the DRMAA implementation to be able to understand it.  It also 
leaves the substate implementation open for the implementation to 
decide.  Maybe an int just isn't enough data.  Or maybe the substate is 
a string message.
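
To make that concrete, here is a rough sketch of how a caller might use 
such a struct.  Everything other than drmaa_state_t itself is made up for 
illustration: the state constants, their values, example_substate_t, and 
the two helper functions are assumptions, not part of any DRMAA spec or 
implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standard state codes; the values are illustrative only. */
enum { DRMAA_STATE_RUNNING = 1, DRMAA_STATE_HOLD = 2 };

typedef struct {
    int state;       /* standardised state, understood by every caller */
    void *substate;  /* implementation-defined detail, or NULL if none */
} drmaa_state_t;

/* Hypothetical implementation-specific detail.  Because substate is a
 * void pointer, this could just as well be a plain int or a string. */
enum { EXAMPLE_HOLD_USER = 1, EXAMPLE_HOLD_ADMIN = 2 };

typedef struct {
    int hold_kind;        /* EXAMPLE_HOLD_USER or EXAMPLE_HOLD_ADMIN */
    const char *message;  /* human-readable detail */
} example_substate_t;

/* Portable caller: looks only at the standard state. */
static const char *describe_state(const drmaa_state_t *s)
{
    assert(s != NULL);
    switch (s->state) {
    case DRMAA_STATE_RUNNING: return "running";
    case DRMAA_STATE_HOLD:    return "hold";
    default:                  return "unknown";
    }
}

/* Implementation-aware caller: interprets the substate only because it
 * knows which implementation produced it.  Returns "" if there is none. */
static const char *describe_substate(const drmaa_state_t *s)
{
    const example_substate_t *sub;

    assert(s != NULL);
    if (s->substate == NULL)
        return "";
    sub = s->substate;  /* void * converts implicitly in C */
    return sub->message;
}
```

A portable client would only ever call something like describe_state(); 
a client written against one particular implementation could also call 
describe_substate().  One thing the sketch leaves open is who owns the 
memory behind substate, which an actual binding would have to spell out.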


Daniel

