[saga-rg] job states...

Christopher Smith csmith at platform.com
Fri Feb 10 20:47:45 CST 2006


Sure.

As mentioned ... I think supporting a subset of BES is ok. I think much of
the state model dealing with file transfer is not required for SAGA.

-- Chris



On 10/2/06 18:46, "Andre Merzky" <andre at merzky.net> wrote:

> Ok, then I'll do that in the strawman.  I would appreciate
> if you could glance over it after commit, for a sanity
> check.
> 
> Thanks, Andre.
> 
> 
> Quoting [Christopher Smith] (Feb 11 2006):
>> 
>> It makes sense to keep the state models in sync.
>> 
>> -- Chris
>> 
>> 
>> On 10/2/06 18:26, "Andre Merzky" <andre at merzky.net> wrote:
>> 
>>> Quoting [Christopher Smith] (Feb 11 2006):
>>>> 
>>>> What I meant by that comment is that where it is a subset, it should
>>>> reflect the BES terminology. I think that the number of states
>>>> represented is enough already. ;-)
>>> 
>>> Would it make sense to just copy the BES state diagram?
>>> 
>>> It did not exist when we (== you ;-) drafted the SAGA job
>>> states - if it had been around then, we might have
>>> copied it already.
>>> 
>>> Apart from the SystemXXX/UserXXX states, and from Hold,
>>> it is not that much different from the SAGA model anyway.
>>> 
>>> Cheers, Andre.
>>> 
>>> 
>>>> -- Chris
>>>> 
>>>> 
>>>> On 10/2/06 17:30, "Andre Merzky" <andre at merzky.net> wrote:
>>>> 
>>>>> Hi Chris, 
>>>>> 
>>>>> many thanks for the answers! :-)
>>>>> 
>>>>>> By the way ... I believe that the state diagram should at least be a
>>>>>> subset of the BES state diagram ... we should adopt the same names.
>>>>> 
>>>>> I agree, kind of - I would say that the SAGA job state
>>>>> diagram should at _most_ be a subset of the BES state diagram.
>>>>> It could be _S_implier :-)
>>>>> 
>>>>> Cheers, Andre.
>>>>> 
>>>>> 
>>>>> Quoting [Christopher Smith] (Feb 10 2006):
>>>>>> Date: Fri, 10 Feb 2006 13:41:18 -0800
>>>>>> Subject: Re: [saga-rg] job states...
>>>>>> From: Christopher Smith <csmith at platform.com>
>>>>>> To: Simple API for Grid Applications WG <saga-rg at ggf.org>
>>>>>> 
>>>>>> On 4/2/06 11:18, "Andre Merzky" <andre at merzky.net> wrote:
>>>>>> 
>>>>>> Ok ... I'll try to answer these, at least from my viewpoint.
>>>>>> 
>>>>>>> 
>>>>>>> I think that diagram is wrong, isn't it?  Well, here are my
>>>>>>> questions:
>>>>>>> 
>>>>>>>   - if we submit a job, it's immediately Queued - is that
>>>>>>>     right?  Should it be pending before (e.g. as long as the
>>>>>>>     queuing request travels the middleware layers)?
>>>>>>> 
>>>>>> To me, Queued is the same as Pending. Pending is probably a better word
>>>>>> for this. Can't remember where the Queued name came from, as LSF uses
>>>>>> PEND.
>>>>>> 
>>>>>>>   - can the hold and suspend states be reached only from
>>>>>>>     'Running', or from elsewhere as well?
>>>>>>> 
>>>>>> You can only go into a Hold state from Pending, I think, or directly into
>>>>>> Hold on submission.
>>>>>> 
>>>>>>>   - What is the difference between 'Hold' and 'Suspend'?
>>>>>>> 
>>>>>> A Hold state tells the scheduler/broker not to consider this job for
>>>>>> scheduling/dispatch until the hold is explicitly released.
>>>>>> 
>>>>>>>   - Are there signals defined (apart from KILL) which change
>>>>>>>     the job state?  I guess that is not as simple as saying
>>>>>>>     SUSP does suspend - that state is probably defined by
>>>>>>>     the scheduler, not by the OS...
>>>>>>> 
>>>>>> Right ... this is implementation dependent on the mechanism used to
>>>>>> suspend a job (might be a signal, might be some other mechanism). What
>>>>>> is important is that there is an operation to initiate the state
>>>>>> transition.
>>>>>> 
>>>>>>>   - What is the use case for distinguishing between UserHold
>>>>>>>     and SystemHold, or between UserSuspend and
>>>>>>>     SystemSuspend?
>>>>>>>    
>>>>>> If I preempt workload, the system will put it into a SystemSuspend state
>>>>>> that a user cannot take it out of; otherwise the system may become
>>>>>> oversubscribed, with the preempted and preempting jobs running at the
>>>>>> same time. A UserSuspend can be entered and exited by the user, and is
>>>>>> often used to hold processing to check progress, etc.
>>>>>>  
>>>>>> 
>>>>>> By the way ... I believe that the state diagram should at least be a
>>>>>> subset of the BES state diagram ... we should adopt the same names.
>>>>>> 
>>>>>> -- Chris
>>>>> 
>>>>> 
>>> 
>>> 
> 
> 
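
As a rough sketch of the Hold/Suspend rules discussed in this thread (it is
not taken from the SAGA strawman or from the BES specification), the
transitions could be modelled as a small table keyed by who is allowed to
trigger them. The names State, Actor, ALLOWED and Job are hypothetical.
Two points are assumptions rather than statements from the thread: that the
suspend states are entered from Running only, and that the scheduler
(the "system") is what moves a job from Pending to Running. SystemHold and
the terminal states are left out.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"                  # called "Queued" in the earlier draft
    HOLD = "Hold"                        # not considered for dispatch until released
    RUNNING = "Running"
    USER_SUSPEND = "UserSuspend"         # entered and exited by the user
    SYSTEM_SUSPEND = "SystemSuspend"     # e.g. preemption; the user cannot leave it


class Actor(Enum):
    USER = "user"
    SYSTEM = "system"


# (current state, target state) -> actors allowed to trigger the transition.
# Assumption: suspend states are reachable from Running only.
ALLOWED = {
    (State.PENDING, State.HOLD): {Actor.USER},
    (State.HOLD, State.PENDING): {Actor.USER},          # explicit release
    (State.PENDING, State.RUNNING): {Actor.SYSTEM},     # scheduler dispatch
    (State.RUNNING, State.USER_SUSPEND): {Actor.USER},
    (State.USER_SUSPEND, State.RUNNING): {Actor.USER},
    (State.RUNNING, State.SYSTEM_SUSPEND): {Actor.SYSTEM},
    (State.SYSTEM_SUSPEND, State.RUNNING): {Actor.SYSTEM},
}


class Job:
    def __init__(self, hold_on_submit=False):
        # A job is either Pending right after submission or submitted
        # directly into Hold.
        self.state = State.HOLD if hold_on_submit else State.PENDING

    def transition(self, target, actor):
        """Move to `target` if `actor` may trigger that transition."""
        if actor not in ALLOWED.get((self.state, target), set()):
            raise ValueError(
                f"{actor.value} may not move a job from "
                f"{self.state.value} to {target.value}"
            )
        self.state = target
```

With this table, a user asking to resume a job that sits in SystemSuspend
gets a ValueError and the job stays suspended, which matches the
oversubscription argument above. A held job cannot be dispatched until the
hold is released back to Pending.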




