[saga-rg] job states...

Thilo Kielmann kielmann at cs.vu.nl
Sun Feb 12 01:59:14 CST 2006


> No, they are different, unfortunately.  The DRMAA states are
> closer to the original SAGA states.
> 
> However, DRMAA also spec'd their states before BES, so that
> is not surprising.  It would be interesting to know, though, why BES
> came up with a new model at all, or if they knew about the
> DRMAA model.

I wouldn't conclude from these facts that BES did the right thing.
Maybe their state diagram is the best around, but by merely being
incompatible with prior work from GGF (like DRMAA) it has its own
drawbacks.

Mandating that SAGA use BES states is too simplistic, IMHO. I am afraid
we need both groups to talk to each other and to us to resolve this.

Maybe we have a chance during this GGF Meeting?


Thilo

> 
> Cheers, Andre.
> 
> 
> Quoting [Thilo Kielmann] (Feb 12 2006):
> > Date: Sun, 12 Feb 2006 08:08:58 +0100
> > From: Thilo Kielmann <kielmann at cs.vu.nl>
> > To: Andre Merzky <andre at merzky.net>
> > Cc: Christopher Smith <csmith at platform.com>,
> > 	Simple API for Grid Applications WG <saga-rg at ggf.org>
> > Subject: Re: [saga-rg] job states...
> > 
> > Curious question:
> > 
> > SAGA should align job states with both BES and DRMAA?
> > Are they the same to begin with?
> > 
> > Thilo
> > 
> > On Sat, Feb 11, 2006 at 03:54:16AM +0100, Andre Merzky wrote:
> > > Date: Sat, 11 Feb 2006 03:54:16 +0100
> > > From: Andre Merzky <andre at merzky.net>
> > > To: Christopher Smith <csmith at platform.com>
> > > Cc: Andre Merzky <andre at merzky.net>,
> > > 	Simple API for Grid Applications WG <saga-rg at ggf.org>
> > > Subject: Re: [saga-rg] job states...
> > > 
> > > I agree - the file transfer state models are not needed for SAGA.
> > > We don't have any actions on these states anyway.
> > > 
> > > Andre.
> > > 
> > > 
> > > Quoting [Christopher Smith] (Feb 11 2006):
> > > > 
> > > > Sure.
> > > > 
> > > > As mentioned ... I think maybe supporting a subset of BES is ok. Much of
> > > > the state model wrt file transfer state modelling is, I think, not
> > > > required for SAGA.
> > > > 
> > > > -- Chris
> > > > 
> > > > 
> > > > 
> > > > On 10/2/06 18:46, "Andre Merzky" <andre at merzky.net> wrote:
> > > > 
> > > > > Ok, then I'll do that in the strawman.  I would appreciate
> > > > > if you could glance over it after commit, for a sanity
> > > > > check.
> > > > > 
> > > > > Thanks, Andre.
> > > > > 
> > > > > 
> > > > > Quoting [Christopher Smith] (Feb 11 2006):
> > > > >> 
> > > > >> It makes sense to keep the state models in sync.
> > > > >> 
> > > > >> -- Chris
> > > > >> 
> > > > >> 
> > > > >> On 10/2/06 18:26, "Andre Merzky" <andre at merzky.net> wrote:
> > > > >> 
> > > > >>> Quoting [Christopher Smith] (Feb 11 2006):
> > > > >>>> 
> > > > >>>> What I meant by that comment is that where it is a subset, it should
> > > > >>>> reflect the BES terminology. I think that the number of states
> > > > >>>> represented is enough already. ;-)
> > > > >>> 
> > > > >>> Would it make sense to just copy the BES state diagram?
> > > > >>> 
> > > > >>> It did not exist when we (== you ;-) drafted the SAGA job
> > > > >>> states - if it had been around then, we might
> > > > >>> have copied it already.
> > > > >>> 
> > > > >>> Apart from the SystemXXX/UserXXX states, and from Hold,
> > > > >>> it is not that much different from the SAGA model anyway.
> > > > >>> 
> > > > >>> Cheers, Andre.
> > > > >>> 
> > > > >>> 
> > > > >>>> -- Chris
> > > > >>>> 
> > > > >>>> 
> > > > >>>> On 10/2/06 17:30, "Andre Merzky" <andre at merzky.net> wrote:
> > > > >>>> 
> > > > >>>>> Hi Chris, 
> > > > >>>>> 
> > > > >>>>> many thanks for the answers! :-)
> > > > >>>>> 
> > > > >>>>>> By the way ... I believe that the state diagram should at least be a
> > > > >>>>>> subset of the BES state diagram ... we should adopt the same names.
> > > > >>>>> 
> > > > >>>>> I agree, kind of - I would say that the SAGA job state
> > > > >>>>> diagram should at _most_ be a subset of the BES state diagram.
> > > > >>>>> It could be _S_impler :-)
> > > > >>>>> 
> > > > >>>>> Cheers, Andre.
> > > > >>>>> 
> > > > >>>>> 
> > > > >>>>> Quoting [Christopher Smith] (Feb 10 2006):
> > > > >>>>>> Date: Fri, 10 Feb 2006 13:41:18 -0800
> > > > >>>>>> Subject: Re: [saga-rg] job states...
> > > > >>>>>> From: Christopher Smith <csmith at platform.com>
> > > > >>>>>> To: Simple API for Grid Applications WG <saga-rg at ggf.org>
> > > > >>>>>> 
> > > > >>>>>> On 4/2/06 11:18, "Andre Merzky" <andre at merzky.net> wrote:
> > > > >>>>>> 
> > > > >>>>>> Ok ... I'll try to answer these, at least from my viewpoint.
> > > > >>>>>> 
> > > > >>>>>>> 
> > > > >>>>>>> I think that diagram is wrong, isn't it?  Well, here are my
> > > > >>>>>>> questions:
> > > > >>>>>>> 
> > > > >>>>>>>   - if we submit a job, it's immediately Queued - is that
> > > > >>>>>>>     right?  Should it be pending before (e.g. as long as the
> > > > >>>>>>>     queuing request travels the middleware layers)?
> > > > >>>>>>> 
> > > > >>>>>> To me, Queued is the same as Pending. Pending is probably a better word
> > > > >>>>>> for this. Can't remember where the Queued name came from, as LSF uses
> > > > >>>>>> PEND.
> > > > >>>>>> 
> > > > >>>>>>>   - can the hold and suspend states be reached only from
> > > > >>>>>>>     'Running', or from elsewhere as well?
> > > > >>>>>>> 
> > > > >>>>>> You can only go into a Hold state from Pending, I think, or directly into
> > > > >>>>>> Hold on submission.
> > > > >>>>>> 
> > > > >>>>>>>   - What is the difference between 'Hold' and 'Suspend'?
> > > > >>>>>>> 
> > > > >>>>>> A Hold state tells the scheduler/broker not to consider this job for
> > > > >>>>>> scheduling/dispatch until the hold is explicitly released.
> > > > >>>>>> 
> > > > >>>>>>>   - Are there signals defined (apart from KILL) which change
> > > > >>>>>>>     the job state?  I guess that is not as simple as saying
> > > > >>>>>>>     SUSP does suspend - that state is probably defined by
> > > > >>>>>>>     the scheduler, not by the OS...
> > > > >>>>>>> 
> > > > >>>>>> Right ... this depends on the mechanism the implementation uses to
> > > > >>>>>> suspend a job (might be a signal, might be some other mechanism).
> > > > >>>>>> What is important is that there is an operation to initiate the
> > > > >>>>>> state transition.
> > > > >>>>>> 
> > > > >>>>>>>   - What is the use case for distinguishing between UserHold
> > > > >>>>>>>     and SystemHold, or between UserSuspend and
> > > > >>>>>>>     SystemSuspend?
> > > > >>>>>>>    
> > > > >>>>>> If I preempt workload, the system will put it into a SystemSuspend
> > > > >>>>>> state that the user cannot switch it out of; otherwise the system
> > > > >>>>>> could become oversubscribed, with the preempted and preempting jobs
> > > > >>>>>> running at the same time. A UserSuspend can be entered and exited by
> > > > >>>>>> the user, and is often used to pause processing to check progress,
> > > > >>>>>> etc.
> > > > >>>>>>  
> > > > >>>>>> 
> > > > >>>>>> By the way ... I believe that the state diagram should at least be a
> > > > >>>>>> subset of the BES state diagram ... we should adopt the same names.
> > > > >>>>>> 
> > > > >>>>>> -- Chris
> > > > >>>>> 
> > > > >>>>> 
> > > > >>> 
> > > > >>> 
> > > > > 
> > > > > 
> > > 
> > > 
> > > 
> > > -- 
> > > "So much time, so little to do..."  -- Garfield
> -- 
> "So much time, so little to do..."  -- Garfield
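
The hold and suspend rules described in the quoted thread (Hold is entered only from Pending or directly on submission, a held job is not considered for dispatch, a SystemSuspend cannot be left by the user while a UserSuspend is entered and exited by the user) can be sketched as a small permission-checked transition table. This is a minimal Python sketch under stated assumptions: the enum names, the hypothetical `Job`/`Actor` types, the `simple_name()` helper that collapses the User/System variants into a smaller subset, and the exact transition table are all illustrative and are not taken from the BES, DRMAA or SAGA specifications.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names, loosely following the names used in this thread.
    PENDING = "Pending"  # accepted, not yet dispatched ("Queued", LSF's PEND)
    USER_HOLD = "UserHold"  # skipped by the scheduler until the user releases it
    SYSTEM_HOLD = "SystemHold"  # skipped by the scheduler until the system releases it
    RUNNING = "Running"
    USER_SUSPEND = "UserSuspend"  # entered and exited by the user
    SYSTEM_SUSPEND = "SystemSuspend"  # e.g. preempted; only the system may resume it
    DONE = "Done"


class Actor(Enum):
    USER = "user"
    SYSTEM = "system"


# (from, to) -> actors allowed to trigger that transition.
# Assumption: an illustrative, partial table, not copied from BES, DRMAA or SAGA.
TRANSITIONS = {
    (State.PENDING, State.USER_HOLD): {Actor.USER},
    (State.USER_HOLD, State.PENDING): {Actor.USER},
    (State.PENDING, State.SYSTEM_HOLD): {Actor.SYSTEM},
    (State.SYSTEM_HOLD, State.PENDING): {Actor.SYSTEM},
    (State.PENDING, State.RUNNING): {Actor.SYSTEM},
    (State.RUNNING, State.USER_SUSPEND): {Actor.USER},
    (State.USER_SUSPEND, State.RUNNING): {Actor.USER},
    (State.RUNNING, State.SYSTEM_SUSPEND): {Actor.SYSTEM},
    (State.SYSTEM_SUSPEND, State.RUNNING): {Actor.SYSTEM},
    (State.RUNNING, State.DONE): {Actor.SYSTEM},
}

# Collapse the User/System variants into a smaller, subset-style model.
SIMPLE_NAMES = {
    State.USER_HOLD: "Hold",
    State.SYSTEM_HOLD: "Hold",
    State.USER_SUSPEND: "Suspended",
    State.SYSTEM_SUSPEND: "Suspended",
}


def simple_name(state):
    """Return the coarse state name a simpler API might expose."""
    return SIMPLE_NAMES.get(state, state.value)


class Job:
    def __init__(self, hold_on_submit=False):
        # Hold on submission is modelled as a UserHold (an assumption).
        self.state = State.USER_HOLD if hold_on_submit else State.PENDING

    def dispatchable(self):
        # Only Pending jobs are candidates for dispatch; held jobs are skipped.
        return self.state is State.PENDING

    def transition(self, target, actor):
        key = (self.state, target)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition {self.state.value} -> {target.value}")
        if actor not in TRANSITIONS[key]:
            raise PermissionError(
                f"{actor.value} may not move a job from "
                f"{self.state.value} to {target.value}"
            )
        self.state = target


if __name__ == "__main__":
    job = Job()
    job.transition(State.RUNNING, Actor.SYSTEM)
    job.transition(State.SYSTEM_SUSPEND, Actor.SYSTEM)  # preemption
    try:
        job.transition(State.RUNNING, Actor.USER)  # the user cannot resume it
    except PermissionError as err:
        print(err)
    print(simple_name(job.state))  # prints "Suspended"
```

Keying permissions on (from, to) pairs keeps the User/System distinction out of the names a simpler API would expose: `simple_name()` reports only Hold or Suspended, while the table still decides who may leave each state.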



-- 
Thilo Kielmann                                 http://www.cs.vu.nl/~kielmann/




