[saga-rg] job states...

Andre Merzky andre at merzky.net
Sun Feb 12 02:08:41 CST 2006


Where did those attachments go? *scratch*

A.

Quoting [Andre Merzky] (Feb 12 2006):
> Date: Sun, 12 Feb 2006 10:06:01 +0200
> From: Andre Merzky <andre at merzky.net>
> To: Thilo Kielmann <kielmann at cs.vu.nl>
> Cc: Andre Merzky <andre at merzky.net>,
> 	Christopher Smith <csmith at platform.com>,
> 	Simple API for Grid Applications WG <saga-rg at ggf.org>
> Subject: Re: [saga-rg] job states...
> 
> Quoting [Thilo Kielmann] (Feb 12 2006):
> > 
> > > No, they are different, unfortunately.  The DRMAA states are
> > > closer to the original SAGA states.
> > > 
> > > However, DRMAA also spec'd their states before BES, so that
> > > is not surprising.  It would be interesting to know, though,
> > > why BES came up with a new model at all, or if they knew about
> > > the DRMAA model.
> > 
> > I wouldn't conclude from these facts that BES did the right thing.
> 
> No, definitely not!
> 
> But, my (very personal) opinion is that the BES version is
> simpler, and easier to understand, although it has more
> states!  (I attach both diagrams).
> 
> > Maybe their state diagram is the best around, but by merely being
> > incompatible with prior work from GGF (like DRMAA) it has its own
> > drawbacks.
> 
> Agree, but see note from Steven.
> 
> 
> > Mandating SAGA uses BES states is too simplistic, IMHO. I
> > am afraid we need both groups to talk to each other and to
> > us to resolve this.
> 
> Maybe.  Some assurance to a 'final' GGF version would be
> nice...
> 
> 
> > Maybe we have a chance during this GGF Meeting?
> 
> Hmm, why not :-P
> 
> Cheers, Andre.
> 
> 
> > Thilo
> > 
> > > 
> > > Cheers, Andre.
> > > 
> > > 
> > > Quoting [Thilo Kielmann] (Feb 12 2006):
> > > > Date: Sun, 12 Feb 2006 08:08:58 +0100
> > > > From: Thilo Kielmann <kielmann at cs.vu.nl>
> > > > To: Andre Merzky <andre at merzky.net>
> > > > Cc: Christopher Smith <csmith at platform.com>,
> > > > 	Simple API for Grid Applications WG <saga-rg at ggf.org>
> > > > Subject: Re: [saga-rg] job states...
> > > > 
> > > > Curious question:
> > > > 
> > > > SAGA should align job states with both BES and DRMAA ?
> > > > Are they the same to begin with?
> > > > 
> > > > Thilo
> > > > 
> > > > On Sat, Feb 11, 2006 at 03:54:16AM +0100, Andre Merzky wrote:
> > > > > Date: Sat, 11 Feb 2006 03:54:16 +0100
> > > > > From: Andre Merzky <andre at merzky.net>
> > > > > To: Christopher Smith <csmith at platform.com>
> > > > > Cc: Andre Merzky <andre at merzky.net>,
> > > > > 	Simple API for Grid Applications WG <saga-rg at ggf.org>
> > > > > Subject: Re: [saga-rg] job states...
> > > > > X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at mailbouncer.mcs.anl.gov
> > > > > 
> > > > > > I agree - the file transfer state models are not needed for SAGA.
> > > > > > We don't have any actions on these states anyway.
> > > > > 
> > > > > Andre.
> > > > > 
> > > > > 
> > > > > Quoting [Christopher Smith] (Feb 11 2006):
> > > > > > 
> > > > > > Sure.
> > > > > > 
> > > > > > As mentioned ... I think maybe supporting a subset of BES is ok. Much of the
> > > > > > state model wrt file transfer state modelling I think is not required for
> > > > > > SAGA.
> > > > > > 
> > > > > > -- Chris
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On 10/2/06 18:46, "Andre Merzky" <andre at merzky.net> wrote:
> > > > > > 
> > > > > > > Ok, then I'll do that in the strawman.  I would appreciate
> > > > > > > if you could glance over it after commit, for a sanity
> > > > > > > check.
> > > > > > > 
> > > > > > > Thanks, Andre.
> > > > > > > 
> > > > > > > 
> > > > > > > Quoting [Christopher Smith] (Feb 11 2006):
> > > > > > >> 
> > > > > > >> It makes sense to keep the state models in sync.
> > > > > > >> 
> > > > > > >> -- Chris
> > > > > > >> 
> > > > > > >> 
> > > > > > >> On 10/2/06 18:26, "Andre Merzky" <andre at merzky.net> wrote:
> > > > > > >> 
> > > > > > >>> Quoting [Christopher Smith] (Feb 11 2006):
> > > > > > >>>> 
> > > > > > > >>>> What I meant by that comment is that where it is a subset, it should
> > > > > > > >>>> reflect the BES terminology. I think that the number of states
> > > > > > > >>>> represented is enough already. ;-)
> > > > > > >>> 
> > > > > > >>> Would it make sense to just copy the BES state diagram?
> > > > > > >>> 
> > > > > > > >>> It did not exist when we (== you ;-) drafted the SAGA job
> > > > > > > >>> states - if it had been around then, we might have
> > > > > > > >>> copied it already.
> > > > > > >>> 
> > > > > > >>> Apart from the SystemXXX/UserXXX states, and from Hold,
> > > > > > >>> it is not that much different from the SAGA model anyway.
> > > > > > >>> 
> > > > > > >>> Cheers, Andre.
> > > > > > >>> 
> > > > > > >>> 
> > > > > > >>>> -- Chris
> > > > > > >>>> 
> > > > > > >>>> 
> > > > > > >>>> On 10/2/06 17:30, "Andre Merzky" <andre at merzky.net> wrote:
> > > > > > >>>> 
> > > > > > >>>>> Hi Chris, 
> > > > > > >>>>> 
> > > > > > >>>>> many thanks for the answers! :-)
> > > > > > >>>>> 
> > > > > > > >>>>>> By the way ... I believe that the state diagram should at least be a
> > > > > > > >>>>>> subset of the BES state diagram ... we should adopt the same names.
> > > > > > >>>>> 
> > > > > > > >>>>> I agree, kind of - I would say that the SAGA job state
> > > > > > > >>>>> diagram should at _most_ be a subset of the BES state diagram.
> > > > > > > >>>>> It could be _S_impler :-)
> > > > > > >>>>> 
> > > > > > >>>>> Cheers, Andre.
> > > > > > >>>>> 
> > > > > > >>>>> 
> > > > > > >>>>> Quoting [Christopher Smith] (Feb 10 2006):
> > > > > > >>>>>> Date: Fri, 10 Feb 2006 13:41:18 -0800
> > > > > > >>>>>> Subject: Re: [saga-rg] job states...
> > > > > > >>>>>> From: Christopher Smith <csmith at platform.com>
> > > > > > >>>>>> To: Simple API for Grid Applications WG <saga-rg at ggf.org>
> > > > > > >>>>>> 
> > > > > > >>>>>> On 4/2/06 11:18, "Andre Merzky" <andre at merzky.net> wrote:
> > > > > > >>>>>> 
> > > > > > >>>>>> Ok ... I'll try to answer these, at least from my viewpoint.
> > > > > > >>>>>> 
> > > > > > >>>>>>> 
> > > > > > >>>>>>> I think that diagram is wrong, isn't it?  Well, here are my
> > > > > > >>>>>>> questions:
> > > > > > >>>>>>> 
> > > > > > > >>>>>>>   - if we submit a job, it's immediately Queued - is that
> > > > > > > >>>>>>>     right?  Should it be Pending before that (e.g. as long as
> > > > > > > >>>>>>>     the queuing request travels the middleware layers)?
> > > > > > >>>>>>> 
> > > > > > > >>>>>> To me, Queued is the same as Pending. Pending is probably a better word
> > > > > > > >>>>>> for this. Can't remember where the Queued name came from, as LSF uses PEND.
> > > > > > >>>>>> 
> > > > > > > >>>>>>>   - can the hold and suspend states be reached only from
> > > > > > > >>>>>>>     'Running', or from elsewhere as well?
> > > > > > >>>>>>> 
> > > > > > >>>>>> You can only go into a Hold state from Pending, I think, or directly into
> > > > > > >>>>>> Hold on submission.
> > > > > > >>>>>> 
> > > > > > >>>>>>>   - What is the difference between 'Hold' and 'Suspend'?
> > > > > > >>>>>>> 
> > > > > > >>>>>> A Hold state tells the scheduler/broker not to consider this job for
> > > > > > >>>>>> scheduling/dispatch until the hold is explicitly released.
> > > > > > >>>>>> 
> > > > > > > >>>>>>>   - Are there signals defined (apart from KILL) which change
> > > > > > > >>>>>>>     the job state?  I guess that is not as simple as saying
> > > > > > > >>>>>>>     SUSP does suspend - that state is probably defined by
> > > > > > > >>>>>>>     the scheduler, not by the OS...
> > > > > > >>>>>>> 
> > > > > > > >>>>>> Right ... this is implementation dependent on the mechanism used to
> > > > > > > >>>>>> suspend a job (might be a signal, might be some other mechanism). What is
> > > > > > > >>>>>> important is that there is an operation to initiate the state transition.
> > > > > > >>>>>> 
> > > > > > >>>>>>>   - What is the use case for distinguishing between UserHold
> > > > > > >>>>>>>     and SystemHold, or between UserSuspend and
> > > > > > >>>>>>>     SystemSuspend?
> > > > > > >>>>>>>    
> > > > > > > >>>>>> If I preempt workload, the system will put it into a SystemSuspend state
> > > > > > > >>>>>> that a user cannot cause a switch out of, otherwise a system may become
> > > > > > > >>>>>> oversubscribed due to the preempted and preempting jobs running at the
> > > > > > > >>>>>> same time. A UserSuspend can be entered and exited by the user, and is
> > > > > > > >>>>>> often used to hold processing to check progress, etc.
> > > > > > >>>>>>  
> > > > > > >>>>>> 
> > > > > > > >>>>>> By the way ... I believe that the state diagram should at least be a
> > > > > > > >>>>>> subset of the BES state diagram ... we should adopt the same names.
> > > > > > >>>>>> 
> > > > > > >>>>>> -- Chris
> > > > > > >>>>> 
> > > > > > >>>>> 
> > > > > > >>> 
> > > > > > >>> 
> > > > > > > 
> > > > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > -- 
> > > > > "So much time, so little to do..."  -- Garfield
> > > -- 
> > > "So much time, so little to do..."  -- Garfield
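
For reference, the Hold/Suspend rules described in the quoted answers
could be sketched roughly as below.  This is only an illustration under
assumptions, not the BES, DRMAA or SAGA model: the Python names, the
"user"/"system" actor labels and the transition() helper are made up
here, terminal states (done, failed, killed) are left out, and the rule
that only the system releases a SystemHold is assumed by analogy with
SystemSuspend; the thread does not say so.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"              # "Queued" in the older SAGA diagram
    USER_HOLD = "UserHold"
    SYSTEM_HOLD = "SystemHold"
    RUNNING = "Running"
    USER_SUSPEND = "UserSuspend"
    SYSTEM_SUSPEND = "SystemSuspend"


# (from_state, to_state) -> actors allowed to trigger that transition.
# The actor labels "user" and "system" are illustrative assumptions.
TRANSITIONS = {
    # Hold is entered from Pending and left only by an explicit release.
    (State.PENDING, State.USER_HOLD): {"user"},
    (State.USER_HOLD, State.PENDING): {"user"},
    (State.PENDING, State.SYSTEM_HOLD): {"system"},
    (State.SYSTEM_HOLD, State.PENDING): {"system"},   # assumed, see above
    # The scheduler dispatches a pending job.
    (State.PENDING, State.RUNNING): {"system"},
    # A user may suspend and resume a running job.
    (State.RUNNING, State.USER_SUSPEND): {"user"},
    (State.USER_SUSPEND, State.RUNNING): {"user"},
    # Preemption: only the system may leave SystemSuspend again.
    (State.RUNNING, State.SYSTEM_SUSPEND): {"system"},
    (State.SYSTEM_SUSPEND, State.RUNNING): {"system"},
}


def transition(current, target, actor):
    """Return the new state, or raise ValueError if the move is not allowed."""
    allowed = TRANSITIONS.get((current, target), set())
    if actor not in allowed:
        raise ValueError(
            f"{actor} may not move a job from {current.value} to {target.value}"
        )
    return target
```

With this table, transition(State.RUNNING, State.SYSTEM_SUSPEND, "system")
succeeds, while a user trying to move a job from SystemSuspend back to
Running gets a ValueError, which is the oversubscription guard Chris
describes.  A job submitted directly into Hold would simply start in
USER_HOLD instead of PENDING.
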
-- 
"So much time, so little to do..."  -- Garfield
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GGF-DRMAA-JobStatus.png
Type: image/png
Size: 43674 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/saga-rg/attachments/20060212/e017f33b/attachment-0006.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: job_states.png
Type: image/png
Size: 20464 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/saga-rg/attachments/20060212/e017f33b/attachment-0007.png 