[saga-rg] job states...

Thilo Kielmann kielmann at cs.vu.nl
Tue Feb 14 00:53:07 CST 2006


Pardon me, BES is not for Grids???

Thilo

On Mon, Feb 13, 2006 at 10:14:07AM -0800, Rajic, Hrabri wrote:
> 
> They are not.  Grid is a richer environment. 
> 
> Hrabri
> 
> >-----Original Message-----
> >From: owner-saga-rg at ggf.org [mailto:owner-saga-rg at ggf.org] On Behalf Of
> >Thilo Kielmann
> >Sent: Sunday, February 12, 2006 1:09 AM
> >To: Andre Merzky
> >Cc: Christopher Smith; Simple API for Grid Applications WG
> >Subject: Re: [saga-rg] job states...
> >
> >Curious question:
> >
> >SAGA should align job states with both BES and DRMAA ?
> >Are they the same to begin with?
> >
> >Thilo
> >
> >On Sat, Feb 11, 2006 at 03:54:16AM +0100, Andre Merzky wrote:
> >> Date: Sat, 11 Feb 2006 03:54:16 +0100
> >> From: Andre Merzky <andre at merzky.net>
> >> To: Christopher Smith <csmith at platform.com>
> >> Cc: Andre Merzky <andre at merzky.net>,
> >> 	Simple API for Grid Applications WG <saga-rg at ggf.org>
> >> Subject: Re: [saga-rg] job states...
> >>
> >> I agree - the file transfer state models are not needed for SAGA.
> >> We don't have any actions on these states anyway.
> >>
> >> Andre.
> >>
> >>
> >> Quoting [Christopher Smith] (Feb 11 2006):
> >> >
> >> > Sure.
> >> >
> >> > As mentioned ... I think maybe supporting a subset of BES is ok. Much of the
> >> > state model wrt file transfer state modelling I think is not required for
> >> > SAGA.
> >> >
> >> > -- Chris
> >> >
> >> >
> >> >
> >> > On 10/2/06 18:46, "Andre Merzky" <andre at merzky.net> wrote:
> >> >
> >> > > Ok, then I'll do that in the strawman.  I would appreciate
> >> > > if you could glance over it after commit, for a sanity
> >> > > check.
> >> > >
> >> > > Thanks, Andre.
> >> > >
> >> > >
> >> > > Quoting [Christopher Smith] (Feb 11 2006):
> >> > >>
> >> > >> It makes sense to keep the state models in sync.
> >> > >>
> >> > >> -- Chris
> >> > >>
> >> > >>
> >> > >> On 10/2/06 18:26, "Andre Merzky" <andre at merzky.net> wrote:
> >> > >>
> >> > >>> Quoting [Christopher Smith] (Feb 11 2006):
> >> > >>>>
> >> > >>>> What I meant by that comment is that where it is a subset, it should
> >> > >>>> reflect the BES terminology. I think that the number of states
> >> > >>>> represented is enough already. ;-)
> >> > >>>
> >> > >>> Would it make sense to just copy the BES state diagram?
> >> > >>>
> >> > >>> It did not exist when we (== you ;-) drafted the SAGA job
> >> > >>> states - if it had been around then, we might have
> >> > >>> copied it already.
> >> > >>>
> >> > >>> Apart from the SystemXXX/UserXXX states, and from Hold,
> >> > >>> it is not that much different from the SAGA model anyway.
> >> > >>>
> >> > >>> Cheers, Andre.
> >> > >>>
> >> > >>>
> >> > >>>> -- Chris
> >> > >>>>
> >> > >>>>
> >> > >>>> On 10/2/06 17:30, "Andre Merzky" <andre at merzky.net> wrote:
> >> > >>>>
> >> > >>>>> Hi Chris,
> >> > >>>>>
> >> > >>>>> many thanks for the answers! :-)
> >> > >>>>>
> >> > >>>>>> By the way ... I believe that the state diagram should at least be a
> >> > >>>>>> subset of the BES state diagram ... we should adopt the same names.
> >> > >>>>>
> >> > >>>>> I agree, kind of - I would say that the SAGA job state
> >> > >>>>> diagram should at _most_ be a subset of the BES state diagram.
> >> > >>>>> It could be _S_implier :-)
> >> > >>>>>
> >> > >>>>> Cheers, Andre.
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> Quoting [Christopher Smith] (Feb 10 2006):
> >> > >>>>>> Date: Fri, 10 Feb 2006 13:41:18 -0800
> >> > >>>>>> Subject: Re: [saga-rg] job states...
> >> > >>>>>> From: Christopher Smith <csmith at platform.com>
> >> > >>>>>> To: Simple API for Grid Applications WG <saga-rg at ggf.org>
> >> > >>>>>>
> >> > >>>>>> On 4/2/06 11:18, "Andre Merzky" <andre at merzky.net> wrote:
> >> > >>>>>>
> >> > >>>>>> Ok ... I'll try to answer these, at least from my viewpoint.
> >> > >>>>>>
> >> > >>>>>>>
> >> > >>>>>>> I think that diagram is wrong, isn't it?  Well, here are my
> >> > >>>>>>> questions:
> >> > >>>>>>>
> >> > >>>>>>>   - if we submit a job, it's immediately Queued - is that
> >> > >>>>>>>     right?  Should it be pending before (e.g. as long as the
> >> > >>>>>>>     queuing request travels the middleware layers)?
> >> > >>>>>>>
> >> > >>>>>> To me, Queued is the same as Pending. Pending is probably a better word
> >> > >>>>>> for this. Can't remember where the Queued name came from, as LSF uses
> >> > >>>>>> PEND.
> >> > >>>>>>
> >> > >>>>>>>   - can the hold and suspend states be reached only from
> >> > >>>>>>>     'Running', or from elsewhere as well?
> >> > >>>>>>>
> >> > >>>>>> You can only go into a Hold state from Pending, I think, or directly into
> >> > >>>>>> Hold on submission.
> >> > >>>>>>
> >> > >>>>>>>   - What is the difference between 'Hold' and 'Suspend'?
> >> > >>>>>>>
> >> > >>>>>> A Hold state tells the scheduler/broker not to consider this job for
> >> > >>>>>> scheduling/dispatch until the hold is explicitly released.
> >> > >>>>>>
> >> > >>>>>>>   - Are there signals defined (apart from KILL) which change
> >> > >>>>>>>     the job state?  I guess that is not as simple as saying
> >> > >>>>>>>     SUSP does suspend - that state is probably defined by
> >> > >>>>>>>     the scheduler, not by the OS...
> >> > >>>>>>>
> >> > >>>>>> Right ... this is implementation dependent on the mechanism used to
> >> > >>>>>> suspend a job (might be a signal, might be some other mechanism). What is
> >> > >>>>>> important is that there is an operation to initiate the state transition.
> >> > >>>>>>
> >> > >>>>>>>   - What is the use case for distinguishing between UserHold
> >> > >>>>>>>     and SystemHold, or between UserSuspend and
> >> > >>>>>>>     SystemSuspend?
> >> > >>>>>>>
> >> > >>>>>> If I preempt workload, the system will put it into a SystemSuspend state
> >> > >>>>>> that a user cannot cause a switch out of, otherwise a system may become
> >> > >>>>>> oversubscribed due to the preempted and preempting jobs running at the
> >> > >>>>>> same time. A UserSuspend can be entered and exited by the user, and is
> >> > >>>>>> often used to hold processing to check progress, etc.
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>> By the way ... I believe that the state diagram should at least be a
> >> > >>>>>> subset of the BES state diagram ... we should adopt the same names.
> >> > >>>>>>
> >> > >>>>>> -- Chris
> >> > >>>>>
> >> > >>>>>
> >> > >>>
> >> > >>>
> >> > >
> >> > >
> >>
> >>
> >>
> >> --
> >> "So much time, so little to do..."  -- Garfield
> >>
> >
> >
> >
> >--
> >Thilo Kielmann
> >http://www.cs.vu.nl/~kielmann/



-- 
Thilo Kielmann                                 http://www.cs.vu.nl/~kielmann/
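
For reference, the transitions described in the quoted thread can be sketched
as a small table. This is only an illustration under stated assumptions, not
the SAGA, BES, or DRMAA state model: the names JobState, Actor, ALLOWED,
can_transition and Job are hypothetical, the single Hold state (rather than
separate UserHold/SystemHold) is a simplification, and the rules that suspend
is entered from Running and that only the system dispatches a Pending job are
assumptions the thread does not confirm.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "Pending"            # the thread treats "Queued" as the same state
    HOLD = "Hold"                  # simplification: one Hold state (assumption)
    RUNNING = "Running"
    USER_SUSPEND = "UserSuspend"
    SYSTEM_SUSPEND = "SystemSuspend"


class Actor(Enum):
    USER = "user"
    SYSTEM = "system"


BOTH = frozenset({Actor.USER, Actor.SYSTEM})
USER_ONLY = frozenset({Actor.USER})
SYSTEM_ONLY = frozenset({Actor.SYSTEM})

# (from, to) -> actors allowed to trigger the transition.
ALLOWED = {
    # From the thread: Hold is entered from Pending and left by explicit release.
    (JobState.PENDING, JobState.HOLD): BOTH,
    (JobState.HOLD, JobState.PENDING): BOTH,
    # Assumption: only the scheduler dispatches a pending job.
    (JobState.PENDING, JobState.RUNNING): SYSTEM_ONLY,
    # From the thread: UserSuspend is entered and exited by the user.
    # Assumption: suspend states are entered from Running.
    (JobState.RUNNING, JobState.USER_SUSPEND): USER_ONLY,
    (JobState.USER_SUSPEND, JobState.RUNNING): USER_ONLY,
    # From the thread: a user cannot switch a job out of SystemSuspend.
    (JobState.RUNNING, JobState.SYSTEM_SUSPEND): SYSTEM_ONLY,
    (JobState.SYSTEM_SUSPEND, JobState.RUNNING): SYSTEM_ONLY,
}


def can_transition(src, dst, actor):
    """Return True if `actor` may move a job from `src` to `dst`."""
    return actor in ALLOWED.get((src, dst), frozenset())


class Job:
    def __init__(self, held_on_submit=False):
        # From the thread: a job may go directly into Hold on submission.
        self.state = JobState.HOLD if held_on_submit else JobState.PENDING

    def apply(self, dst, actor):
        """Perform the transition, or raise ValueError if it is not allowed."""
        if not can_transition(self.state, dst, actor):
            raise ValueError(
                f"{actor.value} may not move job from "
                f"{self.state.value} to {dst.value}"
            )
        self.state = dst
```

With this table, a preempted job (Running to SystemSuspend, triggered by the
system) cannot be resumed by the user, which is the oversubscription guard
described in the quoted reply.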




