[saga-rg] job states...

Rajic, Hrabri hrabri.rajic at intel.com
Tue Feb 14 09:40:12 CST 2006


Other way around :-).

DRMAA does not have explicit support for file transfers, so its state
diagram is missing file staging.

The original DRMAA-WG members' intent was to specify what most DRM
systems currently support, possibly a bit more, so that the entry bar
for implementations stays low.  That eliminated the pre- and
post-execution stages.  The cleanup and shutdown stages were never
discussed.

Hrabri


>-----Original Message-----
>From: owner-saga-rg at ggf.org [mailto:owner-saga-rg at ggf.org] On Behalf Of
>Thilo Kielmann
>Sent: Tuesday, February 14, 2006 12:53 AM
>To: Rajic, Hrabri
>Cc: Simple API for Grid Applications WG
>Subject: Re: [saga-rg] job states...
>
>Pardon me, BES is not for Grids???
>
>Thilo
>
>On Mon, Feb 13, 2006 at 10:14:07AM -0800, Rajic, Hrabri wrote:
>>
>> They are not.  Grid is a richer environment.
>>
>> Hrabri
>>
>> >-----Original Message-----
>> >From: owner-saga-rg at ggf.org [mailto:owner-saga-rg at ggf.org] On Behalf Of
>> >Thilo Kielmann
>> >Sent: Sunday, February 12, 2006 1:09 AM
>> >To: Andre Merzky
>> >Cc: Christopher Smith; Simple API for Grid Applications WG
>> >Subject: Re: [saga-rg] job states...
>> >
>> >Curious question:
>> >
>> >SAGA should align job states with both BES and DRMAA ?
>> >Are they the same to begin with?
>> >
>> >Thilo
>> >
>> >On Sat, Feb 11, 2006 at 03:54:16AM +0100, Andre Merzky wrote:
>> >> Date: Sat, 11 Feb 2006 03:54:16 +0100
>> >> From: Andre Merzky <andre at merzky.net>
>> >> To: Christopher Smith <csmith at platform.com>
>> >> Cc: Andre Merzky <andre at merzky.net>,
>> >> 	Simple API for Grid Applications WG <saga-rg at ggf.org>
>> >> Subject: Re: [saga-rg] job states...
>> >>
>> >> I agree - the file transfer state models are not needed for SAGA.
>> >> We don't have any actions on these states anyway.
>> >>
>> >> Andre.
>> >>
>> >>
>> >> Quoting [Christopher Smith] (Feb 11 2006):
>> >> >
>> >> > Sure.
>> >> >
>> >> > As mentioned ... I think maybe supporting a subset of BES is ok.
>> >> > Much of the state model wrt file transfer state modelling I think
>> >> > is not required for SAGA.
>> >> >
>> >> > -- Chris
>> >> >
>> >> >
>> >> >
>> >> > On 10/2/06 18:46, "Andre Merzky" <andre at merzky.net> wrote:
>> >> >
>> >> > > Ok, then I'll do that in the strawman.  I would appreciate
>> >> > > if you could glance over it after commit, for a sanity
>> >> > > check.
>> >> > >
>> >> > > Thanks, Andre.
>> >> > >
>> >> > >
>> >> > > Quoting [Christopher Smith] (Feb 11 2006):
>> >> > >>
>> >> > >> It makes sense to keep the state models in sync.
>> >> > >>
>> >> > >> -- Chris
>> >> > >>
>> >> > >>
>> >> > >> On 10/2/06 18:26, "Andre Merzky" <andre at merzky.net> wrote:
>> >> > >>
>> >> > >>> Quoting [Christopher Smith] (Feb 11 2006):
>> >> > >>>>
>> >> > >>>> What I meant by that comment is that where it is a subset, it
>> >> > >>>> should reflect the BES terminology. I think that the number of
>> >> > >>>> states represented is enough already. ;-)
>> >> > >>>
>> >> > >>> Would it make sense to just copy the BES state diagram?
>> >> > >>>
>> >> > >>> It did not exist when we (== you ;-) drafted the SAGA job
>> >> > >>> states - if it had been around then, we might have
>> >> > >>> copied it already.
>> >> > >>>
>> >> > >>> Apart from the SystemXXX/UserXXX states, and from Hold,
>> >> > >>> it is not that much different from the SAGA model anyway.
>> >> > >>>
>> >> > >>> Cheers, Andre.
>> >> > >>>
>> >> > >>>
>> >> > >>>> -- Chris
>> >> > >>>>
>> >> > >>>>
>> >> > >>>> On 10/2/06 17:30, "Andre Merzky" <andre at merzky.net> wrote:
>> >> > >>>>
>> >> > >>>>> Hi Chris,
>> >> > >>>>>
>> >> > >>>>> many thanks for the answers! :-)
>> >> > >>>>>
>> >> > >>>>>> By the way ... I believe that the state diagram should at
>> >> > >>>>>> least be a subset of the BES state diagram ... we should
>> >> > >>>>>> adopt the same names.
>> >> > >>>>>
>> >> > >>>>> I agree, kind of - I would say that the SAGA job state
>> >> > >>>>> diagram should at _most_ be a subset of the BES state
>> >> > >>>>> diagram. It could be _S_implier :-)
>> >> > >>>>>
>> >> > >>>>> Cheers, Andre.
>> >> > >>>>>
>> >> > >>>>>
>> >> > >>>>> Quoting [Christopher Smith] (Feb 10 2006):
>> >> > >>>>>> Date: Fri, 10 Feb 2006 13:41:18 -0800
>> >> > >>>>>> Subject: Re: [saga-rg] job states...
>> >> > >>>>>> From: Christopher Smith <csmith at platform.com>
>> >> > >>>>>> To: Simple API for Grid Applications WG <saga-rg at ggf.org>
>> >> > >>>>>>
>> >> > >>>>>> On 4/2/06 11:18, "Andre Merzky" <andre at merzky.net> wrote:
>> >> > >>>>>>
>> >> > >>>>>> Ok ... I'll try to answer these, at least from my viewpoint.
>> >> > >>>>>>
>> >> > >>>>>>>
>> >> > >>>>>>> I think that diagram is wrong, isn't it?  Well, here are
>> >> > >>>>>>> my questions:
>> >> > >>>>>>>
>> >> > >>>>>>>   - if we submit a job, it's immediately Queued - is that
>> >> > >>>>>>>     right?  Should it be pending before (e.g. as long as
>> >> > >>>>>>>     the queuing request travels the middleware layers)?
>> >> > >>>>>>>
>> >> > >>>>>> To me, Queued is the same as Pending. Pending is probably a
>> >> > >>>>>> better word for this. Can't remember where the Queued name
>> >> > >>>>>> came from, as LSF uses PEND.
>> >> > >>>>>>
>> >> > >>>>>>>   - can the hold and suspend states be reached only from
>> >> > >>>>>>>     'Running', or from elsewhere as well?
>> >> > >>>>>>>
>> >> > >>>>>> You can only go into a Hold state from Pending, I think, or
>> >> > >>>>>> directly into Hold on submission.
>> >> > >>>>>>
>> >> > >>>>>>>   - What is the difference between 'Hold' and 'Suspend'?
>> >> > >>>>>>>
>> >> > >>>>>> A Hold state tells the scheduler/broker not to consider this
>> >> > >>>>>> job for scheduling/dispatch until the hold is explicitly
>> >> > >>>>>> released.
>> >> > >>>>>>
>> >> > >>>>>>>   - Are there signals defined (apart from KILL) which
>> >> > >>>>>>>     change the job state?  I guess that is not as simple
>> >> > >>>>>>>     as saying SUSP does suspend - that state is probably
>> >> > >>>>>>>     defined by the scheduler, not by the OS...
>> >> > >>>>>>>
>> >> > >>>>>> Right ... this is implementation dependent on the mechanism
>> >> > >>>>>> used to suspend a job (might be a signal, might be some other
>> >> > >>>>>> mechanism). What is important is that there is an operation
>> >> > >>>>>> to initiate the state transition.
>> >> > >>>>>>
>> >> > >>>>>>>   - What is the use case for distinguishing between
>> >> > >>>>>>>     UserHold and SystemHold, or between UserSuspend and
>> >> > >>>>>>>     SystemSuspend?
>> >> > >>>>>>>
>> >> > >>>>>> If I preempt workload, the system will put it into a
>> >> > >>>>>> SystemSuspend state that a user cannot cause a switch out of,
>> >> > >>>>>> otherwise a system may become oversubscribed due to the
>> >> > >>>>>> preempted and preempting jobs running at the same time. A
>> >> > >>>>>> UserSuspend can be entered and exited by the user, and is
>> >> > >>>>>> often used to hold processing to check progress, etc.
>> >> > >>>>>>
>> >> > >>>>>>
>> >> > >>>>>> By the way ... I believe that the state diagram should at
>> >> > >>>>>> least be a subset of the BES state diagram ... we should
>> >> > >>>>>> adopt the same names.
>> >> > >>>>>>
>> >> > >>>>>> -- Chris
>> >> > >>>>>
>> >> > >>>>>
>> >> > >>>
>> >> > >>>
>> >> > >
>> >> > >
>> >>
>> >>
>> >>
>> >> --
>> >> "So much time, so little to do..."  -- Garfield
>> >>
>> >
>> >
>> >
>> >--
>> >Thilo Kielmann
>> >http://www.cs.vu.nl/~kielmann/
>
>
>
>--
>Thilo Kielmann
>http://www.cs.vu.nl/~kielmann/
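
For readers following the Hold/Suspend discussion quoted above, here is a
minimal sketch of the transitions as Chris describes them.  It is not the
BES, DRMAA, or SAGA state model: the operation names ("hold", "release",
"dispatch", "suspend", "resume"), the Actor type, and the step() helper are
all hypothetical.  The rule that only the system may release a SystemHold
is also an assumption; the thread only states this restriction for
SystemSuspend.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"            # called "Queued" in the earlier diagram
    USER_HOLD = "UserHold"
    SYSTEM_HOLD = "SystemHold"
    RUNNING = "Running"
    USER_SUSPEND = "UserSuspend"
    SYSTEM_SUSPEND = "SystemSuspend"


class Actor(Enum):
    USER = "user"
    SYSTEM = "system"


# (current state, operation, who asks) -> next state.
# Anything not listed here is treated as an illegal transition.
TRANSITIONS = {
    # Hold is entered from Pending and keeps the job out of scheduling.
    (State.PENDING, "hold", Actor.USER): State.USER_HOLD,
    (State.PENDING, "hold", Actor.SYSTEM): State.SYSTEM_HOLD,
    (State.USER_HOLD, "release", Actor.USER): State.PENDING,
    # Assumption: a system hold can only be released by the system.
    (State.SYSTEM_HOLD, "release", Actor.SYSTEM): State.PENDING,
    # Assumption: the scheduler moves a pending job to Running.
    (State.PENDING, "dispatch", Actor.SYSTEM): State.RUNNING,
    # Suspend applies to a running job.
    (State.RUNNING, "suspend", Actor.USER): State.USER_SUSPEND,
    (State.RUNNING, "suspend", Actor.SYSTEM): State.SYSTEM_SUSPEND,
    (State.USER_SUSPEND, "resume", Actor.USER): State.RUNNING,
    # A user cannot leave SystemSuspend; only the system resumes the job.
    (State.SYSTEM_SUSPEND, "resume", Actor.SYSTEM): State.RUNNING,
}


def step(state, operation, actor):
    """Return the next state, or raise ValueError if the move is not allowed."""
    try:
        return TRANSITIONS[(state, operation, actor)]
    except KeyError:
        raise ValueError(
            f"{actor.value} may not '{operation}' a job in state {state.value}"
        ) from None
```

With this table, a user asking to resume a preempted job is rejected, which
is the oversubscription guard described in the thread, while a user can
freely suspend and resume their own running job.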




