[ogsa-wg] Job State proposal made to SAGA-RG

Christopher Smith csmith at platform.com
Tue Apr 25 08:48:57 CDT 2006


Exactly. The thinking is that the base set of states will be fairly simple,
and that capabilities such as suspend/resume will be described in extensions
because it might not make sense for all implementations.
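
As a rough illustration of that split (a minimal sketch only; the state
names, the Job class, and the supports_suspend flag are all hypothetical
and are not taken from the SAGA proposal), a base transition table could be
widened by an optional suspend/resume extension like this:

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical state names, chosen only for illustration.
    NEW = "New"
    RUNNING = "Running"
    DONE = "Done"
    FAILED = "Failed"
    SUSPENDED = "Suspended"  # reachable only when the extension is enabled


# Transitions assumed to be supported by every implementation.
BASE_TRANSITIONS = {
    (JobState.NEW, JobState.RUNNING),
    (JobState.RUNNING, JobState.DONE),
    (JobState.RUNNING, JobState.FAILED),
}

# Extra transitions contributed by an optional suspend/resume extension.
SUSPEND_EXTENSION = {
    (JobState.RUNNING, JobState.SUSPENDED),
    (JobState.SUSPENDED, JobState.RUNNING),
}


class Job:
    def __init__(self, supports_suspend=False):
        self.state = JobState.NEW
        self.allowed = set(BASE_TRANSITIONS)
        if supports_suspend:
            # Only implementations that opt in get the extra transitions.
            self.allowed |= SUSPEND_EXTENSION

    def transition(self, new_state):
        # Reject any move that is not in this job's transition table.
        if (self.state, new_state) not in self.allowed:
            raise ValueError(
                f"{self.state.name} -> {new_state.name} not supported"
            )
        self.state = new_state
```

A job created with Job() rejects a move to SUSPENDED with a ValueError,
while Job(supports_suspend=True) accepts it, so an implementation that
cannot suspend simply never exposes the extra transitions.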

-- Chris


On 25/4/06 03:14, "Steve Loughran" <steve_loughran at hpl.hp.com> wrote:

> Christopher Smith wrote:
>> Hi all,
>> 
>> Per Marvin's comments ...
>> 
>> Here is a pointer to the proposal for modelling job states that I made to
>> the SAGA group last February.
>> 
>> http://www-unix.gridforum.org/mail_archive/saga-rg/2006/02/msg00107.html
>> 
>> -- Chris
>> 
> 
> Presumably suspend<-->resume is optional? That is, there are some things
> that cannot be suspended, or at least they suspend but cannot resume? (*)
> 
> That is something that is not in the CDDLM model (which is based on the
> WSDM state model), because it's a lot harder to suspend things like a
> database and a server hosting many active connections. It's a lot easier
> to shut it down and redeploy later, relying on the application to be
> able to continue when it is redeployed. Which is a good idea for
> anything you want to be resilient. [1]
> 
> If you have VM images you can suspend them, but at least as far as
> VMware is concerned, the apps don't get warned before and after, so they
> have a worse experience than on a laptop, where apps and drivers get
> warned that they are about to suspend and told that they have woken up.
> All you know about on a VMware-hosted image is that the clock suddenly
> jumps and all your active TCP connections start throwing errors. You may
> have suspended, but the world still turns.
> 
> 
> (*) Even on ACPI laptops there is evidence of a de-facto suspend state,
> S6: the sleep from which laptops do not recover. [2]
> 
> [1] http://swig.stanford.edu/~candea/papers/crashonly/
> [2] http://www.hpl.hp.com/techreports/2000/HPL-2000-21.html




