[ogsa-wg] Paper proposing "evolutionary vertical design efforts"

Tue Mar 21 14:50:09 CST 2006

On Mar 21, Marvin Theimer modulated:
> Hi;
>
> Whereas I agree with you that at-most-once semantics are very
> desirable, I would like to point out that not all existing job
> schedulers implement them.  I know that both LSF and CCS (the Microsoft
> HPC job scheduler) don’t.  I’ve been trying to find out whether PBS and
> SGE do or don’t. 
> 

Ian has already pointed out that at-most-once message semantics are
potentially useful even if there is some potential for a local failure
in the message processing (the handoff from the message layer to the
local scheduler).  Beyond that, I believe LSF does support "hold"
states, where a job can be submitted and then released as a two-phase
interaction.

Such a mechanism is sufficient to implement complete end-to-end
at-most-once submission: the message engine logs the association
between the client message and a local job handle before submitting.
Most schedulers also support job naming/annotation fields, which are
exposed through the job query interface.  These fields offer a second
way to implement a reliable correlation between message/request IDs
and the local job.  They can also be used to synthesize at-most-once
semantics in front of the scheduler, by checking whether a local job
with the same name already exists before trying to resubmit.  All of
this behavior can be hidden in the message engine and "local adapter".
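
A rough sketch of that adapter logic, in Python.  Everything here is
hypothetical: FakeScheduler is an in-memory stand-in, and submit_held,
release and find_by_name are invented names, not the API of LSF or any
other real scheduler.  The sketch reads "before submitting" as "before
releasing the held job", and it uses the message ID as the job name so
that a lost log entry can be recovered through the query interface.

```python
import uuid


class FakeScheduler:
    """Hypothetical in-memory scheduler with hold/release and name query."""

    def __init__(self):
        self.jobs = {}  # job handle -> job record

    def submit_held(self, name, command):
        # Phase 1: the job is created but cannot run until released.
        handle = str(uuid.uuid4())
        self.jobs[handle] = {"name": name, "command": command, "state": "HELD"}
        return handle

    def release(self, handle):
        # Phase 2: releasing an already released job is a no-op.
        if self.jobs[handle]["state"] == "HELD":
            self.jobs[handle]["state"] = "PENDING"

    def find_by_name(self, name):
        return [h for h, job in self.jobs.items() if job["name"] == name]


class LocalAdapter:
    """Hypothetical adapter that hides the correlation from the client."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.log = {}  # message ID -> job handle (durable storage in practice)

    def submit(self, message_id, command):
        handle = self.log.get(message_id)
        if handle is None:
            # No log entry: either a new request, or the entry was lost.
            # Query by job name before creating anything new.
            existing = self.scheduler.find_by_name(message_id)
            if existing:
                handle = existing[0]
            else:
                handle = self.scheduler.submit_held(message_id, command)
            self.log[message_id] = handle
        # Release only after the message/handle association is recorded.
        self.scheduler.release(handle)
        return handle
```

Calling submit() twice with the same message ID returns the same
handle and leaves exactly one job in the scheduler, even if the
adapter's log is emptied between the two calls.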

> So, this brings up the following slightly more general question: should
> the simplest base case be the simplest case that does something useful,
> or should it be more complicated than that?  I can see good arguments
> on both sides:
> 

I find it a little disconcerting that this question is still being
asked about job systems, because this decision has been made and
retracted before.  We did it in Globus with GRAM, and I think several
of the other Grid projects did as well...

The subset interface is not sufficient for users.  A solution MUST
incorporate an interoperable subset plus a robust extensibility
mechanism to allow any of:

   1. incremental evolution of the core subset
   2. vendor-specific localization/extension
   3. community/site-specific localization/extension
   4. discovery of extended mode support
   5. graceful degradation in the absence of extended mode support

In my opinion, anything short of this will just add another
non-interoperable interface to the hodge-podge of non-interoperable
solutions that already exist.
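
To make points 4 and 5 of the list concrete, here is a small Python
sketch of extension discovery with graceful degradation.  It is only
an illustration under assumptions: the CORE operation names, the
"x-vendor:" and "x-site:" prefixes, and the Endpoint and plan_request
names are all invented for this example and come from no existing
specification.

```python
# Hypothetical interoperable core subset.
CORE = {"submit", "cancel", "status"}


class Endpoint:
    """Hypothetical service that advertises the features it supports."""

    def __init__(self, extensions=()):
        self.extensions = set(extensions)

    def supported(self):
        # Discovery: the core subset plus any vendor or site extensions.
        return CORE | self.extensions


def plan_request(endpoint, required, optional):
    """Decide which features to use against a given endpoint."""
    supported = endpoint.supported()
    missing = set(required) - supported
    if missing:
        # A required feature is absent: fail loudly instead of guessing.
        raise ValueError("unsupported required features: %s" % sorted(missing))
    # Graceful degradation: drop optional features the endpoint lacks.
    used = sorted(set(optional) & supported)
    dropped = sorted(set(optional) - supported)
    return {"use": sorted(required) + used, "dropped": dropped}
```

A client that requires only "submit" and optionally asks for two
extensions still gets a usable plan from an endpoint that supports
just one of them; the other extension shows up in "dropped".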

karl

-- 
Karl Czajkowski
karlcz at univa.com