[ogsa-wg] Paper proposing "evolutionary vertical design efforts"

Karl Czajkowski karlcz at univa.com
Tue Mar 21 22:16:19 CST 2006


On Mar 21, Marvin Theimer modulated:

> I know that systems like LSF get used in high throughput settings where
> the service time for a job request is an issue....
> ...  If
> my assumption is correct, then this common use case in the HPC world
> may be one that many, if not most job schedulers would have a hard time
> supporting if they have to provide at-most-once transactional semantics
> for all job submissions. 
> 

I do not think anyone claimed that at-most-once semantics should be
mandated on all requests.  Certainly nobody from Globus is saying
this: it is an optional feature of our job submission protocol, to
be chosen by the client depending on its needs.
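
To make the "optional, chosen by the client" point concrete, here is a
minimal sketch.  It is not the GRAM protocol itself; the JobService
class, the submit() signature and the submission_id parameter are all
hypothetical names invented for illustration, and the "service" is just
an in-memory toy.  The idea is only that a client who wants
at-most-once behaviour supplies an identifier it can safely reuse on
retry, while a client who does not care omits it.

```python
import uuid


class JobService:
    """Toy in-memory stand-in for a remote job service (hypothetical)."""

    def __init__(self):
        self._by_submission_id = {}  # submission id -> job id
        self.jobs_started = 0

    def submit(self, job_description, submission_id=None):
        # Opt-in path: a repeated submission id returns the job that was
        # already created instead of starting a second one.
        if submission_id is not None and submission_id in self._by_submission_id:
            return self._by_submission_id[submission_id]

        # Default path (or first use of an id): start a new job.
        self.jobs_started += 1
        job_id = f"job-{self.jobs_started}"
        if submission_id is not None:
            self._by_submission_id[submission_id] = job_id
        return job_id


service = JobService()

# Client that wants at-most-once: pick an id once, reuse it on retry.
sid = str(uuid.uuid4())
first = service.submit("sleep 1", submission_id=sid)
retry = service.submit("sleep 1", submission_id=sid)  # e.g. after a lost reply
print(first == retry)

# Client that does not ask for it: a blind retry creates a second job.
print(service.submit("sleep 1") == service.submit("sleep 1"))
```

In this sketch the extra cost of the extension falls only on requests
that carry an id (a lookup and a stored mapping); requests without one
take the plain path.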

I think the question is much more about whether (or how many times) an
optional at-most-once extension mechanism is defined.  Secondarily,
there is the question of efficiently determining if it (as an
extension) is available in a remote service.  A third interesting
question might be determining what the "cost" of the extension is
versus the cost of having lost jobs against an unknown remote service
implementation when setting up to do an extremely high throughput run
as you describe.

The high throughput case is interesting to me, because it is precisely
the user community that demanded efficient at-most-once semantics
from GRAM!  They are the ones who blast enough jobs through to notice
statistical failure rates and the cost of recovery.
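
As a rough illustration of why volume makes the failure rate visible,
the arithmetic can be sketched as below.  The function names and the
numbers are made up for the example, and it assumes each submission
fails ambiguously (reply lost, outcome unknown) independently with the
same probability, which a real deployment may not satisfy.

```python
def expected_ambiguous_failures(num_jobs, failure_rate):
    """Expected number of submissions whose outcome is unknown."""
    return num_jobs * failure_rate


def prob_at_least_one(num_jobs, failure_rate):
    """Probability that at least one submission fails ambiguously,
    assuming independent failures at the same rate."""
    return 1.0 - (1.0 - failure_rate) ** num_jobs


# Hypothetical numbers: a 1-in-10,000 ambiguous failure rate.
rate = 1e-4
for n in (10, 1_000, 100_000):
    print(n, expected_ambiguous_failures(n, rate), prob_at_least_one(n, rate))
```

With a handful of jobs such a rate is almost never seen; with a very
large run it is close to certain that some submissions will need
recovery.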


karl

-- 
Karl Czajkowski
karlcz at univa.com




