[ogsa-wg] Paper proposing "evolutionary vertical design efforts"
Karl Czajkowski
karlcz at univa.com
Tue Mar 21 18:01:24 CST 2006
On Mar 21, Christopher Smith modulated:
> No Ian ... I’m not saying that the ability to tell whether your job has
> been submitted is not important. What I am saying is that for systems
> like LSF, implementing this in the submission protocol is not necessary
> as there are other ways of figuring this out (such as Karl outlined in
> another email). Thus, having this protocol implemented is not important
> for our customers, who might rather see other features added to the
> product.
>
> -- Chris
>
Chris, that is not an accurate characterization of what I wrote.
I said that it is easy to implement an idempotent message protocol in
front of two different LSF local submission mechanisms (hold+release
and job name annotations), and I was therefore implying that it is not
a significant implementation burden to support idempotence in a
standard protocol!
I suspect that most schedulers have some "client provided job name"
option that can be used in a general adapter solution:
  1. standard protocol client chooses unique idempotence ID
  2. standard protocol client sends message, possibly more than once
  3. standard protocol service receives message, possibly more than once
     a. implement a persistent atomic <client-ID, engine-ID, job-ID> map
     b. # log client-ID for protocol idempotence
        if <client-ID, *, *> is not in map
        then
          engine-ID := new unique ID()
          enter <client-ID, engine-ID, nil> into map
        else
          find engine-ID for client-ID in map
        endif
     c. # log local ID and job ID for local idempotence
        if <client-ID, engine-ID, nil> is in map
        then
          if job system has a job annotated with engine-ID
          then
            job-ID := job system ID for job annotated with engine-ID
          else
            job-ID := submit(job annotated with engine-ID)
          endif
          enter <client-ID, engine-ID, job-ID> into map
        else
          find job-ID for engine-ID in map
        endif
     This process is crash-recoverable and provides at-most-once
     semantics for local job submission, with as much reliability as
     there is in the persistent log mechanism. It also requires that
     jobs "linger" in the local scheduler (or in accessible log files)
     long enough for recovery to take place.
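
For concreteness, steps 3b and 3c above might be sketched roughly as
follows in Python. This is only an illustration under assumptions, not
a reference implementation: SQLite stands in for the "persistent atomic
map", FakeScheduler is a hypothetical in-memory stand-in for a local
scheduler with a job-name lookup, and all class, method, and table
names are invented for the example.

```python
import sqlite3
import uuid


class FakeScheduler:
    """Hypothetical stand-in for a local scheduler with job-name annotations."""

    def __init__(self):
        self.jobs = {}  # job-name annotation -> local job ID
        self.next_id = 1

    def find_by_name(self, name):
        # Returns the local job ID for an annotated job, or None.
        return self.jobs.get(name)

    def submit(self, name):
        job_id = str(self.next_id)
        self.next_id += 1
        self.jobs[name] = job_id
        return job_id


class SubmissionService:
    """Sketch of the service side of the idempotent submission protocol."""

    def __init__(self, db_path, scheduler):
        # Step 3a: a persistent <client-ID, engine-ID, job-ID> map.
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS idmap ("
            "client_id TEXT PRIMARY KEY, engine_id TEXT NOT NULL, job_id TEXT)"
        )
        self.db.commit()
        self.scheduler = scheduler

    def handle_submit(self, client_id):
        # Step 3b: log client-ID; a repeated message keeps the first engine-ID.
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO idmap VALUES (?, ?, NULL)",
                (client_id, "svc-" + uuid.uuid4().hex),
            )
        engine_id, job_id = self.db.execute(
            "SELECT engine_id, job_id FROM idmap WHERE client_id = ?",
            (client_id,),
        ).fetchone()

        # Step 3c: submit only if no job ID was logged and the scheduler
        # has no job annotated with this engine-ID (crash-recovery case).
        if job_id is None:
            job_id = self.scheduler.find_by_name(engine_id)
            if job_id is None:
                job_id = self.scheduler.submit(engine_id)
            with self.db:
                self.db.execute(
                    "UPDATE idmap SET job_id = ? WHERE client_id = ?",
                    (job_id, client_id),
                )
        return job_id
```

Calling handle_submit() twice with the same client-chosen ID returns
the same job ID and submits only once. If the service crashes between
the local submit and the map update, the next delivery of the message
finds the job by its engine-ID annotation instead of resubmitting.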
This works in practice if the service engine can formulate unique IDs
that are unlikely to have a collision with any other client-specified
name annotations for jobs. I use a separate client-ID and engine-ID
to clarify that this solution does not depend on the client following
a locale-specific unique job naming convention. Of course, the client
must follow the standard protocol's conventions for unique message
naming.
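
One way the engine might formulate such IDs, sketched in Python: a
fixed prefix plus a random UUID. The "svc-" prefix and both function
names are assumptions made for illustration; any scheme with a
negligible chance of colliding with client-chosen job names would
serve.

```python
import uuid

ENGINE_PREFIX = "svc-"  # hypothetical namespace prefix for engine-generated names


def new_engine_id():
    # A random 128-bit UUID makes an accidental collision with a
    # client-specified job name extremely unlikely.
    return ENGINE_PREFIX + uuid.uuid4().hex


def is_engine_id(name):
    # Recognize names produced by new_engine_id(): prefix + 32 hex digits.
    suffix = name[len(ENGINE_PREFIX):]
    return (
        name.startswith(ENGINE_PREFIX)
        and len(suffix) == 32
        and all(c in "0123456789abcdef" for c in suffix)
    )
```

During recovery, a check like is_engine_id() lets the adapter ignore
jobs whose names were chosen directly by local users.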
karl
--
Karl Czajkowski
karlcz at univa.com