[graap-wg] A Highly available, Fault tolerant Co-scheduling System

Karl Czajkowski karlcz at univa.com
Sat Oct 8 01:36:22 CDT 2005


On Oct 07, Jon MacLaren modulated:
...
> The system uses the Paxos Commit protocol (Lamport, Gray) to overcome  
> the problems associated with distributed 2-phase commit.

Jon: 

As no doubt you'll remember, it has been proposed that advance
reservation is an approach to distributed co-allocation.
Specifically, advance reservation agreements can be seen as the
"prepare" step in the 2PC protocol and the subsequent claiming
agreements can be seen as the "commit" step.  As such, we can envision
WS-Agreement being used in the protocol between the 2PC transaction
manager and the resources (as well as between the initiator and the
transaction manager).
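
To make the mapping concrete, here is a minimal sketch of the
reserve-then-claim pattern as a 2PC round.  The Resource class and its
reserve/claim/cancel methods are hypothetical stand-ins for the
agreement operations, not WS-Agreement APIs, and the sketch assumes
that a claim on a held reservation always succeeds.

```python
class Resource:
    """Hypothetical resource manager; not a WS-Agreement API."""

    def __init__(self, name, free=True):
        self.name = name
        self.free = free
        self.state = "idle"  # idle -> reserved -> claimed

    def reserve(self):
        # "prepare": try to create an advance-reservation agreement
        if self.free and self.state == "idle":
            self.state = "reserved"
            return True
        return False

    def claim(self):
        # "commit": claim a previously reserved slot
        assert self.state == "reserved"
        self.state = "claimed"

    def cancel(self):
        # "abort": release a held reservation
        if self.state == "reserved":
            self.state = "idle"


def co_allocate(resources):
    """Reserve everywhere first; claim only if every reserve succeeded."""
    reserved = []
    for r in resources:
        if r.reserve():
            reserved.append(r)
        else:
            # One refusal aborts the whole transaction.
            for p in reserved:
                p.cancel()
            return False
    for r in resources:
        r.claim()
    return True
```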

Anyway, I read up on Paxos a bit, and as far as I can tell it has
these same underlying mechanisms of prepare/commit at the individual
resources.  In essence, it is a way of implementing the transaction
manager as a group of processes that reach consensus, on top of the
same basic parties: one who initiates the transaction and N who
participate in it.  It adds 2F
additional processes in between the initiator and resources to
tolerate F process failures.  Having actually studied and implemented
such a system, do you think this is an accurate summary?
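
As a rough illustration of that summary, the sketch below assumes the
usual Paxos arrangement of 2F+1 acceptors with a majority quorum of
F+1, and one vote per resource that must be recorded by a quorum.  The
function names and the vote encoding are made up for this example, and
ballots and leader recovery are left out entirely.

```python
def acceptor_count(f):
    # Acceptors assumed necessary to tolerate f failures.
    return 2 * f + 1


def quorum(f):
    # Majority of 2f+1 acceptors.
    return f + 1


def outcome(votes_by_resource, f):
    """votes_by_resource maps a resource name to the list of values
    ("prepared" or "aborted") reported by the acceptors that answered."""
    all_prepared = True
    for votes in votes_by_resource.values():
        if votes.count("aborted") >= quorum(f):
            return "abort"
        if votes.count("prepared") < quorum(f):
            all_prepared = False
    # Commit only when every resource's "prepared" reached a quorum.
    return "commit" if all_prepared else "undecided"
```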

Is there anything you can identify that is missing from
WS-Agreement that would allow it to be used at each resource in the
Paxos Commit protocol in the same manner that we have intended it to
be used in the "prepare" and "commit" steps of the 2PC protocol?
E.g. two separate agreements at each resource to represent the two
phases?

My understanding is that there needs to be a way to name the agreement
such that each of the Paxos processes can find the same answer to the
"prepare" step at each resource.  Can Paxos elect a "leader" who
initiates the prepare step, e.g. CreateAgreement, so that the others
can just check the result status, e.g. RP query?  Or would a truly
idempotent CreateAgreement process be required so that any process can
initiate the prepare step and all will learn the same result using the
same message pattern, regardless of which contacts the resource first?
By the way, I think this latter behavior could be solved at the WS
binding level, using the current WS-Agreement definitions.  This
would be a different application of the same idempotent-submit
mechanism we use in WS-GRAM for simple reliability...
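
A minimal sketch of the idempotent variant, assuming the caller
supplies the agreement name: the first create for a name decides the
result, and every later create with that name just returns it.  The
AgreementFactory class, the capacity counter, and the name format are
hypothetical and do not reflect the WS-Agreement or WS-GRAM interfaces.

```python
class AgreementFactory:
    """Hypothetical per-resource factory with idempotent creation."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.results = {}  # agreement name -> "accepted" / "rejected"

    def create_agreement(self, name):
        # Only the first request for a given name changes any state.
        if name not in self.results:
            if self.capacity > 0:
                self.capacity -= 1
                self.results[name] = "accepted"
            else:
                self.results[name] = "rejected"
        # Repeats, from any process, learn the same answer.
        return self.results[name]
```

With this shape, it does not matter which Paxos process contacts the
resource first; all of them use the same message and see the same
result.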

Of course, this use of agreements for the phases requires a certain
set of additional assumptions about how deterministic the claim step
is, once a reservation is held; otherwise, the semantics of the
"prepare" step (and the whole transaction) becomes wishy-washy.
Particularly, if the reservation agreements are constrained in time
(e.g. a typical wall-clock advance reservation scenario), the commit
protocol can be violated because the preparation can expire before the
commit phase is completed (violating the ACID properties). As I
understand it, Paxos can reduce the likelihood of delays due to
transaction manager failure, but arbitrary delay is still a hazard
with realistic messaging models, i.e. Internet-based services, because
of unbounded message delay/loss to the distributed resources that are
being coordinated.
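
The expiry hazard can be sketched with explicit logical timestamps.
TimedReservation and delayed_commit are hypothetical names, and the
numbers are arbitrary; the point is only that a commit message delayed
past the reservation window leaves some resources claimed and others
not.

```python
class TimedReservation:
    """Hypothetical reservation that can only be claimed before expiry."""

    def __init__(self, expires_at):
        self.expires_at = expires_at
        self.claimed = False

    def claim(self, now):
        # The claim takes effect only while the reservation is valid.
        if now < self.expires_at:
            self.claimed = True
        return self.claimed


def delayed_commit(delays, decided_at=90, expires_at=100):
    """Deliver the commit to each resource after its message delay and
    report which claims succeeded."""
    reservations = [TimedReservation(expires_at) for _ in delays]
    return [r.claim(decided_at + d) for r, d in zip(reservations, delays)]
```

For example, delayed_commit([0, 20]) claims the first resource at time
90 but reaches the second at time 110, after its reservation has
lapsed, so the transaction is only partially committed.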

Thoughts?

karl

-- 
Karl Czajkowski
karlcz at univa.com




