[graap-wg] asynchronous binding

Mon Feb 14 13:39:01 CST 2005

On Feb 14, Takuya Araki (Biglobe) loaded a tape reading:
> Alain, Karl:
> 
> Thank you for the excerpts!
> 
> So Karl, let me confirm your opinion:
> Are you thinking of using the method which is implemented in WS-GRAM?
> If so, I agree with that it increases the reliability of the system, 
> but it doesn't seem to be able to replace asynchronous operations completely.
> 

Yes, that is my suggestion, and no, I do not completely agree with your
assessment... I am attaching a longer description of my position which
was drafted as part of a different activity.  I think it addresses
this topic well enough so I will only paste it and then add a few more
comments specifically about WS-Agreement.  I apologize for the length.

RELIABILITY: One issue that arises in job management is the
potentially high stakes of errors in the management
protocol. Specifically, while a client may well be expected to
tolerate rejection and even execution failure, it is undesirable to
have job management state "disappear" or get duplicated due to
message-layer, client, or provider failures.

We assert that the simple creation pattern is sufficient because there
are multiple binding options available to get reliability. One
approach is to have transactional bindings which use commit and
rollback to make sure a creation occurs exactly once or not at
all.

However, not all Internet deployments will have transactional
messaging, so another approach is to use the WS-Addressing MessageID
header to get idempotent invocation.  This mechanism allows the client
to resend an application message in case it missed the response
message, and the provider must resend the response without duplicating
actions such as actual execution.  This "at most once" semantics is
sufficient for real world EMS scenarios, as the client will eventually
learn whether the execution was accepted or not.

GT 4.0 GRAM optionally uses a proprietary message-level concept
that is equivalent to the WS-Addressing MessageID in order to work
across any binding and because it was designed before the
WS-Addressing mechanism was fully clarified by its authors. In this
variant, the idempotent ID is sent as an extra field in the input
message and interpreted early in the request processing to avoid
duplication.
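
As a rough illustration of the idempotent ID idea (this is not the
actual GRAM code; the class, method, and field names are invented for
the example), a provider can check the ID before doing any real work
and replay the stored response when the same message is resent:

```python
import uuid


class IdempotentProvider:
    """Toy provider that deduplicates create requests by idempotent ID."""

    def __init__(self):
        self._responses = {}   # idempotent ID -> response already issued
        self.executions = 0    # how many times real work was performed

    def create(self, idempotent_id, job_description):
        # Check the ID early, before any real work, so a resend is harmless.
        if idempotent_id in self._responses:
            return self._responses[idempotent_id]
        self.executions += 1
        response = {"job": "job-%d" % self.executions, "accepted": True}
        self._responses[idempotent_id] = response
        return response


provider = IdempotentProvider()
request_id = str(uuid.uuid4())   # chosen by the client, reused on resend
first = provider.create(request_id, "run /bin/true")
resent = provider.create(request_id, "run /bin/true")  # client missed the reply
```

Both calls return the same response and the job is executed only once.
A real provider would also need to persist the table and decide how
long to retain entries, which this sketch ignores.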

The idempotent operation style optimizes the success case by not
requiring additional message exchanges unless there is an error
condition or timeout.  In contrast, a transactional approach requires
more exchanges to set up and commit (or cancel) the invocation.

ASYNCHRONY: Another concern is how much delay may be encountered in
the creation pattern. The WS architecture makes no statements about
the relative duration of an "in-out" message exchange, as that is
essentially a binding issue.  Two camps seem to dislike long message
delays for different reasons which are both somewhat inconsistent with
the WS architecture model.

First, one camp confuses the WSDL message protocol with an API
specification, so they believe that a WSDL "in-out" message must mean
a blocking procedure call in their client bindings. They are
uncomfortable with the implication that a long delay cannot be
processed asynchronously by their application.  We believe this is a
mistaken viewpoint to take when designing protocols, because the
asynchrony of the client can be addressed simply by using an
appropriate tooling strategy.

They should move to better tooling if their existing client stubs are
indeed this limiting. For example, the C language WS tooling in GT 4.0
generates synchronous and asynchronous stubs for each WSDL operation,
so our GT 4.0 GRAM client tool is able to perform the creation message
exchange using asynchronous "post message" and "response callback"
programmatic interfaces. The newest JAX-RPC revision is also said to
have better support for asynchronous invocation.
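
To make the relationship between the two stub styles concrete, here is
a small sketch in Python. It is not the GT 4.0 C API; the function
names post_create and create, and the stand-in provider, are
hypothetical. It only shows that a synchronous stub can be layered on
an asynchronous post plus a response callback:

```python
import threading


def post_create(provider_fn, request, callback):
    """Asynchronous stub: post the request and return immediately.

    The callback fires on a worker thread when the response arrives.
    """
    worker = threading.Thread(target=lambda: callback(provider_fn(request)))
    worker.start()
    return worker


def create(provider_fn, request, timeout=None):
    """Synchronous stub: a post followed by a wait on the callback."""
    done = threading.Event()
    box = {}

    def on_response(response):
        box["response"] = response
        done.set()

    post_create(provider_fn, request, on_response)
    if not done.wait(timeout):
        raise TimeoutError("no response before the timeout")
    return box["response"]


def slow_provider(request):
    # Stand-in for a remote provider; a real one might take much longer.
    return "accepted: " + request


result = create(slow_provider, "job-1", timeout=5)
```

An application that cannot afford to block would call post_create
directly and do other work until the callback runs.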

The second dissenting camp is concerned that long response delays will
be fragile because some bindings cannot tolerate the delay. For
example, a SOAP over HTTP binding may not be able to wait long enough
for a response before the TCP connection is lost.  Because an "in-out"
message pattern addresses the response implicitly via binding-level
context, it is not as durable as an explicit peer-to-peer message
exchange using "in only" messages sent to explicit endpoints at both
peer sites.

Unfortunately, this style is also difficult in constrained binding
environments because SOAP over HTTP is often valued specifically for
being asymmetric and allowing simple NAT/firewall traversal from
"anonymous" clients to well-known providers.  If we render a
peer-to-peer interface model in order to support fragile bindings, we
create obstacles for these other common deployment
environments.

[Please note, WS-Agreement is meant to support this peer-to-peer
pattern optionally, but I admit that we may need to do some
technical cleanup on the spec before completion... it seems to have
lost some details in the time I have been absent from the workgroup
discussions. See the optional initiator's EPR field in the create
call.  What is missing, I think, is clear normative text on how this
will be used by the responding party and how/if it should appear in
the Agreement context for correlative purposes.]

A third solution which happens to address both camps simultaneously is
to render explicit "post" and "poll" interfaces to initiate the
logical operation and then hold the response at the provider until the
client can reconnect and retrieve the result.  This supports fragile
bindings in NAT/firewall environments and also yields an asynchronous
interface with naive tooling that generates synchronous stubs for
"in-out" message exchanges. However, it complicates the application-level
modelling and lifts transport-level message buffering into the
application-level service implementation.

We argue that a simple "in-out" message exchange in combination with
idempotent ID mechanisms can equally well satisfy the fragile bindings
and NAT/firewall asymmetry without significant impact on the
application protocol.  It still retains state at the provider, but
rather than adding post/poll operations to the WSDL it simply uses
message send for "post" and message resend for "poll". This also means
that the application logic can be written using the more natural
"in-out" pattern and a simple buffering layer at (or slightly above)
the binding code can handle the resends at the provider.

This model supports asynchrony because the polling exchange can
"block" at the messaging level until the binding times out. In other
words, a client who logically iterates with:

    ID = new_identifier

    while is_non_response ( result = EPR->create(ID, input content) )
      repeat

will not "spin" but rather post a new copy of the idempotent create
message at the frequency at which the binding signals an error, e.g. a
closed connection.  While the binding is still functioning, the
underlying protocol such as SOAP over HTTP will provide for
asynchronous delivery of the response message. (Note of course that the
above snippet could be written in a longer pseudo-code format by using
an asynchronous post/callback model such as we use in the GT4 C
bindings. The synchronous create call is nothing more than a post
followed by a conditional wait on the callback monitor.)
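
For readers who prefer something runnable, the resend loop can be
sketched in Python against a simulated provider. Everything here is an
assumption made for illustration: BindingError, FlakyEndpoint, and
create_with_resend are invented names, and the "binding failure" is
simulated by raising an exception after the provider has already
recorded the result:

```python
import uuid


class BindingError(Exception):
    """Raised when the (simulated) binding loses the connection."""


class FlakyEndpoint:
    """Simulated provider whose first few responses are lost in transit."""

    def __init__(self, lost_responses):
        self.lost_responses = lost_responses
        self.results = {}      # idempotent ID -> stored response
        self.executions = 0    # how many times the create really ran

    def create(self, idempotent_id, content):
        if idempotent_id not in self.results:
            self.executions += 1
            self.results[idempotent_id] = "agreement for " + content
        if self.lost_responses > 0:
            self.lost_responses -= 1
            raise BindingError("connection closed before response")
        return self.results[idempotent_id]


def create_with_resend(endpoint, content, max_attempts=10):
    idempotent_id = str(uuid.uuid4())   # chosen once, reused on every resend
    for attempts in range(1, max_attempts + 1):
        try:
            return endpoint.create(idempotent_id, content), attempts
        except BindingError:
            continue    # resend only after the binding reports a failure
    raise RuntimeError("gave up after %d attempts" % max_attempts)


endpoint = FlakyEndpoint(lost_responses=2)
result, attempts = create_with_resend(endpoint, "offer-A")
```

Here the client sends the message three times, but the provider
performs the creation only once; the second and third sends act as
the "poll".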

This approach does not permit visibility as to WHY the response is
taking so long, but merely visibility as to WHETHER the response has
been issued yet. There is no lifecycle model in WS-Agreement for the
decision making that the Agreement provider performs while considering
an Agreement creation request, nor should we take lightly the burden
of trying to develop such a model.

> (By the way, it seems that GRAM has "batch mode" as an application level asynchronous operation.
> That's why the current method is enough for GRAM, I think.)
> 

No, actually our "globusrun" tool's batch mode is not about
asynchronous submission.  It simply turns off the subscription and
state monitoring that the tool normally does after submission.  The
submission step itself is roughly equivalent to the WS-Agreement
createAgreement operation.

karl

-- 
Karl Czajkowski
karlcz at univa.com