[ogsa-wg] OGSAInfo03July08

Fri Jul 4 05:30:25 CDT 2008

Andre Merzky wrote:
> A small comment on the pull/push question raised in the
> meeting notes: why not support both models?  IMHO, there are
> always use cases which are very unhappy if you support
> either one, and not the other...  You can always allow for
> _implementations_ to support only one model.

Information bound for clients has to be pulled; you just can't rely on
them being able to receive unsolicited SOAP messages due to firewalls
and the like[*]. On the other hand, when building an information service
you want the basic info pushed in from the collection points (e.g. the
BES container publishes an advert locally that it exists). The tricky
bit is working out where to switch from push to pull; my instinct is to
put that at the boundary between service provider and service consumer
(this is in the simple no-middlemen case).
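
To make the split concrete, here is a minimal sketch of an information
service that sits on that boundary. Everything in it is an assumption
made for illustration (the InfoService class, its publish/query methods
and the advert fields are hypothetical, not taken from any OGSA spec):
collection points push adverts in, clients only ever pull.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Advert:
    """Hypothetical advert record pushed in by a collection point."""
    service_id: str
    kind: str
    attributes: dict
    received_at: float = field(default_factory=time.time)


class InfoService:
    """Sits on the provider/consumer boundary: push in, pull out."""

    def __init__(self):
        self._adverts = {}  # service_id -> latest Advert

    def publish(self, service_id, kind, attributes):
        # Push side: called by the provider's own collection points
        # (e.g. a BES container announcing that it exists). A repeated
        # publish from the same service replaces its earlier advert.
        self._adverts[service_id] = Advert(service_id, kind, dict(attributes))

    def query(self, kind):
        # Pull side: clients ask; nothing is ever sent to them unsolicited,
        # so no inbound connection through their firewall is needed.
        return sorted(a.service_id for a in self._adverts.values()
                      if a.kind == kind)
```

A container would call publish() whenever its state changes, while a
client behind a firewall just calls query() when it wants to know
something.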

The project I'm working on is looking at using RDF/SPARQL for the
information system. On the one hand it does mean that we'd have good
expressibility, but on the other hand I worry about practicality and
performance. The gripping hand is that this isn't done yet anyway, so
it's pure speculation as to whether it is a good idea. :-)

Thinking back to what Lawrence was saying, the thing that worried me
about it was that he was describing a system that pulled a lot of data
to the local system (well, it was actually to the job execution system
but that's the local system from the perspective of the query) before
taking a decision. That seems horribly inefficient. It's better IMHO to
push a slightly more complex query to the info system and have that
return a smaller set of results of higher quality (ideally, you'd get it
down to a single message each way for the majority of cases). A side
advantage is that it is then possible to evaluate the query while taking
into account information that you don't want to expose to the client;
for example, you don't need to let them see the provisioning schedule of
your (i.e. the provider's) disk arrays, as you can just tell them that
the space they asked for will be there when the job completes. Similarly
for jobs; you don't need to reveal whether an application is installed
or whether you use virtualized images, just that the job can be executed
when needed.
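
A rough sketch of the disk-space case, under invented assumptions (the
StorageProvider and StorageRequest names, and the schedule format of
(hour, added_gb) pairs, are hypothetical): the provider evaluates the
request against its private provisioning schedule and the client only
ever sees a yes/no answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRequest:
    gigabytes: int
    needed_by_hour: int  # hours from now at which the job completes


class StorageProvider:
    def __init__(self, free_now_gb, provisioning_schedule):
        self._free_now_gb = free_now_gb
        # Private to the provider: (hour, added_gb) pairs describing when
        # extra capacity comes online. Never returned to the client.
        self._schedule = list(provisioning_schedule)

    def can_satisfy(self, request):
        # Evaluate the query provider-side: count the space that is free
        # now plus whatever will be provisioned by the time it is needed.
        arriving = sum(gb for hour, gb in self._schedule
                       if hour <= request.needed_by_hour)
        return self._free_now_gb + arriving >= request.gigabytes
```

One request goes in and one boolean comes back, which is the "single
message each way" shape; the schedule itself never crosses the wire.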

[Hmmm, this message is already longer than I set out to write...]

Donal.
[* Someone ought to do a profile of SOAP using XMPP as a transport,
    since with that you *could* do real message push. ]