[SAGA-RG] Service Discovery spec updated at last ...

Thu Dec 18 17:42:27 CST 2008

2008/12/18 Andre Merzky <andre at merzky.net>:
> Quoting [Steve Fisher] (Dec 18 2008):
>>
>> > In the attached version, I tried to address the things we
>> > agreed upon, and changed the service types/names table as
>> > proposed.  Could you give it another pass, please?
>> > I left the Constructors untouched for now, until we agreed
>> > on something, ok?
>> >
>> > Also, there is a question from me at the bottom of page 5:
>> >
>> >  "Note: Why not TimeOut, AuthorizationFailed, etc??"
>>
>> For the discoverer constructor, I guess the Authentication exception
>> can be thrown if you don't pass in a session or if you pass in what
>> used to be a valid session that has since expired. How would you class
>> that one?
>
> A session cannot expire - its lifetime is defined by the
> lifetime of the objects which use that session.  If a
> context (or credential) attached to that session expired,
> you can expect to see a lot of AuthenticationFailed
> exceptions, whenever that credential is being used.

Sorry, that is what I meant. The session has not expired, but it is
inoperable because its credentials have expired.

>> Unfortunately quite a lot of the allocation of exceptions is
>> implementation dependent. It would be better to be able to just throw
>> a Security exception - but that is not an option as it is not in the
>> core spec. If you need to get authenticated this could result in a
>> timeout if contact with some remote server is required.
>
> Well, then that is a TimeOut exception :-))  In general, it
> is of course difficult to map backend exceptions to SAGA
> exceptions in a meaningful way, but that cannot be avoided
> really, if the end user is to stay away from middleware
> exceptions.
>
>
>> In our current implementation, list_services does most of the work and
>> so can throw timeouts and could throw security errors except that we
>> happen to connect to a non-secured information service. However the
>> subsequent calls (such as get_data) can throw the same exceptions as
>> they could choose to go back to the information service to pick up
>> information that was not originally needed when making the selection.
>> If we don't allow the possibility to throw these exceptions the
>> implementation is needlessly restricted. This is why I like the
>> NoSuccess everywhere. In other words I would use NoSuccess for those
>> conditions which are implementation dependent - which in this case is
>> quite a lot.
>
> Good point, so NoSuccess may make sense in all calls.  I
> expected all implementations to simply fetch complete
> service descriptions, not only parts, but that may have been
> too limiting.
>
> But, is that really the way you expect implementations to
> work: to fetch some part of the service description, but,
> e.g., the name and URL just later on, on the get_attribute
> calls?  That adds an awful latency overhead, which easily
> kills any bandwidth savings, as long as you stay below x.000
> returned service descriptions - which is the dominating use
> case I dare to say.

Which approach is optimal depends upon the information system. However,
there are advantages in making sure that list_services gets everything
loaded into memory because, as you have pointed out, it makes sure that
the subsequent get methods cannot fail.

> Requiring the implementation to fetch complete descriptions
> is a certain limitation, sure, but allows more tightly
> defined semantics for the application and end user...
>
> So, that is up to you (I do not know the implementations),
> but please consider the argument.

Yes, I agree it would be best to get all the service info in one go.
Then it either succeeds or fails.
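
To make the "one go" approach concrete, here is a minimal Python sketch.
It is only an illustration under stated assumptions: the backend is a
hypothetical callable that takes a filter string and returns a list of
records, and NoSuccess is a local stand-in for the SAGA exception of the
same name. list_services copies every attribute into memory, so the
later get calls never go back to the information service.

```python
class NoSuccess(Exception):
    """Local stand-in for the SAGA NoSuccess exception."""


class ServiceDescription:
    """Complete in-memory snapshot; the getters never contact a remote service."""

    def __init__(self, attributes, data):
        self._attributes = dict(attributes)
        self._data = dict(data)

    def get_attribute(self, key):
        # Plain dictionary lookup on the snapshot taken by list_services.
        return self._attributes[key]

    def get_data(self):
        return dict(self._data)


class Discoverer:
    def __init__(self, backend):
        # backend is a hypothetical callable: filter string -> list of records.
        self._backend = backend

    def list_services(self, service_filter):
        try:
            records = self._backend(service_filter)
        except Exception as exc:
            # All implementation-dependent failures surface here, once.
            raise NoSuccess(str(exc)) from exc
        return [ServiceDescription(r["attributes"], r["data"]) for r in records]


def fake_backend(service_filter):
    """Hypothetical information service holding a single record."""
    records = [
        {
            "attributes": {"name": "ce01", "url": "https://ce01.example.org"},
            "data": {"queue": "short"},
        }
    ]
    return [r for r in records if service_filter in r["attributes"]["name"]]
```

With this shape, Discoverer(fake_backend).list_services("ce") either
returns complete descriptions or raises NoSuccess; nothing remote can
fail afterwards.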

>> For example you have added DoesNotExist for the
>> constructor - implying that the API would contact the underlying
>> information service. However there is no reason why it should do so at
>> this stage.
>
> We have the same problem with the RPC package - there the
> Constructor has the following note:
>
>  - according to the GridRPC specification, the constructor
>    may or may not contact the RPC server; absence of an
>    exception does not imply that following RPC calls will
>    succeed, or that a remote function handle is in fact
>    available.
>
> and the call() method has the following one:
>
>  - according to the GridRPC specification, the RPC server
>    might not be contacted before invoking call(). For this
>    reason, all notes to the object constructor apply to the
>    call() method as well.
>
> Also, call() has all exceptions of the Constructor, to allow
> for delayed throws.
>
> These notes are, as noted, a result of an underspecification
> of the GridRPC standard.  I would really like to avoid
> sprinkling such notes all over the place, as it makes a tight
> semantic definition of the API rather impossible.
>
> I think, if you have no really hard constraints from an
> underlying standard (as we had with RPC), or from an
> absolutely dominating implementation, it is not too much to
> ask from an implementation to ensure in the Constructor if
> the given (or chosen) URL actually points to a service which
> is alive and usable - what do you think?

This time I don't agree. For someone looking for one service, it would
mean two calls to the info system rather than one. Also, even if you
find that the service is available in the constructor, it may have died
by the time you call list_services. So you need the exceptions in
list_services to cover lost credentials, timeouts/dead services, and
authz. I would be happy to assume that the constructor only contacts
another service as part of the session setup and does not contact the
info system.

Actually I don't think that much use will be made of the url
parameter. Rather I expect the sysadmin to make sure that the desired
set of adapters is installed and each will probably have its own
configuration file.
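
The call-count argument above can be sketched roughly as follows. All
names here are hypothetical: FakeInfoSystem stands in for an information
system and counts how often it is contacted, and TimeOut is a local
stand-in for the exception discussed in this thread. Mapping
ConnectionError to TimeOut is just one illustrative choice.

```python
class TimeOut(Exception):
    """Local stand-in for the timeout exception discussed in this thread."""


class FakeInfoSystem:
    """Hypothetical information system that counts how often it is contacted."""

    def __init__(self, services, alive=True):
        self.services = list(services)
        self.alive = alive
        self.calls = 0

    def query(self, service_filter):
        self.calls += 1
        if not self.alive:
            raise ConnectionError("information system unreachable")
        return [s for s in self.services if service_filter in s]


class Discoverer:
    def __init__(self, info_system):
        # Only stores the handle; the information system is not contacted.
        self._info_system = info_system

    def list_services(self, service_filter):
        try:
            return self._info_system.query(service_filter)
        except ConnectionError as exc:
            # A service that died after construction is reported here.
            raise TimeOut(str(exc)) from exc
```

Constructing the Discoverer costs no remote call, a single lookup costs
exactly one, and a service that dies between construction and lookup
still has to be handled in list_services.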

> Otherwise, indeed, you need to repeat all exceptions on all
> calls (which makes catching and error recovery tedious for
> the application), or map them all to NoSuccess later on
> (which also gives the application no useful means to
> recover, e.g. an application does not get the info that a
> different URL or context may have helped).
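
The recovery point in the last paragraph can be illustrated with a small
Python sketch. Everything in it is an assumption made for illustration:
the exception classes are local stand-ins named after the ones discussed
in this thread, the backend errors are ordinary Python built-ins, and
query is a hypothetical callable taking a URL. It is not taken from any
SAGA implementation.

```python
class SagaError(Exception):
    """Common base for the stand-in exceptions below."""


class TimeOut(SagaError):
    pass


class AuthenticationFailed(SagaError):
    pass


class DoesNotExist(SagaError):
    pass


class NoSuccess(SagaError):
    pass


# Hypothetical mapping from backend errors to specific API exceptions.
_MAPPING = (
    (TimeoutError, TimeOut),
    (PermissionError, AuthenticationFailed),
    (FileNotFoundError, DoesNotExist),
)


def translate(exc):
    """Map a backend error to a specific exception, else to NoSuccess."""
    for backend_type, saga_type in _MAPPING:
        if isinstance(exc, backend_type):
            return saga_type(str(exc))
    return NoSuccess(str(exc))


def first_reachable(urls, query):
    """Try each URL in turn; only specific errors tell us a retry may help."""
    last_error = NoSuccess("no URLs given")
    for url in urls:
        try:
            return query(url)
        except Exception as exc:
            error = translate(exc)
            if isinstance(error, (DoesNotExist, TimeOut)):
                # A different URL may help, so remember the error and go on.
                last_error = error
                continue
            # Another URL will not help (or we cannot tell): give up.
            raise error from exc
    raise last_error
```

If every failure were collapsed into NoSuccess, first_reachable could
not tell a wrong URL from expired credentials and would have no basis
for deciding whether to try the next URL.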