[SAGA-RG] Fwd (mathijs at cs.vu.nl): Suboptimal things in SAGA

Sat Oct 17 23:25:11 CDT 2009

Hi Mathijs,

we did not manage to discuss all of this at OGF - the agenda was
pretty packed already.  Thus again by mail...

Quoting [Thilo Kielmann] (Oct 15 2009):
> 
> more food...
> 
> ----- Forwarded message from Mathijs den Burger <mathijs at cs.vu.nl> -----
> 
> > Subject: Suboptimal things in SAGA
> > From: Mathijs den Burger <mathijs at cs.vu.nl>
> > To: Thilo Kielmann <kielmann at cs.vu.nl>
> > 
> > Hi,
> > 
> > Here's a list of things that I feel are not optimal in SAGA right now.
> > Maybe interesting for (lunch)discussions at OGF?
> > 
> > 1. Exception handling in engines with late binding is a pain. When
> > multiple adaptors are tried automatically and all fail, it is hard to
> > figure out what actually went wrong. JavaGAT also has this problem.
> > Users run away screaming when seeing their first nested exception: it
> > contains 10 backends they never heard of nor asked for complaining about
> > stuff they do not understand.

Agree, it is painful.  But what can you do?  At best, the engine is
able to employ some heuristics to extract the most relevant exception
and push that to the top level.  Your application should then only
print that message to stderr, by default.
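
To make the heuristic idea a bit more concrete, here is a rough sketch
of what an engine could do.  The types, the names and in particular the
ordering of the error kinds are assumptions made up for this mail, not
taken from the spec or from any implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical error kinds, listed from least to most specific.
// The ordering is an assumption of this sketch.
enum class ErrorKind {
    NotImplemented = 0,
    NoSuccess,
    Timeout,
    AuthenticationFailed,
    PermissionDenied,
    DoesNotExist,
    IncorrectURL
};

// One failure as reported by one adaptor (hypothetical record).
struct AdaptorFailure {
    std::string adaptor;  // name of the adaptor that failed
    ErrorKind   kind;     // how specific the failure is
    std::string message;  // text to show to the user
};

// Return the failure with the most specific kind.
// On a tie, the failure reported first wins.
inline const AdaptorFailure&
most_relevant(const std::vector<AdaptorFailure>& failures) {
    assert(!failures.empty());
    return *std::max_element(
        failures.begin(), failures.end(),
        [](const AdaptorFailure& a, const AdaptorFailure& b) {
            return static_cast<int>(a.kind) < static_cast<int>(b.kind);
        });
}
```

The application would then print only most_relevant(failures).message
and keep the full nested list for a debug log.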

The only real 'solution' would be to disable late binding...  Or do
you see any other way?

> > 2. The any:// scheme is evil. It works well if the available backends
> > change a lot, but that is not the case in practice. Users know very well
> > what backends are used for what Grid sites: it's rather static info. 

Well, then they should use the backend specific URLs, not 'any'!
'Any' is exactly for those cases where the backend is *not* known -
in all other cases, it does not make sense to use it.

If you think this is too much trouble for your users, simply disable
'any'.  The spec says:

  "The SAGA API specification allows the use of the placeholder 'any'
  (as in any://host.net/tmp/file). A SAGA compliant implementation
  MAY be able to choose a suitable protocol automatically, but CAN
  decline the URL with an IncorrectURL exception."
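
Declining 'any' could look roughly like the sketch below.  IncorrectURL
here is a stand-in type and the function names are made up for
illustration; a real engine would hook this into its own URL handling:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for SAGA's IncorrectURL.
struct IncorrectURL : std::runtime_error {
    explicit IncorrectURL(const std::string& what)
        : std::runtime_error(what) {}
};

// Extract the scheme of a "scheme://..." URL; empty if there is none.
inline std::string scheme_of(const std::string& url) {
    assert(!url.empty());
    const std::string::size_type pos = url.find("://");
    return pos == std::string::npos ? std::string() : url.substr(0, pos);
}

// Decline the 'any' placeholder, accept every other scheme unchanged.
inline void require_explicit_scheme(const std::string& url) {
    if (scheme_of(url) == "any")
        throw IncorrectURL("placeholder scheme 'any' is disabled: " + url);
}
```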

> > A much cleaner design (e.g. followed in IbisDeploy) to alleviate
> > problems 1 and 2 is to define a number of Grid sites that each have
> > certain backends and for which you have certain credentials. A SAGA
> > engine can then, per site, only try the adaptors that make sense in the
> > first place.

Hmm, isn't that what is happening?  The default session should
contain saga contexts for those backends you have security
credentials for.  As the adaptors live in that session, only those
adaptors should get active for which a context exists.  

You probably mean that the other adaptors still throw an
AuthorizationFailed exception if no context is available?  Well, one
can disable the adaptors.  

So, I guess what I try to say is that the saga::session can be used
to specify the backends to use, via the contexts.

> > This limits the backends tried to the ones explicitly
> > specified by the user, which makes it much more comprehensible what is
> > going on. It is also faster, since not all adaptors have to be tried.
> > Currently, there is no generic way in SAGA to limit which adaptors or
> > credentials are used for a site. JavaGAT does have such functionality. 

The generic way is to create a session with those contexts (aka
credentials) attached which you want to use.  Say, you want to limit
the set of active adaptors to the globus adaptors, do

  saga::session s;
  saga::context c ("globus");
  s.add_context (c);

  saga::filesystem::file f (s, url);

This should get you only the globus adaptor - all others will bail
out, right? (sorry if my answer is a repetition from above)
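
In engine terms, the selection I have in mind is roughly the following.
This is a self-contained model with made-up Adaptor and Session types,
not the actual SAGA classes, and it assumes each adaptor declares the
single context type it needs:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in: an adaptor and the context type it requires.
struct Adaptor {
    std::string name;
    std::string context_type;
};

// Hypothetical stand-in for a session holding context types.
struct Session {
    std::set<std::string> context_types;

    void add_context(const std::string& type) {
        assert(!type.empty());
        context_types.insert(type);
    }
};

// Keep only the adaptors whose required context type is in the session.
// The order of the input list is preserved.
inline std::vector<std::string>
active_adaptors(const Session& s, const std::vector<Adaptor>& all) {
    std::vector<std::string> names;
    for (const Adaptor& a : all) {
        if (s.context_types.count(a.context_type) > 0)
            names.push_back(a.name);
    }
    return names;
}
```

Adaptors that are filtered out this way are never tried, so they cannot
contribute to the nested exception either.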

> > 3. Sessions with multiple contexts of the same type should be forbidden.
> > Trying them all may have weird and unwanted side-effects (e.g. creating
> > files as a different user, or a security lockout because you tried too
> > many passwords). It confuses the user. This issue is related to point 2.

This is a tough one.  The problem here is that a context type is not
bound to a backend type.  Like, both glite and globus use X509
certs.  Both AWS and ssh use openssl keypairs.  Both local and ftp
use Username/Password, etc.  I don't think this is something one can
enforce.

We had the proposal to have the context types not bound to the
backend *technology* (x509), but to the backend *name* (teragrid).
This was declined as it makes it difficult to run your stuff on a
different deployment using the same cert.

> > 4. URL schemes are ill-defined. Right now, knowing which schemes to use
> > is implementation-dependent voodoo (e.g. what is the scheme for running
> > local jobs? Java SAGA uses 'local://', C++ SAGA used 'fork://'). There
> > is no generic way of knowing these schemes other than 'read the
> > documentation', which people don't do. Essentially, these schemes create
> > an untyped dependency of a SAGA app to a SAGA implementation, causing
> > SAGA apps not to be portable across implementations unless they all have
> > the same adaptors that recognize the same schemes.

Correct.  Scheme definition is not part of the spec.  I argue it
should not be either, as that can only be a restrictive
specification, which would break use cases, too.  The only solution
right now is to create a registry - simply a web page which lists
recommendations on what scheme to use for what backend.  Would that
make sense to you?
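
If such a registry existed, an application (or an implementation) could
carry it as a small alias table.  A sketch, where the table contents
and the choice of which name counts as the recommended one are purely
illustrative assumptions:

```cpp
#include <cassert>
#include <map>
#include <string>

// Maps a scheme as written by the user to the recommended scheme.
// The contents come from the (hypothetical) registry page.
typedef std::map<std::string, std::string> SchemeRegistry;

// Rewrite the scheme of a "scheme://..." URL according to the registry.
// URLs with an unlisted scheme are returned unchanged.
inline std::string canonical_url(const SchemeRegistry& registry,
                                 const std::string& url) {
    const std::string::size_type pos = url.find("://");
    assert(pos != std::string::npos);  // sketch handles scheme://... only
    const SchemeRegistry::const_iterator it =
        registry.find(url.substr(0, pos));
    if (it == registry.end())
        return url;
    return it->second + url.substr(pos);
}
```

With an entry mapping "local" to "fork", a URL written for one
implementation would be rewritten to the form the other one expects.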

> > 5. Bulk operations are hard to implement and clumsy to use. Better would
> > be to include bulk operations directly in the API where they make sense.
> > It's much simpler to implement adaptors for that, and much easier for
> > users to use and comprehend.

Oops - bulk ops were designed to be easy to use!  Hmmm...

About the hard to implement: true, but iff they are easy to use,
then that does not matter (to the SAGA API spec).

Why bulk ops were not explicitly added to the spec is obvious: it
would (roughly) double the number of calls, and would lead to some
pretty complex call signatures:

  list <list <url> > listings = dir.bulk_list (list <url>);
  list <int>         results  = file.bulk_read (list <buffer>, list <sizes>);

Further, this would lead to even more complex error semantics (what
happens if one op out of a bulk of ops fails?).

This all is avoided by the current syntax

  foreach url in ( list<url> )
  {
    tc.add_task (dir.list <Async> (url));
  }
  tc.wait (All);

Not that difficult to use I believe?
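
On the error semantics: with a container of tasks, each task simply
carries its own outcome, so one failing op does not affect the others.
A simplified, self-contained model of that idea follows.  The
TaskContainer below is a made-up stand-in, not saga::task_container,
and it runs the ops sequentially instead of asynchronously:

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Outcome of one listing operation (hypothetical record).
struct Outcome {
    bool                     ok;
    std::vector<std::string> entries;  // filled when ok
    std::string              error;    // filled when not ok
};

typedef std::function<std::vector<std::string>()> ListOp;

// Hypothetical stand-in for a task container.
class TaskContainer {
public:
    void add_task(ListOp op) {
        assert(static_cast<bool>(op));
        ops_.push_back(std::move(op));
    }

    // Run every queued op and return one outcome per op, in the order
    // the ops were added.  A failing op is recorded, not propagated.
    std::vector<Outcome> wait_all() {
        std::vector<Outcome> outcomes;
        for (ListOp& op : ops_) {
            try {
                outcomes.push_back(Outcome{true, op(), ""});
            } catch (const std::exception& e) {
                outcomes.push_back(Outcome{false, {}, e.what()});
            }
        }
        ops_.clear();
        return outcomes;
    }

private:
    std::vector<ListOp> ops_;
};
```

The caller inspects the outcomes one by one after the wait, which is
the part a single bulk call signature would have to reinvent.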

> > Sorry for the rant :)

Hey, that's ok! :-)

Also, I am very biased in my answers as you'll notice, and probably
somewhat defensive, too.  So, it would be nice to hear from others,
too!

Cheers, and thanks, Andre.

-- 
Nothing is ever easy.