[SAGA-RG] core spec errata

Andre Merzky andre at merzky.net
Mon Mar 2 16:37:05 CST 2009


Here is some food for discussion at Wednesday's SAGA session:

We have by now addressed most of the errors and mistakes
reported against the core spec, apart from four.  These are
listed below.  Please excuse the length of this mail.  I would
appreciate it if people could read it anyway, so that we don't
need to fully describe all items in the SAGA sessions again.



1: - Page 33, figure 2: Picture has no block for URL, URL is nowhere
     to be found. iovec and parameter are in the wrong block (should
     not be in io, but in file and rpc).
   This does not need any discussion, but simply needs to be done.

2: - job::service needs to have its rm URL as attribute, as
     that URL is needed if more job services are to be
     created in the same security domain, for example.
     Otherwise, a job_service created with no URL can never
     be reconnected to, e.g. to find previously run jobs.

    A code example would be:

    {
      saga::job::service js_1;
      js_1.run_job ("/bin/sleep 1000");
    }
    {
      saga::job::service js_2;
      std::list <std::string> ids = js_2.list_jobs ();

      // std::list has no operator[], so walk it with an iterator
      for ( std::list <std::string>::iterator it = ids.begin ();
            it != ids.end (); ++it )
      {
        saga::job::job j = js_2.get_job (*it);
      }
    }

    In this example, it is not guaranteed that the job from
    the first code block is amongst the ones found in the
    second block, as the default js constructor may very well
    connect to a different backend.  Further, the user has
    no means to learn about the js URL, to reconnect later
    on. 

    A fix *could* look like:

    saga::url u;
    {
      saga::job::service js_1;
      js_1.run_job ("/bin/sleep 1000");
      u = js_1.get_attribute ("url");
    }
    {
      saga::job::service js_2 (u);
      std::list <std::string> ids = js_2.list_jobs ();

      // std::list has no operator[], so walk it with an iterator
      for ( std::list <std::string>::iterator it = ids.begin ();
            it != ids.end (); ++it )
      {
        saga::job::job j = js_2.get_job (*it);
      }
    }

    Please note that the js URL SHOULD be part of the job
    id, but that is not a hard requirement for SAGA
    implementations.

    Also, the example above is certainly, well, dumb.  But
    we met a couple of use cases which make it sufficiently
    painful to keep track of jobs to ask for this to be
    fixed.


3: - context c'tor should not call set_defaults: that
     significantly complicates usage if default ctx cannot 
     be initialized.

   You have probably seen the lengthy email exchange between
   Ceriel and me on this list.  No other opinions have been
   forthcoming so far, but we need to get this item closed
   ASAP.

   So, a short summary:

   The original behaviour (calling set_defaults in the
   c'tor) failed for:

     saga::context c ("globus");
     c.set_attribute ("UserProxy", "...");
     c.set_defaults ();

   If no default globus context can be created, then a globus
   context cannot be created at all (set_defaults, and thus
   the c'tor, would throw).

   So, we resolved that by not calling set_defaults in the
   c'tor, which got the above working as it should.

   Ceriel (and to some extent also Steve) are arguing that
   set_defaults is not needed in the API at all, but that
   errors on context initialization should be thrown on the
   first remote operation where that context is used.

   I counter-argued that this is an ill-defined point, and
   that I'd prefer to have a fixed point in the code where I
   can expect the SAGA implementation to signal errors in the
   context initialization.  Yes, errors in context usage can
   still occur later on, but those are well defined.
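
   The fixed-point argument can be illustrated with a minimal,
   self-contained sketch.  Note that mock_context and init_ok are
   hypothetical stand-ins invented for this illustration, not SAGA
   classes, and the rule that a globus context needs a UserProxy
   attribute is an assumption made only to have something to fail on.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for saga::context, for illustration only.
class mock_context
{
  public:
    // The c'tor only records the type; it never throws.
    explicit mock_context (const std::string & type) : type_ (type) {}

    void set_attribute (const std::string & key, const std::string & val)
    {
      attribs_[key] = val;
    }

    // The one fixed point where initialization errors are signalled.
    // Assumption: a "globus" context needs a UserProxy to be usable.
    void set_defaults ()
    {
      if ( type_ == "globus" && attribs_.count ("UserProxy") == 0 )
      {
        throw std::runtime_error ("cannot initialize globus context");
      }
    }

  private:
    std::string                         type_;
    std::map <std::string, std::string> attribs_;
};

// Returns true if the context initializes, false if set_defaults throws.
bool init_ok (mock_context & c)
{
  try
  {
    c.set_defaults ();
    return true;
  }
  catch ( const std::runtime_error & )
  {
    return false;
  }
}

// Usage: construction always succeeds, errors surface at set_defaults.
void demo_context ()
{
  mock_context bare ("globus");
  assert ( ! init_ok (bare) );

  mock_context c ("globus");
  c.set_attribute ("UserProxy", "...");
  assert ( init_ok (c) );
}
```

   With this shape, the application has exactly one call to wrap in a
   try block if it wants to learn about initialization problems early.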



4: - As suggested earlier, the algorithm to find the most
     specific error has been changed to ignore the
     NotImplemented error as long as there were other errors
     thrown.  NotImplemented will be reported only if there
     are 'only' NotImplemented errors in the error list.
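
   A minimal sketch of that selection rule, under stated assumptions:
   backend_error and most_specific are hypothetical names, and the
   integer 'rank' stands in for whatever specificity ordering an
   implementation chooses (it is not the precedence list from the spec).
   Only the treatment of NotImplemented is taken from the item above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical error record.  'rank' is an assumed, implementation-chosen
// specificity (higher means more specific), not the spec's ordering.
struct backend_error
{
  enum kind { NotImplemented, DoesNotExist, PermissionDenied };
  kind k;
  int  rank;
};

// Pick the error to report: NotImplemented entries are skipped as long as
// any other error is present.  Ties keep the earliest entry.
// Precondition: 'errors' is not empty.
backend_error most_specific (const std::vector <backend_error> & errors)
{
  const backend_error * best = 0;

  for ( std::size_t i = 0; i < errors.size (); i++ )
  {
    const backend_error & e = errors[i];

    if ( e.k == backend_error::NotImplemented ) continue;
    if ( best == 0 || e.rank > best->rank )     best = &e;
  }

  // only NotImplemented errors were found: report the first of those
  return best ? *best : errors.front ();
}

// Usage: NotImplemented loses against any other error, whatever its rank.
void demo_selection ()
{
  backend_error ni  = { backend_error::NotImplemented, 9 };
  backend_error dne = { backend_error::DoesNotExist,   1 };

  std::vector <backend_error> mixed;
  mixed.push_back (ni);
  mixed.push_back (dne);
  assert ( most_specific (mixed).k == backend_error::DoesNotExist );
}
```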

   That errata item led to a lengthy discussion on several
   mail threads.  Below is a summary of those:

    Issue:
    ------

    The exception precedence list in the spec does not make sense:
    
    (a) the NotImplemented exception is actually the least informative
    one, and should be at the *end* of the list.

    (b) for late binding implementations, and for implementations with
    multiple backends in general, it is very difficult to determine
    generically which exception is more interesting to the end user.


    Problem example:
    ----------------

    Assume an implementation of the SAGA file API binds (late) to HTTP
    and FTP.

    Assume the following setup:  on host a.b.c, an http server (http
    root = /var/www) and an ftp server (ftp root /var/ftp) are
    running, using the same credentials for access.

    The following files exist, and are owned by root (permissions in brackets):

      URL                   (rwx)

      /var/www/etc/         (x--)
      /var/www/etc/passwd   (xxx)
      /var/www/usw/         (xxx)

      /var/ftp/etc/         (xxx)
      /var/ftp/usw/         (x--)
      /var/ftp/usw/passwd   (xxx)

    Assume a SAGA application wants to open any://a.b.c/etc/passwd
    for reading.  The WWW backend will throw PermissionDenied, the FTP
    backend will throw DoesNotExist.

    Both exceptions are correct.  There are valid use cases for either
    exception to be the more specific, and thus, in the spec's
    argumentation, the more dominant one.  

    Further, upon accessing any://a.b.c/usw/passwd, the situation is exactly
    reversed.  Of course, the implementation has no means to deduce the
    intention of the application, and to decide that suddenly the exception from
    the other backend is more useful.


    Diagnosis:  
    ----------

    The root of the problem is the ability of SAGA to be implemented with late
    binding.  Any binding to a single middleware will result in exactly one
    error condition, which is to be forwarded to the application.  Also,
    implementations with early bindings can (and indeed will) focus on
    exceptions which originate from the middleware bound for that
    specific object, and will again be able to report exactly one error
    condition.  (Note that for early binding implementations, the initial
    operation which causes the implementation to bind to one specific
    middleware is prone to the same exception ordering problem.)  
    
    So it is mostly for late binding implementations that this issue arises,
    when several backends report errors concurrently, but the standard error
    reporting mechanisms in most languages default to reporting exactly one
    error condition.


    Conclusion: 
    -----------

    A global, predefined ordering of exceptions will be impossible, or at least
    arbitrary.  The native error reporting facilities of most languages will by
    definition be inadequate to report the full error information of late
    binding SAGA implementations.

    That leaves SAGA language bindings with three possibilities:

    (a) introduce potentially non-native error reporting mechanisms

      saga::filesystem::file f ("any://a.b.c/etc/passwd");
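      std::list <saga::exception> el = f.get_exceptions ();

      // handle all backend exceptions
      // (std::list has no operator[], so walk it with an iterator)
      for ( std::list <saga::exception>::iterator it = el.begin ();
            it != el.end (); ++it )
      {
        try
        {
          throw *it;
        }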
        catch ( saga::exception::DoesNotExist )
        {
          // handle exception from ftp backend
        }
        catch ( saga::exception::PermissionDenied )
        {
          // handle exception from www backend
        }
      }


    (b) acknowledge the described limitation, document it, and stick to the
    native error reporting mechanism

        try
        {
           saga::filesystem::file f ("any://a.b.c/etc/passwd");
        }
        catch ( saga::exception::DoesNotExist )
        {
          // handle exception from ftp backend
        }
        catch ( saga::exception::PermissionDenied )
        {
          // handle exception from www backend (which will not be forwarded in
          // our example, so this will never be called)
        }


    (c) a mixture of (a) and (b), with (b) as default.

        try
        {
           saga::filesystem::file f ("any://a.b.c/etc/passwd");
        }
        catch ( saga::exception::DoesNotExist )
        {
          // handle top level exception
        }
        catch ( saga::exception e)
        {
           // handle all backend exceptions
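           std::list <saga::exception> el = e.get_all_exceptions ();

           // (std::list has no operator[], so walk it with an iterator)
           for ( std::list <saga::exception>::iterator it = el.begin ();
                 it != el.end (); ++it )
           {
             try
             {
               throw *it;
             }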
             catch ( saga::exception::DoesNotExist )
             {
               // handle exception from ftp backend
             }
             catch ( saga::exception::PermissionDenied )
             {
               // handle exception from www backend
             }
           }
        }


    Note that (c) may not be possible in all languages.


    Discussion C++ bindings:
    ------------------------

    C++ is actually able to implement (c).  The C++ bindings would then
    introduce a saga::exception class, and the respective sub classes, which
    represent the 'most informative/specific' exception.  How exactly the 'most
    informative/specific' exception is selected from multiple concurrent
    implementations is left to the implementation, and cannot sensibly be
    prescribed by either the specification or the language binding, as
    discussed above.  (The spec could propose such a selection algorithm though).
    However, the saga::exception class could have the additional ability to
    expose the full set of backend exceptions, for example as list:

      std::list <saga::exception> saga::exception::get_all_exceptions ();

    Further, it would be advisable (for all language bindings actually) to
    include all error messages from all backend exceptions into the error
    message of the top level exception (this is already implemented in the
    CCT's C++ implementation):

     catch ( saga::exception e )
     {
       std::cerr << e.what ();
       // prints the following message:
       //  exception (top level): DoesNotExist
       //     exception (ftp binding): DoesNotExist - /etc/passwd does not exist
       //     exception (www binding): PermissionDenied - access to /etc denied
     }
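
    A minimal, self-contained sketch of such an exception class could look
    as follows.  It is not the CCT implementation: multi_exception, add and
    get_kind are hypothetical names, and the error type is kept as an enum
    rather than as sub classes, on the assumption that the backend
    exceptions are stored by value in a std::list (copying a sub class
    object into a list of the base class would lose its concrete type).

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for the proposed saga::exception (illustration only).
// A std::list member of the enclosing, still incomplete class is only
// guaranteed to work from C++17 on.
class multi_exception
{
  public:
    enum kind { DoesNotExist, PermissionDenied, NotImplemented };

    multi_exception (kind k, const std::string & backend,
                     const std::string & msg)
      : kind_ (k), backend_ (backend), msg_ (msg) {}

    kind get_kind () const { return kind_; }

    // attach one backend exception to this top level exception
    void add (const multi_exception & e) { all_.push_back (e); }

    // full set of backend exceptions
    std::list <multi_exception> get_all_exceptions () const { return all_; }

    // top level message, followed by one line per backend exception
    std::string what () const
    {
      std::string out = std::string ("exception (top level): ")
                      + name (kind_) + "\n";

      for ( std::list <multi_exception>::const_iterator it = all_.begin ();
            it != all_.end (); ++it )
      {
        out += "   exception (" + it->backend_ + " binding): "
             + name (it->kind_) + " - " + it->msg_ + "\n";
      }
      return out;
    }

  private:
    static const char * name (kind k)
    {
      switch ( k )
      {
        case DoesNotExist:     return "DoesNotExist";
        case PermissionDenied: return "PermissionDenied";
        case NotImplemented:   return "NotImplemented";
      }
      return "Unknown";
    }

    kind                        kind_;
    std::string                 backend_;
    std::string                 msg_;
    std::list <multi_exception> all_;
};

// Usage: build the top level exception from the two backend errors.
void demo_exception ()
{
  multi_exception top (multi_exception::DoesNotExist, "", "");
  top.add (multi_exception (multi_exception::DoesNotExist,
                            "ftp", "/etc/passwd does not exist"));
  top.add (multi_exception (multi_exception::PermissionDenied,
                            "www", "access to /etc denied"));
  assert ( top.get_all_exceptions ().size () == 2 );
}
```

    An application can then either print what () as is, or walk
    get_all_exceptions () and branch on get_kind () per backend.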


Best, Andre.


-- 
Nothing is ever easy.

