[gridrpc-wg] Final call: GridRPC API for inclusion in SAGA 1.0

Andre Merzky andre at merzky.net
Thu Aug 10 03:15:54 CDT 2006


Hi Thilo, 

Some comments inlined.

Cheers, Andre.


Quoting [Thilo Kielmann] (Aug 09 2006):
> Date: Wed, 9 Aug 2006 19:00:49 +0200
> From: Thilo Kielmann <kielmann at cs.vu.nl>
> To: gridrpc-wg at gridforum.org, saga-rg at gridforum.org
> Subject: [gridrpc-wg] Final call: GridRPC API for inclusion in SAGA 1.0
> 
> Dear all,
> 
> Hidemoto Nakada, Yusuke Tanimura, and myself have met and we have 
> re-worked the pending proposal for a GridRPC module for inclusion in
> the upcoming SAGA spec. The only thing we have changed is the way in which
> parameters to an RPC invocation are passed. Now, it is an array of parameters,
> where a parameter is a struct consisting of buffer, size, and IN/OUT/INOUT
> mode. We have also modelled examples from the Ninf-G web page with this API.
> We think that this now has a "look and feel" of GridRPC.

It should have the look & feel of SAGA, and the semantics of
GridRPC :-) - but if it closes in on look & feel on both
ends, so much the better of course...


> Attached please find the proposed text with API and examples. 

I took the liberty of changing some parts to adapt them to the
spec look and feel (example coding conventions, indentation
etc.).  I'll convert that version to tex now, and add it to
the CVS.  Hope that's ok with you.


> If this (or
> something similar) can be agreed upon quickly, then the GridRPC module can
> be included in the SAGA 1.0 spec. We see this API not as a competitor to the
> GridRPC API, as published in GFD-R.52. It is rather an alternative, embedded
> in the SAGA framework, to access existing GridRPC implementations, thus
> extending their user base.
> 
> We are now doing the final call for comments for the two groups involved.
> If you have comments or questions, please post them TO BOTH MAILING LISTS.
> 
> This final call is open until FRIDAY, AUGUST 18th.

There is one conflict: I hope that we can get the SAGA CORE
spec into final mailing list call on Monday.  So I suggest
to NOT move that timeline backwards, but to include rpc
already - if either of the final calls meets with negative
comments, we remove it again - does that make sense?

More comments below


> Comments and objections made by then will happily be included in the
> proposal. By this deadline, we will add this text (with all modifications
> we got) into the SAGA API document.
> 
> Thanks to you all for your contributions.
> 
> Thilo Kielmann

> +-------------------------------------------------------------+
> 
>                   ######  ######   #####
>                   #     # #     # #     #
>                   #     # #     # #
>                   ######  ######  #
>                   #   #   #       #
>                   #    #  #       #     #
>                   #     # #        #####
>     
> +-------------------------------------------------------------+
>      
>      
> Summary:
> ========
>      
>   GridRPC is one of the few high level APIs defined by the GGF.
>   Including it into the SAGA API benefits both:  SAGA gets more
>   complete, and provides a better coverage of its use cases with
>   a single look and feel; and GridRPC gets embedded into a set
>   of other tools of similar scope, which opens it to
>   a potentially wider user community, and ensures its further
>   development.
> 
>   The RPC package of the SAGA API described here is a one to one
>   mapping from the GridRPC standard, equipped with the SAGA look
>   and feel, error conventions, task model etc.
> 
> +-------------------------------------------------------------+
> 
> Specification:
> ==============
> 
>   package saga version 0.1 
>   {
>     package rpc 
>     {
>       enum io_mode 
>       {
>         In    = 1,  // input  parameter
>         Out   = 2,  // output parameter
>         InOut = 3   // both input and output parameter
>       }
> 
>       struct parameter
>       {
>         long        size;     // number of bytes in buffer
>         array<byte> buffer;   // data
>         io_mode     mode;     // parameter mode
>       }
> 
>       class rpc 
>       {
>         CONSTRUCTOR (in    string            funcname  );
>         call        (inout array<parameter>  parameters);
>       }
>     }
>   }
> 
> +-------------------------------------------------------------+
> 
> #ifndef SHORT
> 
> Details:
> ========
> 
>   class rpc:
>   ----------
> 
>     This class represents a remote function handle, which 
>     can be called (repeatedly), and returns the result of 
>     the respective remote procedure invocation.  
>     
>     The class offers one non trivial constructor, which
>     initialises the remote function handle (see [1] for
>     details).  That process may involve connection setup,
>     service discovery etc.  The class further offers one method
>     'call()', which invokes the remote procedure, and returns 
>     the respective return data and values.  
> 
>     In the constructor, the remote procedure to be invoked 
>     is specified by a URL, with the syntax:
> 
>       gridrpc://server.net:1234/my_function
> 
>     with the elements corresponding to:
> 
>       gridrpc     - scheme - identifying a grid rpc operation
>       server.net  - server - server host serving the rpc call
>       1234        - port   - contact point for the server
>       my_function - name   - name of the remote method to invoke
> 
>     All elements but the scheme can be empty, which allows the
>     implementation to fall back to some default remote method to
>     invoke (minimal URL: gridrpc:///).

The description of the constructor says that the URL can be
NULL, in some languages that will mean 'empty' (e.g. if they
have no default args).  That would mean that scheme can be
empty, too.  
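
To make the 'which elements may be empty' question concrete,
here is a minimal sketch of how an implementation might split
such a URL.  The names rpc_url and parse_rpc_url are made up
for illustration and are not part of the proposal.  The sketch
simply maps an empty funcname to an all-empty result, scheme
included; a real implementation would probably rather throw
IncorrectURL for malformed input.

```cpp
#include <string>

// Hypothetical helper type, not part of the SAGA proposal.
struct rpc_url
{
  std::string scheme;  // e.g. "gridrpc"; empty if no URL was given
  std::string server;  // may be empty
  std::string port;    // may be empty
  std::string name;    // may be empty
};

// Splits "scheme://server:port/name"; every element may be missing.
rpc_url parse_rpc_url (const std::string & url)
{
  rpc_url r;

  std::string::size_type sep = url.find ("://");
  if ( sep == std::string::npos )
  {
    return r;  // empty or malformed input: all elements stay empty
  }

  r.scheme = url.substr (0, sep);

  // rest is "server:port/name", where each piece may be absent
  std::string            rest  = url.substr (sep + 3);
  std::string::size_type slash = rest.find ('/');
  std::string            auth  = rest.substr (0, slash);

  if ( slash != std::string::npos )
  {
    r.name = rest.substr (slash + 1);
  }

  std::string::size_type colon = auth.find (':');
  r.server = auth.substr (0, colon);

  if ( colon != std::string::npos )
  {
    r.port = auth.substr (colon + 1);
  }

  return r;
}
```

With this, the minimal URL gridrpc:/// yields a scheme and
nothing else, and an empty string yields nothing at all, which
is the case the constructor notes would have to allow.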


>     The argument and return value handling is currently very
>     basic, and reflects the traditional scheme for remote
>     procedure calls: an array of parameters, for each of which
>     the buffer, its size, and the input/output mode is
>     described.  On invocation of the 'call' method, for each
>     parameter the 'mode' value has to be initialized, for
>     parameters with mode 'In' or 'InOut', also 'size' and 'buffer'
>     must be initialized.  For parameters with mode 'Out', 'size'
>     might also have the value 0 in which case the 'buffer' is
>     considered to be void, and has to be created (e.g.,
>     allocated) by the SAGA system upon arrival of the result
>     data.

Not that I disagree, but it should be noted that even for
RPC calls which require input parameters only, the params
must be passed by reference.  That implies, in some
languages, tracking the param memory for async invocations,
which is not needed for async invocations of other calls
that have no output parameters.  It can't be helped
I guess, and should not be an issue really.
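
A rough sketch of what 'tracking the param memory' could mean
in a C++ binding, under my own assumptions: the struct mirrors
the proposed IDL (the C++ field types are guesses), and
fake_remote_call, fake_task and call_async are invented
stand-ins, not SAGA calls.  The stand-in task runs the call
lazily in wait() instead of in the background, which is enough
to show the lifetime issue.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Mirrors the proposed IDL; the C++ field types are assumptions.
enum io_mode { In = 1, Out = 2, InOut = 3 };

struct parameter
{
  std::size_t     size;    // number of bytes in buffer
  unsigned char * buffer;  // data, owned by the application
  io_mode         mode;    // parameter mode
};

typedef std::vector <parameter> parameter_list;

// Local stand-in for the remote procedure: increments every byte
// of each InOut buffer, so that a visible "result" comes back.
void fake_remote_call (parameter_list & params)
{
  for ( parameter & p : params )
  {
    if ( p.mode == InOut )
    {
      for ( std::size_t i = 0; i < p.size; ++i )
      {
        ++p.buffer[i];
      }
    }
  }
}

// Hypothetical task: shares ownership of the parameter array, so
// the array cannot disappear while the call is still pending.
class fake_task
{
  public:
    explicit fake_task (std::shared_ptr <parameter_list> params)
      : params_ (std::move (params)), done_ (false)
    {
    }

    // runs the stand-in call on first use, then does nothing
    void wait ()
    {
      if ( ! done_ )
      {
        fake_remote_call (*params_);
        done_ = true;
      }
    }

  private:
    std::shared_ptr <parameter_list> params_;
    bool                             done_;
};

fake_task call_async (std::shared_ptr <parameter_list> params)
{
  return fake_task (std::move (params));
}
```

Note that only the parameter array is kept alive by the task
here.  The buffers themselves still belong to the application
and have to stay valid until wait() returns.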


>     This argument handling scheme allows efficient (zero-copy)
>     passing of parameters. For 'Out' parameters with a size value
>     of 0, the implementation is required to allocate the data
>     structure and to overwrite the size and buffer fields for
>     the parameter.

It is the responsibility of the application programmer to
free this memory, I assume?  Then the language bindings MUST
prescribe how that memory is allocated, to allow the
application to choose the appropriate de-allocation method.
Alternatively we would need a 'dealloc' method, which would
then require the implementation to alloc and de-alloc the
params (and to keep track of the blocks).
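
For illustration, a small sketch of the 'dealloc' alternative,
assuming (my assumption, nothing in the proposal says so) that
a C++ binding prescribes malloc/free.  fake_call is a local
stand-in for rpc::call which only fills empty 'Out' parameters,
and dealloc is the hypothetical counterpart.

```cpp
#include <cstdlib>
#include <cstring>
#include <new>
#include <vector>

// Mirrors the proposed IDL; the C++ field types are assumptions.
enum io_mode { In = 1, Out = 2, InOut = 3 };

struct parameter
{
  long    size;    // number of bytes in buffer
  void *  buffer;  // data
  io_mode mode;    // parameter mode
};

// Stand-in for rpc::call: every Out parameter with size 0 gets
// a freshly malloc'ed buffer holding a fixed 7 byte result.
void fake_call (std::vector <parameter> & params)
{
  static const char result[] = "result";  // 6 chars + '\0'

  for ( parameter & p : params )
  {
    if ( p.mode == Out && p.size == 0 )
    {
      p.buffer = std::malloc (sizeof (result));

      if ( p.buffer == nullptr )
      {
        throw std::bad_alloc ();
      }

      std::memcpy (p.buffer, result, sizeof (result));
      p.size = static_cast <long> (sizeof (result));
    }
  }
}

// Hypothetical 'dealloc': releases a buffer created by fake_call
// with the matching de-allocation method, and resets the fields.
void dealloc (parameter & p)
{
  std::free (p.buffer);
  p.buffer = nullptr;
  p.size   = 0;
}
```

If the bindings instead prescribe the allocator and leave the
free to the application, dealloc goes away, but the
application then has to know that it must call free() and not,
say, delete[].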


>     - constructor
>       Purpose: inits a remote function handle
>       Format:  CONSTRUCTOR  (in  session session, 
>                              in  string  funcname)
>       Inputs:  session:      saga session to use
>                funcname:     name of remote method to
>                              initialize
>       Outputs: -
>       Throws:  NoSuccess:          server could not be
>                                    contacted, or method is 
>                                    unknown

I assume a number of other exceptions would apply as well,
such as 

  AuthenticationFailed
  AuthorizationFailed
  PermissionDenied
  DoesNotExist (server contacted, but no such call available)
  IncorrectURL

Well, and some more I guess.  Question is: NoSuccess is
actually reserved as last resort, if no other exception
really applies (it's the least specific exception, please
have a look at the 'error handling' section in the spec).
So, is NoSuccess really needed here, and under what
conditions?
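
To illustrate the 'last resort' idea, a minimal sketch.  The
exception classes are simplified stand-ins named after the
SAGA exceptions listed above, and the failure enum and
throw_for are invented for this example only.

```cpp
#include <stdexcept>
#include <string>

// Simplified stand-ins for the SAGA exception classes.
struct saga_exception : std::runtime_error
{
  explicit saga_exception (const std::string & msg)
    : std::runtime_error (msg) { }
};

struct IncorrectURL : saga_exception
{
  explicit IncorrectURL (const std::string & m) : saga_exception (m) { }
};

struct AuthenticationFailed : saga_exception
{
  explicit AuthenticationFailed (const std::string & m) : saga_exception (m) { }
};

struct AuthorizationFailed : saga_exception
{
  explicit AuthorizationFailed (const std::string & m) : saga_exception (m) { }
};

struct PermissionDenied : saga_exception
{
  explicit PermissionDenied (const std::string & m) : saga_exception (m) { }
};

struct DoesNotExist : saga_exception
{
  explicit DoesNotExist (const std::string & m) : saga_exception (m) { }
};

struct NoSuccess : saga_exception
{
  explicit NoSuccess (const std::string & m) : saga_exception (m) { }
};

// Hypothetical failure conditions a backend might report.
enum failure
{
  bad_url, bad_credentials, not_authorized,
  access_denied, no_such_function, other_failure
};

// Throws the most specific exception that applies;
// NoSuccess is used only if nothing else fits.
void throw_for (failure f)
{
  switch ( f )
  {
    case bad_url:          throw IncorrectURL         ("malformed URL");
    case bad_credentials:  throw AuthenticationFailed ("bad credentials");
    case not_authorized:   throw AuthorizationFailed  ("not authorized");
    case access_denied:    throw PermissionDenied     ("permission denied");
    case no_such_function: throw DoesNotExist         ("no such remote function");
    case other_failure:    break;
  }

  throw NoSuccess ("remote operation failed");
}
```

With a mapping like this, NoSuccess would only remain for
failures the backend cannot classify any further.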


>       Notes:   - see [1] for details
>                - if funcname is NULL, a default handle is 
>                  created
> 
>     - call
>       Purpose: call the remote procedure
>       Format:  call         (inout array<parameter> param);
>       Inputs:  - 
>       In/Out:  param:        argument/result values for call
>       Outputs: - 
>       Throws:  NoSuccess:    remote operation failed

Same as above.

>       Notes:   - see [1] for details
>                - by passing an array of variable size, 
>                  different numbers of parameters can be 
>                  handled. No special variable argument
>                  list handling is required.

We discussed varargs at one of the last GGFs, and came to the
conclusion that language bindings COULD allow varargs.  That
does not make sense with the proposed scheme, in particular
with respect to the memory allocation policy described.  So, I
guess we abstain from varargs in the language bindings then?


Other open questions we had from former RPC discussions,
and which should be addressed in this spec, are:

  - GridRPC takes a config file name on initialization.
    That config file needs to be user specific IIUC, and
    there was some discussion, but no conclusion about that.
    So, what is the approach on that?  Is that spec
    implementable on top of GridRPC, and how?  If that is an
    issue still: our decision was to include the config file
    URL as (optional) parameter to the CONSTRUCTOR.  Does
    that make sense?

  - The RPC spec is silent about 'when' the connection to
    the remote server is made, on creation of the handle, or
    on call().  We decided in other parts of the spec that,
    for example, the constructor opens a file, or remote
    connection.  I propose to prescribe the same for RPC.
    Does that make sense?  Do we need to loosen the
    semantics elsewhere in the spec? (IMHO not).

  - Ninf-G allows binding a handle to multiple calls.  I
    assume that this is hidden in the implementation for
    now, and has no explicit reflection in the API?  I think
    that is what we decided on anyway...

  - should we add a 'cancel(in float timeout)'?  Explicit
    resource deallocation was an issue in our discussion at
    GGF, and we agreed on cancel - is that not needed
    anymore?
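
Just to have something concrete to argue about, here is a
sketch of how a C++ handle could look if the questions above
were answered the way I suggest: optional config URL in the
constructor, connection made in the constructor, and an
explicit cancel.  Everything in it (rpc_handle, the accessors,
the fact that the timeout is ignored) is an assumption for
illustration, not spec text, and nothing is actually contacted.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Mirrors the proposed IDL; the C++ field types are assumptions.
enum io_mode { In = 1, Out = 2, InOut = 3 };

struct parameter
{
  long    size;    // number of bytes in buffer
  void *  buffer;  // data
  io_mode mode;    // parameter mode
};

// Hypothetical handle class, modelled on the questions above.
class rpc_handle
{
  public:
    // The connection is made here, not on the first call().
    // config_url is optional; empty means "implementation default".
    explicit rpc_handle (const std::string & funcname,
                         const std::string & config_url = "")
      : funcname_   (funcname)
      , config_url_ (config_url)
      , connected_  (true)  // a real binding would connect here
      , calls_      (0)
    {
    }

    // One handle can serve any number of calls.
    void call (std::vector <parameter> & params)
    {
      if ( ! connected_ )
      {
        throw std::logic_error ("handle was cancelled");
      }

      (void) params;  // a real binding would ship these
      ++calls_;
    }

    // Explicit resource deallocation; the timeout is ignored
    // in this sketch.
    void cancel (float timeout)
    {
      (void) timeout;
      connected_ = false;
    }

    bool     connected () const { return connected_; }
    unsigned calls     () const { return calls_;     }

  private:
    std::string funcname_;
    std::string config_url_;
    bool        connected_;
    unsigned    calls_;
};
```

The point of connecting in the constructor is that errors like
an unreachable server show up when the handle is created, as
for files and streams elsewhere in the spec, and call() only
has to report failures of the invocation itself.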

Cheers, Andre.



> +-------------------------------------------------------------+
> 
> 
> Examples:
> =========
>   // c++ example
>   // call a remote matrix multiplication A = A * B
>   // (A, B, size_A and size_B are placeholders for the
>   //  application's matrices and their sizes in bytes)
>   try 
>   {
>     saga::rpc::rpc rpc ("gridrpc://fs0.das2.cs.vu.nl/matmul1");
> 
>     std::vector <saga::rpc::parameter> params (2);
> 
>     params[0].buffer = A;       // ptr to matrix A
>     params[0].size   = size_A;  // size of matrix A in bytes
>     params[0].mode   = saga::rpc::InOut;
> 
>     params[1].buffer = B;       // ptr to matrix B
>     params[1].size   = size_B;  // size of matrix B in bytes
>     params[1].mode   = saga::rpc::In;
> 
>     rpc.call (params);
> 
>     // A now contains the result
>   }
>   catch ( const saga::exception & e)
>   {
>     std::cerr << "SAGA error: " << e.what () << std::endl;
>   }
> 
> 
> 
>   // c++ example
>   // call a remote matrix multiplication C = A * B
>   // (A, B, size_A and size_B are placeholders for the
>   //  application's matrices and their sizes in bytes)
>   try 
>   {
>     saga::rpc::rpc rpc ("gridrpc://fs0.das2.cs.vu.nl/matmul2");
> 
>     std::vector <saga::rpc::parameter> params (3);
> 
>     params[0].buffer = NULL;  // buffer will be created
>     params[0].size   = 0;     // buffer will be created
>     params[0].mode   = saga::rpc::Out;
> 
>     params[1].buffer = A;       // ptr to matrix A
>     params[1].size   = size_A;  // size of matrix A in bytes
>     params[1].mode   = saga::rpc::InOut;
> 
>     params[2].buffer = B;       // ptr to matrix B
>     params[2].size   = size_B;  // size of matrix B in bytes
>     params[2].mode   = saga::rpc::In;
> 
>     rpc.call (params);
> 
>     // params[0].buffer now contains the result
>   }
>   catch ( const saga::exception & e)
>   {
>     std::cerr << "SAGA error: " << e.what () << std::endl;
>   }
> 
> 
> 
>   // c++ example
>   // asynchronous version of A = A * B
>   // (A, B, size_A and size_B are placeholders for the
>   //  application's matrices and their sizes in bytes)
>   try 
>   {
>     saga::rpc::rpc rpc ("gridrpc://fs0.das2.cs.vu.nl/matmul1");
> 
>     std::vector <saga::rpc::parameter> params (2);
> 
>     params[0].buffer = A;       // ptr to matrix A
>     params[0].size   = size_A;  // size of matrix A in bytes
>     params[0].mode   = saga::rpc::InOut;
> 
>     params[1].buffer = B;       // ptr to matrix B
>     params[1].size   = size_B;  // size of matrix B in bytes
>     params[1].mode   = saga::rpc::In;
> 
>     saga::task t = rpc.call <saga::task::ASync> (params);
> 
>     t.wait ();
>     // A now contains the result
>   }
>   catch ( const saga::exception & e)
>   {
>     std::cerr << "SAGA error: " << e.what () << std::endl;
>   }
> 
> 
>   // c++ example
>   // parameter sweep example from
>   // http://ninf.apgrid.org/documents/ng4-manual/examples.html
>   //
>   // Monte Carlo computation of PI
>   //
>   try 
>   {
>     std::string   uri[NUM_HOSTS]; // initialize...
>     long times;                   // initialize...
>     long seed[NUM_HOSTS], count[NUM_HOSTS], sum = 0;
> 
>     std::vector <saga::rpc::rpc> servers;
> 
>     // create the rpc handles for all URIs
>     for ( int i = 0; i < NUM_HOSTS; ++i )
>     {
>       servers.push_back (saga::rpc::rpc (uri[i]));
>     }
> 
>     // create persistent storage for tasks and parameter structs
>     // (one parameter array per host, so that each array stays
>     //  valid until its async call has finished)
>     saga::task_container tc;
>     std::vector <std::vector <saga::rpc::parameter> > params
>       (NUM_HOSTS, std::vector <saga::rpc::parameter> (3));
> 
>     // fill parameter structs and start async rpc calls
>     for ( int i = 0; i < NUM_HOSTS; ++i )
>     {
>       seed[i] = i; // use as random seed
> 
>       params[i][0].buffer = &seed[i];
>       params[i][0].size   = sizeof (seed[i]);
>       params[i][0].mode   = saga::rpc::In;
> 
>       params[i][1].buffer = &times;
>       params[i][1].size   = sizeof (times);
>       params[i][1].mode   = saga::rpc::In;
> 
>       params[i][2].buffer = &count[i];
>       params[i][2].size   = sizeof (count[i]);
>       params[i][2].mode   = saga::rpc::Out;
> 
>       // start the async calls
>       saga::task t = servers[i].call <saga::task::ASync> (params[i]);
> 
>       // save the task
>       tc.add (t);
>     }
> 
>     // wait for all async calls to finish
>     tc.wait (-1, saga::task::All);
> 
>     // compute and print pi
>     for ( int i = 0; i < NUM_HOSTS; ++i )
>     {
>       sum += count[i];
>     }
> 
>     std::cout << "PI = " 
>               << 4.0 * ( sum / ((double) times * NUM_HOSTS))
>               << std::endl;
>   }
>   catch ( const saga::exception & e)
>   {
>     std::cerr << "SAGA error: " << e.what () << std::endl;
>   }
> 
> +-------------------------------------------------------------+
> 
> 
> Notes:
> ======
> 
>   References:
>   -----------
> 
>     [1] H. Nakada, S. Matsuoka, K. Seymour, J. Dongarra, 
>         C. Lee, H. Casanova: "A GridRPC Model and API for End-User
>         Applications", Global Grid Forum Document GFD-R.52.
> 
>   Comparison to the original GridRPC calls:
>   -----------------------------------------
> 
>     initialization:
>     ---------------
> 
>     - grpc_initialize 
>       
>       GridRPC: reads the configuration file and initializes the
>                required modules.
>       SAGA:    not needed, implicit
>      
>      
>     - grpc_finalize 
> 
>       GridRPC: releases any resources being used
>       SAGA:    not needed, implicit
>       
>      
>     handle management:
>     ------------------
>      
>     - grpc_function_handle_default 
>     
>       GridRPC: creates a new function handle using the default
>                server.  This could be a pre-determined server 
>                name or it could be a server that is dynamically 
>                chosen by the resource discovery mechanisms of 
>                the underlying GridRPC implementation, such as 
>                the NetSolve agent.
>       SAGA:    default constructor
> 
>      
>     - grpc_function_handle_init 
>       
>       GridRPC: creates a new function handle with a server
>                explicitly specified by the user.
>       SAGA:    explicit constructor
>       
>     
>     - grpc_function_handle_destruct 
>       
>       GridRPC: releases the memory associated with the
>                specified function handle.
>       SAGA:    destructor
>     
>     
>     - grpc_get_handle
> 
>       GridRPC: returns the handle corresponding to the given
>                session ID (that is, corresponding to that 
>                particular non-blocking request).
>       SAGA:    not possible right now.
>                However, status of asynchronous operations can be checked
>                via the corresponding task objects.
> 
> 
>     call functions:
>     ---------------
> 
>     - grpc_call 
>     
>       GridRPC: makes a blocking remote procedure call with a variable
>                number of arguments.
>       SAGA:    has no variable number of arguments, this case is covered
>                via the SAGA version of grpc_call_argstack.
> 
> 
>     - grpc_call_async 
>     
>       GridRPC: makes a non-blocking remote procedure call with a
>                variable number of arguments.
>       SAGA:    done via task model and equivalent to grpc_call_argstack.
> 
> 
>     - grpc_call_argstack 
>     
>       GridRPC: makes a blocking call using the argument stack
>       SAGA:    call provides a parameter array of variable size
> 
>       
>     - grpc_call_argstack_async 
>     
>       GridRPC: makes a non-blocking call using the argument stack.
>       SAGA:    done via the task model and call
> 
> 
>     asynchronous control functions:
>     -------------------------------
> 
>     - grpc_probe 
>     
>       GridRPC: checks whether the asynchronous GridRPC call has
>                completed.
>       SAGA:    done via the task model
> 
> 
>     - grpc_cancel 
>     
>       GridRPC: cancels the specified asynchronous GridRPC call.
>       SAGA:    done via the task model
>        
> 
>     asynchronous wait functions:
>     ----------------------------
> 
>     - grpc_wait 
>     
>       GridRPC: blocks until the specified non-blocking request has
>                completed.
>       SAGA:    done via the task model
>        
>     
>     - grpc_wait_and 
>     
>       GridRPC: blocks until all of the specified non-blocking requests
>                in a given set have completed.
>       SAGA:    done via the task container
> 
>     
>     - grpc_wait_or 
>     
>       GridRPC: blocks until any of the specified non-blocking requests
>                in a given set has completed.
>       SAGA:    done via the task container
>            
>     
>     - grpc_wait_all 
>     
>       GridRPC: blocks until all previously issued non-blocking requests
>                have completed.
>       SAGA:    done via the task container
>              
>     
>     - grpc_wait_any 
>     
>       GridRPC: blocks until any previously issued non-blocking request
>                has completed.
>       SAGA:    done via the task container
>       
> 
>     error reporting functions:
>     --------------------------
> 
>     - grpc_perror 
>     
>       GridRPC: prints the error string associated with the last GridRPC
>                call.
>       SAGA:    exceptions
>      
>      
>     - grpc_error_string 
>     
>       GridRPC: returns the error description string, given a numeric
>                error code.
>       SAGA:    exceptions
>      
>      
>     - grpc_get_error 
>     
>       GridRPC: returns the error code associated with a given
>                non-blocking request.
>       SAGA:    exceptions
>     
>      
>     - grpc_get_last_error 
>     
>       GridRPC: returns the error code for the last invoked GridRPC call.
>       SAGA:    exceptions
> 
>   
> +-------------------------------------------------------------+
> 
> #endif // SHORT


-- 
"So much time, so little to do..."  -- Garfield




