[gridrpc-wg] questions about function handles

Tue Jun 7 11:17:19 CDT 2005

Here is a short response to Hidemoto's questions:

Hidemoto Nakada wrote:

>>There are two ways to initialize a function handle.
>>
>>- first, we can do it with grpc_function_handle_init. In this case, the
>>  server is explicitly given by the client. As data location is known
>>  (client, server or data repository), there is no problem binding the
>>  data to the server. All data management can be done by the client:
>>  placement, transfers or removal. The location will be given in the data
>>  management functions.
>>
>>- second, the function handle could be initialized by
>>  grpc_function_handle_default. In that case, the GridRPC API document
>>  says: "This default could be a pre-determined server or it could be a
>>  server that is dynamically chosen by the resource discovery mechanisms
>>  of the underlying GridRPC implementation". Does that mean, the function
>>  handle will contain a server reference in it after
>>  grpc_function_handle_default call ? Or does that mean, the function
>>  handle will reference a default discovery (or GridRPC server) while the
>>  computational server will be chosen during grpc_call ?
> 
> 
> It is totally implementation dependent, I believe.
> In theory, you have a better chance of choosing a 'better' server
> if you delay the selection of the server until the actual invocation time,
> since at that time you can get more information on the invocation,
> such as the size of data to be transferred.
Yes, delaying the selection gives you a better chance of choosing a
good server. But it also means that, after calling
grpc_function_handle_default, you do not have a reference to the
server: it will be chosen later. You only have a reference to the
platform, and you cannot place any data before calling grpc_call since
the server has not been selected yet. Please see the kmc/povray example
further down.
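
To make the difference concrete, here is a minimal sketch of the two
binding strategies. It does not use the real GridRPC types: server_t,
handle_t and all the function names are hypothetical stand-ins, and the
server "chosen by the platform" is simply passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the GridRPC API types. */
typedef struct { const char *name; } server_t;

typedef struct {
    const char     *func_name;
    const server_t *server;   /* NULL until a server has been selected */
} handle_t;

/* Early binding: the server is fixed when the handle is created. */
static void handle_init_early(handle_t *h, const char *func,
                              const server_t *s)
{
    h->func_name = func;
    h->server = s;
}

/* Late binding: only the function name is recorded. */
static void handle_init_late(handle_t *h, const char *func)
{
    h->func_name = func;
    h->server = NULL;
}

/* Data can be placed ahead of the call only if the target is known. */
static int can_place_data(const handle_t *h)
{
    return h->server != NULL;
}

/* The call binds a server if none is bound yet, and returns it. */
static const server_t *call(handle_t *h, const server_t *chosen_by_platform)
{
    assert(chosen_by_platform != NULL);
    if (h->server == NULL)
        h->server = chosen_by_platform;
    return h->server;
}
```

With a late-bound handle, can_place_data() returns 0 until call() has
run, which is exactly the window in which the client would like to
place its data.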

>>  If the function handle contains a reference to a server, then the data
>>  management can be done in the same way as for
>>  grpc_function_handle_init. If the function handle does not reference
>>  the computational server, there is no way to know where to place data
>>  before issuing grpc_call. This is the way function handles are
>>  implemented in Diet and Netsolve (2.0, any changes ?)
>>  GridRPC interfaces.
>>
>>However, we should provide a way to dynamically choose a server...
>>Any comments ?
> 
> 
> I cannot understand your concern. Could you explain it giving
> examples ?
The problem is deciding whether or not we always know where the
computation will take place. If we always know it, then we can use a
standard copy function to put the data on the server before computing.
In that case the client can manage its data on its own and needs no
platform support beyond data handle management. If we do not know where
the computation will take place, then we need platform support: we need
a way to tell the platform that we want to leave this data inside it,
somewhere. This could be a persistency flag or a bind function, it does
not matter which, but we need one. After the computation, the server
needs to know what to do with the data: send it back to the client or
leave it on its host?
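
As a rough illustration of the persistency flag idea, the sketch below
attaches a mode to a data handle and lets the server consult it once
the computation is done. Every name here (data_mode_t, data_handle_t,
finish_computation, CLIENT) is an assumption made up for this sketch;
none of them comes from the GridRPC API document.

```c
#include <assert.h>

/* Hypothetical names for illustration; not part of the GridRPC API. */
enum { CLIENT = 0 };        /* location id meaning "on the client" */

typedef enum {
    DATA_RETURN_TO_CLIENT,  /* send the result back to the client */
    DATA_KEEP_ON_PLATFORM   /* leave the result where it was computed */
} data_mode_t;

typedef struct {
    data_mode_t mode;
    int         location;   /* CLIENT, or the id of the cluster holding it */
} data_handle_t;

/* Called by the server once the computation has finished: the mode,
 * set by the client before the call, decides where the data ends up. */
static void finish_computation(data_handle_t *d, int cluster_id)
{
    assert(cluster_id != CLIENT);
    d->location = (d->mode == DATA_KEEP_ON_PLATFORM) ? cluster_id : CLIENT;
}
```

The point is that the client sets the mode without naming a server, so
this works even when the server is only selected during the call.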

The following example is taken from an application running under DIET.
The application (kmc) simulates the deposition of atoms on a substrate.
To better visualize the result, the data computed by the simulation
program is sent to a ray tracing service (povray). To optimize
performance, we plan to deploy the two applications on two different PC
clusters (1 and 2) managed by DIET.

The GridRPC client will do the following:

grpc_initialize();
grpc_function_handle_default( handleKmc, "kmc" );

// data preparation

grpc_call( handleKmc, data, &result );

// At the time of the grpc_call, the client does not know on which
// cluster it will execute. However, this is not very important as we
// just use input data for kmc.

grpc_function_handle_default( handlePovray, "povray" );
grpc_call( handlePovray, result, &image );

When the client calls povray, it does not know where its image will be
computed, i.e. which povray server will be used, on cluster 1 or 2. If
the client gets the result back, there is no problem because it will
pass result in the call. But if we want to avoid useless transfers of
result, we need to leave the result data (persistent) inside the
platform and transfer it, once the povray server has been chosen, only
if the povray computation is not done on the same cluster. In that
case, we need a way to tell the kmc server that the result data must be
left on the server and not returned to the client. However, before
calling the kmc application (grpc_call), we do not know which cluster
will be used, so there is no way to inform it. It is not possible to
bind the data to this server; we can only bind it to the platform.
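
To put numbers on the transfers involved, here is a small sketch under
a deliberately simple cost model that I am assuming for illustration:
one transfer of result per hop between distinct locations, and no
transfer inside a cluster. The function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical cost model: counts network transfers of `result`
 * between the kmc call and the povray call. */

/* Result returned to the client, then sent again with the povray call:
 * kmc server -> client, then client -> povray server. */
static int transfers_via_client(void)
{
    return 2;
}

/* Result left inside the platform: it moves only if povray is chosen
 * on a different cluster than the one where kmc ran. */
static int transfers_via_platform(int kmc_cluster, int povray_cluster)
{
    assert(kmc_cluster > 0 && povray_cluster > 0);
    return (kmc_cluster == povray_cluster) ? 0 : 1;
}
```

Under this model, binding result to the platform costs at most one
transfer instead of two, and none when both services end up on the
same cluster.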

Does this example make it clearer?

   Laurent

-- 
Laurent PHILIPPE  http://lifc.univ-fcomte.fr/~philippe
philippe at lifc.univ-fcomte.fr               Laboratoire d'Informatique (LIFC)
tel: (33) 03 81 66 66 54                                     route de Gray
fax: (33) 03 81 66 64 50                     25030 Besancon Cedex - FRANCE