[gridrpc-wg] questions about function handles

Wed Jun 8 11:59:17 CDT 2005

Laurent,

Thank you for answering my question.
I think this is going to be a fruitful discussion.
Since we will not have enough time at the GGF F2F meeting,
we should continue this on the ML.

> > It is totally implementation dependent, I believe.
> > In theory, you have a better chance of choosing a 'better' server
> > if you delay selection of the server to the actual invocation time,
> > since at that time you have more information about the invocation,
> > such as the size of the data to be transferred.
> Yes, you have a better chance of choosing a better server. But that means
> that after calling grpc_function_handle_default, you do not have a
> reference to the server: it will be chosen later. You just have a
> reference to the platform and you cannot place any data before calling
> grpc_call, since the server has not been selected. Please see the
> example below.
> 
> >>  If the function handle contains a reference to a server, then the data
> >>  management can be done in the same way as for
> >>  grpc_function_handle_init. If the function handle does not reference
> >>  the computational server, there is no way to know where to place data
> >>  before issuing grpc_call. This is the way function handles are
> >>  implemented in Diet and Netsolve (2.0, any changes ?)
> >>  GridRPC interfaces.
> >>
> >>However, we should provide a way to dynamically choose a server...
> >>Any comments ?
> > 
> > 
> > I cannot understand your concern. Could you explain it with
> > an example?
> Actually, the problem is to decide whether we always know where the
> computation will take place or not. If we always know it, then we can
> use a standard copy function to put the data on the server before
> computing (then the client is able to manage its data on its own; it
> does not need more platform support than data handle management). If we
> do not know where the computation will take place, then we need platform
> support: we need a way to tell the platform that we want to leave this
> data inside it, somewhere. This could be a persistency flag or a
> bind function, it does not matter, but we need it. After computation, the
> server needs to know what to do with the data: send it back to the
> client or leave it on its host?
> 
> This example is taken from an application running under DIET. This
> application (kmc) simulates atom deposition on a substrate. To better
> see the result of the simulation, the data computed by the simulation
> program are sent to a ray tracing service (povray). To optimize
> performance, we plan to deploy both applications on two different PC
> clusters (1 and 2) managed by DIET.
>
> The GRPC client will do:
> 
> grpc_initialize();
> grpc_function_handle_default( handleKmc, "kmc" );
> 
> // data preparation
> 
> grpc_call( handleKmc, data, &result );
> 
> // At the time of the grpc_call, the client does not know on which
> // cluster it will execute. However, this is not very important as we
> // just use input data for kmc.
> 
> grpc_function_handle_default( handlePovray, "povray" );
> grpc_call( handlePovray, result, &image );
> 
> When the client calls povray, it will not know where its image will
> be computed, or which povray server will be used, on cluster 1 or 2. If the
> client gets the result back, there is no problem because it will use
> result in the call. But if we want to avoid useless transfers of
> result, we need to leave the result data (persistent) inside the platform
> and transfer it, once the povray server has been chosen, only if the
> povray computation is not done on the same cluster. In that case, we need
> a way to indicate to the kmc server that the result data must be left on
> the server and not returned to the client. However, before calling
> (grpc_call) the kmc application, we do not know which cluster will be
> used, so there is no way to inform it. It is not possible to bind the data
> to this server; we can just bind it to the platform.
>
> Is that example clearer for you?
Yes. Thank you for taking the time for this.

But I think this is essentially because of the dynamic nature of
GridRPC, and just defining the semantics of
'grpc_function_handle_default' will not solve the problem.

Users will have the freedom to dynamically create a povray handle
*AFTER* the kmc process is invoked. Even if users explicitly
specify a server for povray, there is no way to know it
before calling the kmc process.
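
To illustrate the point, here is a minimal mock in C. It is not the
GridRPC API: every name with the mock_ prefix, and the mapping from
function name to cluster, is made up for this sketch. It only shows
that the povray handle does not even exist when kmc is called, so the
kmc call has nothing to consult when deciding where to leave its result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mock, not the GridRPC API: a handle that is bound to a
   server only when the call is issued. */
typedef struct {
    const char *func;    /* remote function name */
    const char *server;  /* NULL until a call selects a server */
} mock_handle;

/* Stand-in for default handle creation: no server is chosen yet. */
static void mock_handle_default(mock_handle *h, const char *func) {
    h->func = func;
    h->server = NULL;
}

/* Stand-in for the platform's scheduler; this mapping is an assumption. */
static const char *mock_select_server(const char *func) {
    return strcmp(func, "kmc") == 0 ? "cluster1" : "cluster2";
}

/* Stand-in for a call: the server is selected at invocation time. */
static void mock_call(mock_handle *h) {
    assert(h->func != NULL);
    h->server = mock_select_server(h->func);
}

/* Returns 1 if the kmc result would have to move between clusters. */
static int mock_needs_transfer(void) {
    mock_handle kmc, povray;

    mock_handle_default(&kmc, "kmc");
    mock_call(&kmc);                  /* the povray handle does not exist yet */

    mock_handle_default(&povray, "povray");
    assert(povray.server == NULL);    /* still unbound right after creation */
    mock_call(&povray);               /* only now is the consumer's server known */

    return strcmp(kmc.server, povray.server) != 0;
}
```

With the made-up scheduler above, mock_needs_transfer() reports that
the result has to move, and nothing available at the time of the kmc
call could have predicted that.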

So, I think we should admit this is a very complicated problem
and there is no simple answer.
In my opinion there are two ways to solve the problem.

- Assuming some 'magical' global data management system behind it,
  define a simple interface.

- Assuming no background support,
  define a set of explicit data transfer methods
  and explicit data management
   (maybe with soft-state lifetime management).

I love the former one, because

A) there are several such 'magical' systems actually emerging,
   like AIST's gfarm,
B) a data transfer method is already defined (or at least its
   definition is under way) in another WG and is clearly out of scope
   for our WG.
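
To make the two options a bit more concrete, here is a rough sketch in
C. Nothing in it comes from the GridRPC API or from gfarm: the
sketch_data type, the sketch_ functions and the tick-based lifetime are
all assumptions made for illustration. The first function mimics a
'magical' data manager that relocates the data behind the client's
back; the other two mimic explicit transfer with a soft-state lifetime
that the client has to manage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical data descriptor, for illustration only. */
typedef struct {
    char location[32];  /* server or cluster currently holding the data */
    int  lifetime;      /* remaining soft-state lifetime, in ticks */
} sketch_data;

static void sketch_set_location(sketch_data *d, const char *server) {
    strncpy(d->location, server, sizeof d->location - 1);
    d->location[sizeof d->location - 1] = '\0';
}

/* Option 1: the client just issues the call. A (mocked) global data
   manager moves the data to the chosen server only when it has to. */
static void sketch_call_implicit(sketch_data *d, const char *chosen_server) {
    assert(d != NULL && chosen_server != NULL);
    if (strcmp(d->location, chosen_server) != 0)
        sketch_set_location(d, chosen_server);
}

/* Option 2: the client moves the data itself and renews its lifetime.
   Returns -1 if the soft state has already expired. */
static int sketch_transfer(sketch_data *d, const char *dest, int lifetime) {
    assert(d != NULL && dest != NULL);
    if (d->lifetime <= 0)
        return -1;
    sketch_set_location(d, dest);
    d->lifetime = lifetime;
    return 0;
}

/* One unit of time passes; expired data can no longer be transferred. */
static void sketch_tick(sketch_data *d) {
    if (d->lifetime > 0)
        d->lifetime--;
}
```

In the second style the client must know both the destination and when
to renew the lifetime, which is exactly the bookkeeping the first style
hides.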

Comments?

-hidemoto