[ogsa-wg] effective use of resource lifetimes in grid infrastructure

Mon Dec 6 15:57:16 CST 2004

Steve's email bounced.
----
Hiro Kishimoto

Subject: Re: [ogsa-wg] effective use of resource lifetimes in grid
infrastructure
Cc: ogsa-wg at ggf.org
To: Steve Loughran <steve_loughran at hpl.hp.com>
From: Steve Tuecke <tuecke at mcs.anl.gov>
Date: Mon, 6 Dec 2004 09:52:17 -0600

Steve,

The ManagedJob WS-Resource need not be hosted on the same host as the  
job itself.

Here's a draft document that you might find useful that describes the  
GT4 WS GRAM approach in more detail:

http://www-unix.globus.org/toolkit/docs/development/3.9.3/execution/wsgram/WS_GRAM_Approach.html

-Steve

On Dec 6, 2004, at 7:12 AM, Steve Loughran wrote:

> Ian Foster wrote:
>>   Steve:
>> A variety of semantics and connections are possible between a
>> "WS-Resource" and an "entity that the WS-Resource represents",  
>> including both your (a) and (b) below. I don't believe that the  
>> implied resource pattern implies that one particular approach be  
>> adopted.
>> The following are some rough notes on how we have chosen to handle  
>> things in the GT4 GRAM service. This may perhaps be relevant to your  
>> problem.
>> The approach that we take in GT4 GRAM is as follows:
>> 1) A GRAM ManagedJobFactory defines a "create job" operation that:
>> a) creates a job, and also
>> b) creates a ManagedJob WS-Resource, which represents the resource  
>> manager's view of the job.
>> 2) The ManagedJob WS-Resource and the job are then linked as follows:
>> a) Destroying the ManagedJob WS-Resource kills the job
>> b) State changes in the job are reflected in the ManagedJob  
>> WS-Resource
>> c) Termination of the job also destroys the ManagedJob WS-Resource,  
>> but not immediately: we find that you typically want to leave the  
>> ManagedJob state around for "a while" after the job terminates to  
>> allow clients to figure out what happened to the job after the fact.
>> Regards -- Ian.
>
> Ian,
>
> What is your fault tolerance strategy here?
>
> Is every ManagedJob WS-Resource hosted on the same host (and perhaps,
> same process) as the job itself?
>
> This would mean that there is no way for the ManagedJob EPR to fail
> without the job itself failing, but would require the entire set of  
> job hosts to be visible for inbound SOAP messages. It would also  
> prevent you from moving a job from one node to another without some  
> difficulty (the  
> classic CORBA object-moved problem, I believe, though HTTP 3xx  
> redirect responses would work if only SOAP stacks processed them
> reliably).
>
> I am trying to do a design which would enable (though would not
> require) only a subset of nodes -call them portal nodes- to be visible  
> to outside callers, with the rest of the nodes only accessible to the  
> portal itself. Once I assume this architecture, modelling the  
> resources gets complex, as EPRs contain routing info that may become  
> invalid if a portal node fails.
>
> -steve
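
The linkage Ian describes in points 1) and 2) of the quoted notes can be sketched roughly as follows. This is only an illustration under stated assumptions, not GT4 code: the class names, method names, state names and the 300-second linger default are all hypothetical, and the job is a plain in-memory object rather than something run by a real resource manager. The job and the ManagedJob resource are kept as separate objects, in line with the point that they need not live on the same host.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical state names; the real GRAM state set may differ.
TERMINAL_STATES = {"Done", "Failed", "Killed"}


@dataclass
class Job:
    """Stand-in for the real job run by a local resource manager."""
    state: str = "Active"

    def kill(self) -> None:
        # Killing an already finished job is a no-op.
        if self.state not in TERMINAL_STATES:
            self.state = "Killed"


class ManagedJob:
    """Resource representing the resource manager's view of one job."""

    def __init__(self, job: Job) -> None:
        self.job = job
        self.state = job.state
        # None means no destruction has been scheduled yet.
        self.termination_time: Optional[float] = None

    def job_state_changed(self, new_state: str, now: float, linger: float) -> None:
        # 2b) state changes in the job are reflected in the resource
        self.job.state = new_state
        self.state = new_state
        # 2c) termination schedules destruction, but only after a delay
        if new_state in TERMINAL_STATES:
            self.termination_time = now + linger


class ManagedJobFactory:
    """1) 'create job' makes both the job and its ManagedJob resource."""

    def __init__(self, linger_seconds: float = 300.0) -> None:
        self.linger_seconds = linger_seconds  # assumed value for "a while"
        self.resources: Dict[int, ManagedJob] = {}
        self._next_key = 0

    def create_job(self) -> int:
        key = self._next_key
        self._next_key += 1
        self.resources[key] = ManagedJob(Job())
        return key

    def notify(self, key: int, new_state: str, now: float) -> None:
        self.resources[key].job_state_changed(new_state, now, self.linger_seconds)

    def destroy(self, key: int) -> None:
        # 2a) destroying the resource kills the job
        resource = self.resources.pop(key)
        resource.job.kill()

    def sweep(self, now: float) -> None:
        # Remove resources whose scheduled termination time has passed.
        expired = [k for k, r in self.resources.items()
                   if r.termination_time is not None and r.termination_time <= now]
        for k in expired:
            del self.resources[k]
```

In this sketch a client that asks about a finished job before the linger period expires still finds the resource and its final state; once sweep() runs after the termination time, the resource is gone. Time is passed in explicitly so the behaviour is easy to follow without a real clock.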