[ogsa-wg] effective use of resource lifetimes in grid infrastructure
Hiro Kishimoto
hiro.kishimoto at jp.fujitsu.com
Mon Dec 6 15:57:16 CST 2004
Steve's email bounced.
----
Hiro Kishimoto
Subject: Re: [ogsa-wg] effective use of resource lifetimes in grid
infrastructure
Cc: ogsa-wg at ggf.org
To: Steve Loughran <steve_loughran at hpl.hp.com>
From: Steve Tuecke <tuecke at mcs.anl.gov>
Date: Mon, 6 Dec 2004 09:52:17 -0600
Steve,
The ManagedJob WS-Resource need not be hosted on the same host as the
job itself.
Here's a draft document that you might find useful that describes the
GT4 WS GRAM approach in more detail:
http://www-unix.globus.org/toolkit/docs/development/3.9.3/execution/wsgram/WS_GRAM_Approach.html
-Steve
On Dec 6, 2004, at 7:12 AM, Steve Loughran wrote:
> Ian Foster wrote:
>> Steve:
>> A variety of semantics and connections are possible between a
>> "WS-Resource" and an "entity that the WS-Resource represents",
>> including both your (a) and (b) below. I don't believe that the
>> implied resource pattern implies that one particular approach be
>> adopted.
>> The following are some rough notes on how we have chosen to handle
>> things in the GT4 GRAM service. This may perhaps be relevant to your
>> problem.
>> The approach that we take in GT4 GRAM is as follows:
>> 1) A GRAM ManagedJobFactory defines a "create job" operation that:
>> a) creates a job, and also
>> b) creates a ManagedJob WS-Resource, which represents the resource
>> manager's view of the job.
>> 2) The ManagedJob WS-Resource and the job are then linked as follows:
>> a) Destroying the ManagedJob WS-Resource kills the job
>> b) State changes in the job are reflected in the ManagedJob
>> WS-Resource
>> c) Termination of the job also destroys the ManagedJob WS-Resource,
>> but not immediately: we find that you typically want to leave the
>> ManagedJob state around for "a while" after the job terminates to
>> allow clients to figure out what happened to the job after the fact.
>> Regards -- Ian.
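
As a rough illustration of the lifetime coupling described in points 1 and 2 above, here is a minimal Python sketch. It is not GT4 code: the class names echo the terms used in the message, but the method names (create_job, destroy, on_job_state_change, sweep), the state strings, and the default 300-second linger period are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    """Stand-in for the real job run by the resource manager."""
    state: str = "Pending"  # state names are illustrative, not GRAM's


@dataclass
class ManagedJob:
    """Hypothetical stand-in for a ManagedJob WS-Resource."""
    job: Job
    state: str = "Pending"
    termination_time: Optional[float] = None  # None: no destruction scheduled


class ManagedJobFactory:
    """Hypothetical factory that keeps job and resource lifetimes linked."""

    def __init__(self, linger_seconds: float = 300.0) -> None:
        self.linger_seconds = linger_seconds  # assumed value for "a while"
        self.resources: Dict[str, ManagedJob] = {}
        self._next_id = 0

    def create_job(self) -> str:
        # 1a) create the job, and 1b) create the ManagedJob representing it.
        self._next_id += 1
        key = f"job-{self._next_id}"
        self.resources[key] = ManagedJob(job=Job())
        return key

    def destroy(self, key: str) -> None:
        # 2a) destroying the ManagedJob kills the job.
        managed = self.resources.pop(key)
        managed.job.state = "Killed"

    def on_job_state_change(self, key: str, new_state: str, now: float) -> None:
        # 2b) state changes in the job are reflected in the ManagedJob.
        managed = self.resources[key]
        managed.job.state = new_state
        managed.state = new_state
        if new_state in ("Done", "Failed"):
            # 2c) termination schedules destruction, but not immediately.
            managed.termination_time = now + self.linger_seconds

    def sweep(self, now: float) -> List[str]:
        # Remove resources whose scheduled termination time has passed.
        expired = [
            key for key, managed in self.resources.items()
            if managed.termination_time is not None
            and managed.termination_time <= now
        ]
        for key in expired:
            del self.resources[key]
        return expired
```

In this sketch a client can still read the final state of a finished job until sweep() is called with a time past the linger window, which is the "leave it around for a while" behaviour from point 2c.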
>
> Ian,
>
> What is your fault tolerance strategy here?
>
> Is every ManagedJob WS-Resource hosted on the same host (and perhaps,
> same process) as the job itself?
>
> This would mean that there is no way for the ManagedJob EPR to fail
> without the job itself failing, but would require the entire set of
> job hosts to be visible for inbound SOAP messages. It would also
> prevent you from moving a job from one node to another without some
> difficulty (the
> classic CORBA object-moved problem, I believe, though HTTP 3xx
> redirect responses would work if only SOAP stacks processed them
> reliably).
>
> I am trying to do a design which would enable (though would not
> require) only a subset of nodes -call them portal nodes- to be visible
> to outside callers, with the rest of the nodes only accessible to the
> portal itself. Once I assume this architecture, modelling the
> resources gets complex, as EPRs contain routing info that may become
> invalid if a portal node fails.
>
> -steve
>
>
>
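
The portal-node concern raised above can be made concrete with a small sketch. It assumes, purely for illustration, that an EPR is just a portal address plus an opaque resource key, and that the portals share a table mapping keys to internal nodes. The names EPR, Cluster, issue_epr, invoke, and reissue, and the example.org hostnames, are hypothetical and do not come from WS-Addressing or any toolkit.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class EPR:
    """Simplified endpoint reference: routing info plus a resource key."""
    address: str       # URL of the portal node the client talks to
    resource_key: str  # opaque key; says nothing about the internal node


class Cluster:
    """Hypothetical cluster where only portal nodes accept outside calls."""

    def __init__(self, portals: List[str]) -> None:
        self.live_portals: Set[str] = set(portals)
        self.locations: Dict[str, str] = {}  # resource key -> internal node

    def place(self, key: str, node: str) -> None:
        # Record (or update, on migration) where the job currently runs.
        self.locations[key] = node

    def issue_epr(self, key: str, portal: str) -> EPR:
        return EPR(address=portal, resource_key=key)

    def fail_portal(self, portal: str) -> None:
        self.live_portals.discard(portal)

    def invoke(self, epr: EPR) -> str:
        # The call only succeeds if the portal named in the EPR is alive.
        if epr.address not in self.live_portals:
            raise ConnectionError(f"portal {epr.address} is unreachable")
        return self.locations[epr.resource_key]

    def reissue(self, stale: EPR) -> EPR:
        # Same resource key, routed through some surviving portal.
        portal = sorted(self.live_portals)[0]
        return EPR(address=portal, resource_key=stale.resource_key)
```

Under these assumptions, moving a job between internal nodes leaves existing EPRs valid, because only the shared table changes. A portal failure does invalidate them, and the client then needs some out-of-band way to obtain a reissued EPR, which is the modelling complexity the message points to.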