[ogsa-wg] Thoughts on extensions mechanisms for the HPC profile work

Wed May 3 19:17:49 CDT 2006

Hi;

Service groups are a specific way of creating stateful aggregations.  If
you want to create such stateful aggregations they are one way of doing
so.  Aside from my point about not wanting to create any sort of
stateful aggregation (with all its attendant life cycle management
costs) in some cases, I don't know whether they would be appropriate for
some of the more complicated aggregations one might want to create.  For
example, I don't know whether they would be the most suitable way of
creating dynamic aggregates, such as workflows with varying numbers of
constituent jobs and hierarchies of constituent jobs.

Marvin.

-----Original Message-----
From: Michael Behrens [mailto:behrens at r2ad.com] 
Sent: Tuesday, May 02, 2006 6:40 PM
To: Marvin Theimer
Cc: Dave Berry; ogsa-wg at ggf.org
Subject: Re: [ogsa-wg] Thoughts on extensions mechanisms for the HPC
profile work

Service Groups come to mind when aggregations are mentioned... would
they apply here?

Marvin Theimer wrote:

>Hi;
>
> 
>
>I certainly agree that you will want to be able to define more complex
>expressions than just lists of identifiers and that you will want to be
>able to create various kinds of aggregations - including workflows -
>that persist beyond a single client-service interaction.  My key point
>was that there are situations where a client will want to specify
>multiple entities in a request, and that we therefore need a way of
>accommodating such requests.  (Note that you sometimes might want to
>name multiple aggregate entities as well - e.g. to cancel all the
>workflows that you've currently got running.)  So the main thing I want
>to avoid is ending up with a purely "object-oriented" design in which
>all stateful entities are accessed strictly via separate messages sent
>only to their associated EPRs.
>
> 
>
>Another thing to keep in mind is that not all interactions with
>logically aggregated entities require a persistent connection.  Suppose
>I want to periodically query the status of a particular set of jobs.  If
>I periodically send across a list of job IDs then the scheduler will
>assemble the current status information for each listed job and return
>that information to me.  If I have to first ask the scheduler to create
>a stateful object representing that list of jobs, so that I can then
>send query messages to the list object, then the scheduler has to do
>substantially more work for no real benefit.  In particular, it still
>has to assemble and return the current state information for each job in
>the list for each query.  But now it also has to maintain a stateful
>list object as well, including all the state and management duties that
>are associated with that.  Worse yet, you now get to deal with all the
>failure modes that a stateful connection can exhibit in the face of
>client and server crashes, as well as network partitions.  So, whereas I
>agree that creating stateful representations of more complex entities,
>such as workflows, is definitely a case to be supported, I would
>disagree with the notion that every aggregate collection - no matter how
>ephemeral - should be dealt with by means of reified aggregation
>objects.
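
The contrast drawn above between resending a list of job IDs and first
creating a stateful list object can be sketched roughly as follows.
This is only an illustration: the Scheduler class, its method names and
the status strings are hypothetical and are not taken from any HPC
profile or BES draft.

```python
class Scheduler:
    """Toy in-memory scheduler; every name here is a hypothetical stand-in."""

    def __init__(self):
        self.jobs = {}        # job ID -> current status
        self.job_lists = {}   # list ID -> job IDs (extra server-side state)
        self._next_list_id = 0

    # Stateless style: the client resends the job IDs with every query.
    def query_status(self, job_ids):
        return {job_id: self.jobs.get(job_id, "unknown") for job_id in job_ids}

    # Stateful style: the client must first create a list object ...
    def create_job_list(self, job_ids):
        list_id = self._next_list_id
        self._next_list_id += 1
        self.job_lists[list_id] = list(job_ids)
        return list_id

    # ... each query still assembles the status of every listed job ...
    def query_job_list(self, list_id):
        return self.query_status(self.job_lists[list_id])

    # ... and somebody has to clean the list object up again.
    def destroy_job_list(self, list_id):
        del self.job_lists[list_id]
```

In this sketch both styles do the same per-query work; the stateful
style only adds a create/destroy life cycle and state that can be
orphaned if the client crashes before calling destroy_job_list.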
>
> 
>
>Moving on, you are right that an abstract (opaque) name is not very
>useful without knowing which service(s) can understand its meaning.  But
>that's not really what I'm proposing: When you get back an abstract name
>from a service the understanding is that you can use the name when
>interacting with that service (or other services that you explicitly
>know will understand it).  You are expected to remember that binding.
>
> 
>
>Personally I like WS-names for this - and other - reasons.  Because the
>abstract name - the true name, if you will, rather than a potentially
>ephemeral address for where to send messages to - is explicit you can do
>all kinds of useful things, including extracting an efficient
>representation of a bunch of them.
>
> 
>
>I'm not sure what you mean with your suggestion of creating a single
>WS-Name that can be used to query a bunch of different EPRs' entities.
>The benefits I'm talking about come from not involving the server in any
>explicit aggregation "creation" operations.  If you mean the notion of
>extracting abstract names from a bunch of WS-names and then using them
>in an array operation that is sent to the same EPR, then I'm all for it.
>
> 
>
>Marvin.
>
> 
>
>________________________________
>
>From: owner-ogsa-wg at ggf.org [mailto:owner-ogsa-wg at ggf.org] On Behalf Of
>Dave Berry
>Sent: Saturday, April 29, 2006 8:13 AM
>To: Marvin Theimer; ogsa-wg at ggf.org
>Subject: RE: [ogsa-wg] Thoughts on extensions mechanisms for the HPC
>profile work
>
> 
>
>Hi Marvin,
>
> 
>
>It seems strange to me to limit yourself to a list of identifiers as a
>means of interacting with large numbers of jobs.  Wouldn't it be more
>flexible to allow more complex expressions, such as "all jobs submitted
>by Marvin that use more than 4 processors and are currently running" (or
>whatever)?  You could build an appropriate information model, map it to
>a data model and provide a corresponding query language that clients
>could use to request the information they need.
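
As a rough illustration of the kind of expression suggested above, the
example query can be written as a predicate over job records.  The Job
fields and the select_jobs helper are assumptions made up for this
sketch; no information model or query language has been defined in this
thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical job record; the field names are illustrative only.
    job_id: str
    owner: str
    processors: int
    state: str


def select_jobs(jobs, predicate):
    """Return the IDs of all jobs for which the predicate holds."""
    return [job.job_id for job in jobs if predicate(job)]


jobs = [
    Job("job-1", "Marvin", 8, "running"),
    Job("job-2", "Marvin", 2, "running"),
    Job("job-3", "Dave", 16, "running"),
    Job("job-4", "Marvin", 8, "finished"),
]

# "All jobs submitted by Marvin that use more than 4 processors and are
# currently running."
matching = select_jobs(
    jobs,
    lambda job: job.owner == "Marvin"
    and job.processors > 4
    and job.state == "running",
)
```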
>
>
>I agree that this is a service-oriented approach rather than a
>resource-oriented approach.  However, having set up a particular query,
>you might want to repeatedly interact with the current state of that
>query.  So it might make sense to register the query with the service
>and return an EPR that can be used to get the current set of results.
>
> 
>
>I don't see much point in passing back an abstract name (e.g. a UUID) on
>its own.  Without a reference to a resolution mechanism, clients won't
>be able to make much use of it.  This is why URLs and URIs have been
>so successful; they include enough information for any client to know
>which mechanism to use to resolve them.  This seems to be one advantage
>of the WS-Name proposal; it includes an abstract name with a reference
>to a resolution mechanism.
>
> 
>
>Is it still the case that BES containers are allowed (or even
>encouraged) to return WS-Names?
>
> 
>
>Would it be useful to have a means for composing WS-Name EPRs that use
>the same resolution mechanism in order to make a single WS-Name that can
>be used to query all of them?
>
> 
>
>Dave.
>
>-----Original Message-----
>From: owner-ogsa-wg at ggf.org [mailto:owner-ogsa-wg at ggf.org] On Behalf Of
>Marvin Theimer
>Sent: 29 April 2006 03:06
>To: ogsa-wg at ggf.org
>Subject: [ogsa-wg] Thoughts on extensions mechanisms for the HPC
>profile work
>
>	1. Support for array operations and other forms of batching.
>
>	2. When thousands of jobs are involved, the efficiency gains of
>employing array operations for things like queries or abort requests
>are too significant to ignore.  Hence a model in which every job must
>be interacted with on a strictly individual basis via an EPR is
>arguably unacceptable.
>
>	3. One approach would be to simply add array operations alongside
>the corresponding individual operations, so that one can selectively
>interact with jobs (as well as things like data files) in either an
>"object-oriented" fashion or in "bulk-array" fashion.
>
>	One could observe that the array operations enable the
>corresponding individual operations as a trivial special case, but this
>would arguably violate the principle of defining a minimalist base case
>and then employing only extensions (rather than replacements).
>
>	4. Array operations are an example of a service-oriented rather
>than a resource-oriented form of interaction: clients send a single
>request to a job scheduler (service) that refers to an array of many
>resources, such as jobs.  This raises the question of whether things
>like jobs should be referred to via EPRs or via unique "abstract" names
>that are independent of any given service's contact address.  At a high
>level, the choice is unimportant since the client submitting an array
>operation request is simply using either one as a unique (and opaque)
>identifier for the relevant resource.  On a pragmatic level one might
>argue that abstract names are easier and more efficient to deal with
>than EPRs since the receiving scheduler will need to parse EPRs to
>extract what is essentially the abstract name for each resource.
>(Using arrays of abstract names rather than arrays of EPRs is also more
>efficient from a size point-of-view.)
>
>	5. If abstract names are used in array operations then it will be
>necessary that individual operations return the abstract name and not
>just an EPR for a given resource, such as a job.  If this approach is
>chosen then this implies that the base case design and implementation
>must return abstract names and not just EPRs for things like jobs.
>
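
The points above about array operations over abstract names versus EPRs
might look roughly like this in code.  It is a sketch under loud
assumptions: an EPR is modelled here as a plain URL carrying the job's
abstract name in a "job" query parameter, which is a simplification
(real WS-Addressing EPRs are XML structures), and abort_jobs,
abort_jobs_by_epr and the scheduler.example address are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def name_from_epr(epr):
    """Extract the abstract job name from a toy URL-style EPR (assumed form)."""
    return parse_qs(urlparse(epr).query)["job"][0]


def abort_jobs(jobs, names):
    """Array operation: abort every named running job in a single request."""
    aborted = []
    for name in names:
        if jobs.get(name) == "running":
            jobs[name] = "aborted"
            aborted.append(name)
    return aborted


def abort_jobs_by_epr(jobs, eprs):
    """Same operation, but the scheduler must first parse each EPR."""
    return abort_jobs(jobs, [name_from_epr(epr) for epr in eprs])


jobs = {"a": "running", "b": "finished", "c": "running"}
by_name = abort_jobs(jobs, ["a", "b"])
by_epr = abort_jobs_by_epr(jobs, ["http://scheduler.example/jobs?job=c"])
```

Both calls reach the same abort logic; the EPR variant only adds a
parsing step per entry and a longer identifier per job on the wire.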
>
>  
>

-- 
Michael Behrens
R2AD, LLC
(571) 594-3008 (cell)
(703) 714-0442 (land)