[ogsa-bes-wg] Updates to document

Wed May 24 11:59:09 CDT 2006

Peter, et al,
I'm time constrained right now, but let me give a quick response.

> -----Original Message-----
> From: Peter G Lane [mailto:lane at mcs.anl.gov]
> Sent: Monday, May 22, 2006 6:32 PM
> To: Andrew Grimshaw
> Cc: ogsa-bes-wg at ggf.org
> Subject: Re: [ogsa-bes-wg] Updates to document
> 
> Some questions/comments...
> 
> I'm concerned about this term "container" as used in the document. This
> is very confusing as this term is already used to refer to the web
> service hosting infrastructure (i.e. Tomcat, GT's "standalone
> container", or figure 1's "service container").
[Andrew Grimshaw] 

We've been using this term for some time - and indeed if you look at the
larger EMS picture the "services" started may in fact be stateful web
services. The sense is that the "service" - or legacy activity in the case
of BES, is logically "contained" by its execution environment.

> 
> In section 2 under the "Naming" bullet. What exactly is being named?
> 
> Is the last bullet under section 2 saying that a BES-compliant service
> cannot support logical file names? Does this include converting file
> paths to, say, GridFTP URLs that might differ in the file path with
> respect to the GridFTP server? Does this mean that variables cannot be
> used in file paths?
> 
[Andrew Grimshaw] 
There is a huge history on naming. Section 2 is referring to the activities
that are being started - that their "names", "handles", "addresses", or
whatever you want to call them will be EPRs.
We are referring to the "type", so to speak, of the endpoints: some may be
WS-Names, some may be renewable references, some may be managed jobs - the
spec is silent on that, but recognizes that there are many choices.

> I don't know much about WS-Names (anybody have a pointer to documents?),
> but it seems problematic for interoperability that in section 3 the
> method of exposing metadata about the AED type used is deliberately left
> up to the implementation. This makes it impossible to automatically find
> this information without either a de facto standard or some mapping of
> implementation to metadata discovery method. Why not at least specify
> how this metadata is discovered? A standard resource property would do
> it.
[Andrew Grimshaw] 
The WS-Names documents can be found in GridForge, under the OGSA-Naming WG.
The issue of how you find metadata is crucial, but the answer may be different
in different renderings of the standard, e.g., WSRF versus WS-I versus who
knows what. This has been a massive political issue, and the result is a
compromise. For example, in WSRF this metadata is exposed as resource
properties.

> 
> I have issues with the hardware and software resource properties. Do
> these describe the capabilities of the machine the service is running on
> or a cluster/SMP machine sitting behind the service? If the former, this
> is only useful if the activity runs on the same machine as the service and
> seems like it should be optional. Also, what about the metadata
> describing the cluster or SMP machine? If the latter, then it is
> limiting in that a cluster may have machines with different OS, CPU
> configuration, available memory, architecture, etc...
> 
[Andrew Grimshaw] 
The resource that the BES container is representing - not the machine it is
on. Recall that a BES container may "front" a whole set of other containers.
I agree with your comment about "limiting" if there are heterogeneous resources
behind. The resource model at this point in time does not address multiple
entries for any given field.

> In section 5.1.1 I find the IsAcceptingNewActivities RP rather useless.
> The client will find out just as easily if the service is accepting new
> activities depending on whether a NotAcceptingNewActivities fault is
> thrown or not. Checking the RP first both creates an extra WS operation
> call and cannot be guaranteed since there is a potential race condition
> between when the RP value is returned and the submission call is sent.
> If there was some negotiation protocol incorporated into BES I might buy
> this, but otherwise I don't see the point.
[Andrew Grimshaw] 
There are always race conditions in distributed systems. What if I want to discover whether
you are taking activities - but not necessarily start an activity, for
example to construct a candidate set of BES containers?
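
A minimal sketch of the two uses discussed above, assuming a hypothetical
in-memory Container class (none of these Python names come from the BES
document). The property is read only to build a candidate set; submission
still has to handle the fault, because the answer can change between the
query and the submit call.

```python
from dataclasses import dataclass


class NotAcceptingNewActivities(Exception):
    """Stand-in for the NotAcceptingNewActivities fault (hypothetical class)."""


@dataclass
class Container:
    """Toy model of a BES container; not a real client API."""
    name: str
    accepting: bool

    def is_accepting_new_activities(self) -> bool:
        # Models reading the IsAcceptingNewActivities property.
        return self.accepting

    def create_activity(self, jsdl: str) -> str:
        # Models the create call; faults if the container stopped accepting.
        if not self.accepting:
            raise NotAcceptingNewActivities(self.name)
        return f"{self.name}/activity-1"


def candidate_set(containers):
    """Discovery only: filter on the property without submitting anything."""
    return [c for c in containers if c.is_accepting_new_activities()]


def submit_to_first_available(candidates, jsdl):
    """Submission: the property may be stale, so the fault is still handled."""
    for container in candidates:
        try:
            return container.create_activity(jsdl)
        except NotAcceptingNewActivities:
            continue  # lost the race; try the next candidate
    return None
```

The property narrows the search; the fault remains the authoritative answer
at submission time.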

> 
> In section 6.1, I feel it's superfluous to suffix the operation name
> with "FromJSDL". Are there plans to support something other than JSDL?
> If not, why not just make this "CreateActivity"?
>
[Andrew Grimshaw] 
BES is the simplest, most basic, "type" of a service container in the EMS
architecture. Assuming that for all time only JSDL will be used
seemed overly restrictive. In BES there are no plans for anything else - but
...

> Section 6.1.1: "Document" suffix seems also superfluous. Of course it's
> a document. What else would it be? Also, I don't see why
> "createInSuspendedState" needs to be outside the JSDL document. This
> should be an extension to JSDL especially considering that the state
> model doesn't have a base "suspended" state. So if a service doesn't
> support suspended state extensions, this flag is meaningless.
>
[Andrew Grimshaw] 
I'll look at this one later - 

> Section 6.1.2: absolutely unique idempotence IDs are impossible to
> guarantee. I'm of the opinion that these should be valid only for the
> life of a job. Besides being ridiculous to assume that the service
> should keep track of these IDs to make sure it is never ever used again,
> doing so serves no real purpose.
[Andrew Grimshaw] 
I happen to agree with you - certainly for all time is difficult, and
depending on exactly what semantics you think you're getting, impossible.
However, this is in there as a result of insistence and a long discussion on
input (ESI document) from the Globus and Unicore teams.
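
A minimal sketch of the alternative Peter suggests, where an idempotence ID
is honoured only for the life of the job. This is an illustration under
assumed names (ActivityFactory, create_activity, destroy_activity are all
hypothetical), not what the current text of the document requires.

```python
class ActivityFactory:
    """Toy factory that remembers idempotence IDs only while the job exists."""

    def __init__(self):
        self._live = {}     # idempotence ID -> activity handle
        self._counter = 0   # used to mint hypothetical handles

    def create_activity(self, jsdl, idempotence_id):
        # A retry with the same ID while the job is alive returns the same job.
        if idempotence_id in self._live:
            return self._live[idempotence_id]
        self._counter += 1
        handle = f"activity-{self._counter}"
        self._live[idempotence_id] = handle
        return handle

    def destroy_activity(self, handle):
        # Forget the ID together with the job, so no permanent record is kept.
        for key, value in list(self._live.items()):
            if value == handle:
                del self._live[key]
```

With this scoping, a retried request cannot create a duplicate job, and the
service does not have to track IDs for all time.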

> 
> Section 6.1.3: If there is an optional subscription request element in
> the input, then there needs to be an optional subscription reference
> element either in the output (preferred to avoid an RP query round trip)
> or a resource property.
[Andrew Grimshaw] 
I'm not sure I follow - you mean you want a handle to the subscription in
the return?

> 
> 6.1.4: The names of the fault types aren't consistent. Either suffix with
> "Fault" or don't.
[Andrew Grimshaw] Yes, should be fixed.

Sorry, got to go now. Will get to the rest later.

> 
> Spelling typo in section 6.2, page 19, 1st paragraph: "proceeding" ->
> "preceding".
> 
> 6.2: I would prefer that the "*Status" elements not have "state" and
> "laststate" attributes but instead have two child elements "state" and
> "lastState" (or something similarly named). These elements would in turn
> have a child element that has a well defined (i.e. in this document)
> enumerated state representing one of the states in the general state
> diagram (i.e. New, Pending, Running, Canceled, Failed, or Finished) and
> one optional xsd:anyType element (or just an xsd:any) that specifies the
> state extension if one exists (i.e. StageIn, Suspended, etc...). At very
> least something needs to be changed to account for the new state model.
> 
> 6.3: I was liking the new state model until I saw this operation. I
> don't like this idea at all of allowing the user to monkey with the
> state machine directly. I would be much happier if there were specific
> operations to inject specific inputs that the state machine should
> consider. The state machine should never be open to direct manipulation,
> but instead should decide for itself what state and when to transition
> to that state based on the current state and events. At *very* least
> this operation should have a fault that means "too bad, I'm not doing
> that". If the desire is to allow for, say, resetting of the job back to
> New so that it will run again or canceling without destroying the job,
> then specific operations can be added to the spec as is. Since
> sub-states like "suspended" are treated as extensions now, a
> "SuspendActivities" operation would seem more appropriate as an
> extension as well.
> 
> 6.4 and 6.5: Could someone provide a use case for wanting to start and
> stop the acceptance of new activities via the WS interface?
> 
> 6.6: This should be a resource property not a new operation.
> 
> I'm already commenting past page 20, so I'll stop here. One last
> observation, though. I noticed that this document makes no attempt at
> defining an activity resource. Is this intentionally out of scope?
> 
> Peter
> 
> On Mon, 2006-05-22 at 14:58 -0400, Andrew Grimshaw wrote:
> > All,
> >
> > I have done the updates to the document discussed in the last two
> > phone calls, and in the EMS meeting in Japan.
> >
> > At a high level this includes modifications to adopt text/issues from
> > the ESI document and the HPC profile working document. Note that this
> > is NOT a final document.
> >
> >
> >
> > These changes revolve around:
> >
> > 1: Changing CreateActivityFromJSDL to
> >
> > a)    take additional optional arguments, specifically notification
> > arguments and an "idempotent" argument lifted from ESI.
> >
> > b)    An extension to JSDL suggested in ESI to support libraries
> >
> > 2: Adding a resource information model section. Originally to be from
> > 4.1 of the ESI document - but post discussion with Snelling, Stokes,
> > et al a modified 4.1.
> >
> > 3: State model. This is one of the biggest jobs. At GGF in Japan the
> > HPC profile group introduced a nice, simple, extensible state model
> > that would allow supporting a variety of state machines while still
> > allowing basic clients to understand what they were getting. After
> > discussions with many people I have decided to adopt that simpler
> > state model. CLEARLY THIS WILL NEED DISCUSSION BY INTERESTED PARTIES.
> > I AM NOT TRYING TO PULL A FAST ONE.  I have taken the text verbatim
> > from the HPC profile paper. Note that I do not yet have permission
> > from their lawyers - so we may need to yank this and retype the same
> > info. I hope not.
> >
> > 4: Refer to managedjob "type" of EPR, just as we refer to "WS-Name"
> > "type" of EPR.
> >
> >
> >
> > I have not even tried to modify the WSDL and renderings. Once we
> > decide on the rest of the content we can do that. So don't read past
> > page 20!
> >
> >
> >
> > I am not sure of the status of the weekly call this week.  I will
> > synch with Darren and Steven and send mail tomorrow.
> >
> >
> >
> > I'm also including some slides from the discussion of BES in the EMS
> > session.
> >
> >
> >
> > A
> >
> >
> >
> > Andrew Grimshaw
> >
> > Professor of Computer Science
> >
> > University of Virginia
> >
> > 434-982-2204
> >
> > grimshaw at cs.virginia.edu
> >
> >
> >
> >