[ogsa-wg] RE: Modeling State: Technical Questions

Ian Foster foster at mcs.anl.gov
Wed Apr 6 12:15:35 CDT 2005


For what it's worth, the Globus user community has been running thousands 
of instances of our GRAM job submission service for quite a few years, with 
many many millions of jobs running through them, and as far as I am aware, 
no-one has ever asked for the ability to manage more than one job at a 
time. Certainly the lack of this facility hasn't seemed to stop anyone.

Lots of caveats can be applied here: maybe people did ask, and I didn't 
hear; maybe they didn't think to ask; maybe our workloads are special 
(although there is a great variety). But it is a data point.

Ian.




At 11:59 AM 4/6/2005 +0100, Mark McKeown wrote:

>Hi Paul,
>          Moving the question from can I suspend multiple
>jobs by sending a single message to a resource (either
>REST or WS-Resource) to whether this is a good thing.
>
>
>There is a balance between simplicity and efficiency -
>using a single message introduces more complexities, as
>Steve Loughran illustrated, but is potentially more
>efficient than sending multiple messages.
>
>
>Remembering that "Early optimisation is the root of all
>evil" (Knuth) - is adding support for suspending multiple
>jobs using a single message an example of early
>optimisation?
>
>
>I would imagine that this should be a straightforward
>question since there is already considerable experience
>in using computational grids. Are users demanding the
>ability to suspend multiple jobs using a single message?
>Is it for improved efficiency reasons? From my experience
>no, but others on this list will have considerably more
>experience.
>
>
>Could this be a case of "worse is better", where simplicity
>is more important than efficiency?
>
>Perhaps there are other reasons for using a single message
>to interact with multiple jobs?
>
>cheers
>Mark
>
>
>
> > Ian,
> >
> >
> >
> > I agree that this is good progress. So let's bank that and see if we
> > can agree on one more thing, and then I'll ask a question.
> >
> >
> >
> > Considering your list of abilities (a, b & c) below, do we agree that in
> > terms of expressiveness, the ordering is:
> >
> >
> >
> > c>b>a
> >
> >
> >
> > i.e. using approach c, a client can request operations on:
> >
> >   a) single jobs: "where (jobid = urn:guid:364)"
> >
> >   b) sets of jobs: "where (jobid = urn:guid:364) or (jobid = urn:guid:401)"
> >
> >
> >
> > If there is agreement on this, then we could move on to discussing why
> > it is felt necessary to provide more than just c for the job submission
> > service.
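
One way to picture the ordering c>b>a is to treat every request as a criterion over jobs, so that a single jobid and a list of jobids become two special cases of the same thing. The Java below is only a sketch under that assumption: the Job class, its owner and hoursRunning fields, and the helper names are all invented for illustration and do not come from OGSA, WSRF or GRAM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: every name here is an assumption made for illustration.
class JobSelection {

    // Minimal stand-in for whatever the service knows about a job.
    static final class Job {
        final String id;
        final String owner;
        final long hoursRunning;

        Job(String id, String owner, long hoursRunning) {
            this.id = id;
            this.owner = owner;
            this.hoursRunning = hoursRunning;
        }
    }

    // (a) a single job, named by its jobid
    static Predicate<Job> byId(String id) {
        return job -> job.id.equals(id);
    }

    // (b) a set of jobs, named by a client-supplied list of jobids
    static Predicate<Job> byIds(String... ids) {
        Set<String> wanted = new HashSet<>(Arrays.asList(ids));
        return job -> wanted.contains(job.id);
    }

    // (c) the general case: return the ids of all jobs matching any criterion
    static List<String> select(List<Job> jobs, Predicate<Job> criterion) {
        List<String> matched = new ArrayList<>();
        for (Job job : jobs) {
            if (criterion.test(job)) {
                matched.add(job.id);
            }
        }
        return matched;
    }

    // Made-up sample data.
    static List<Job> sampleJobs() {
        return Arrays.asList(
            new Job("urn:guid:364", "paul", 2),
            new Job("urn:guid:401", "paul", 30),
            new Job("urn:guid:512", "ian", 40));
    }

    public static void main(String[] args) {
        List<Job> jobs = sampleJobs();
        System.out.println(select(jobs, byId("urn:guid:364")));
        System.out.println(select(jobs, byIds("urn:guid:364", "urn:guid:401")));
        // "all of paul's jobs that have run for more than a day"
        System.out.println(select(jobs,
            job -> job.owner.equals("paul") && job.hoursRunning > 24));
    }
}
```

In this sketch (a) and (b) are just convenience constructors for (c), which is the sense in which c is the most expressive of the three.
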
> >
> >
> >
> > Regards
> >
> > Paul
> >
> >
> >
> > Ian wrote...
> >
> > >Savas:
> > >
> > >It seems that we are in agreement, then, that we want the ability to:
> > >
> > >a) Request operations on individual jobs identified by some sort of "jobid"
> > >
> > >b) Request operations on sets of jobs identified by a user-supplied list of "jobids"
> > >
> > >c) Request operations on sets of jobs identified by more abstract criteria
> > >
> > >We also agree that (as I expressed in the email that started this
> > >discussion) such requests can be expressed in a few different ways,
> > >with somewhat different characteristics.
> > >
> > >That's progress I hope.
> > >
> > >Ian.
> >
> >
> >
> > ________________________________
> >
> > From: Ian Foster [mailto:foster at mcs.anl.gov]
> > Sent: 05 April 2005 17:59
> > To: Savas Parastatidis; Steve Loughran
> > Cc: Mark McKeown; Karl Czajkowski; Dennis Gannon; Samuel Meder; ogsa-wg;
> > dave.pearson at oracle.com; gray at microsoft.com; humphrey at cs.virginia.edu;
> > grimshaw at virginia.edu; aherbert at microsoft.com; gcf at indiana.edu;
> > mark.linesch at hp.com; Frank Siebenlist; Tony Hey; Dave Berry; Paul Watson
> > Subject: RE: [ogsa-wg] RE: Modeling State: Technical Questions
> >
> >
> >
> > [I'm feeling increasingly bad about sending email to all of the people
> > CCed here, who may not be interested in these issues at all but got
> > addressed by Tony long ago...]
> >
> > Savas:
> >
> > It seems that we are in agreement, then, that we want the ability to:
> >
> > a) Request operations on individual jobs identified by some sort of
> > "jobid"
> >
> > b) Request operations on sets of jobs identified by a user-supplied list
> > of "jobids"
> >
> > c) Request operations on sets of jobs identified by more abstract
> > criteria
> >
> > We also agree that (as I expressed in the email that started this
> > discussion) such requests can be expressed in a few different ways, with
> > somewhat different characteristics.
> >
> > That's progress I hope.
> >
> > Ian.
> >
> > At 02:44 PM 4/5/2005 +0100, Savas Parastatidis wrote:
> >
> >
> >
> >
> > Dear Ian,
> >
> >
> >
> > I don't think that the approach I proposed forces the user to do more
> > than they would have to do anyway if EPRs were used. It is still the
> > case that someone has to manage the EPRs to the resources in WSRF. This
> > is similar to what happens in the real world. The online bookstore will
> > ask for my credit card number (a URI), or the book store will ask for an
> > ISBN (another URI) or multiple ISBNs if I want to buy multiple books.
> > The banking service will ask for my bank account number (another URI
> > perhaps).
> >
> >
> >
> > Also, there is no reason why a "kill all my jobs" message couldn't also be
> > supported. But please note that this message is now addressed to the
> > service (the container of resources) and not, as in the case of WSRF, to
> > a specific resource. This is no different from what I am advocating.
> >
> >
> >
> > Also... to Steve's point about partial failure. If one wishes atomic
> > transaction semantics, I don't see the difference between the two
> > approaches...
> >
> >
> >
> > Atomic
> >   Msg -> resource 1
> >   Msg -> resource 2
> >   Msg -> resource 3
> > End Atomic
> >
> > Vs
> >
> > Msg
> >   Atomic
> >     Resource 1
> >     Resource 2
> >     Resource 3
> >   End Atomic
> >
> >
> >
> > In fact, I would argue that the latter is better because:
> >
> >
> >
> > 1. It uses fewer messages (and, Steve, I am not assuming only HTTP and
> > the optimisations that may be supported)
> >
> >
> >
> > 2.  I can more easily deal with the failures in an application-specific
> > manner since my atomic TX semantics do not span multiple msgs.
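
To make the second arrangement concrete, here is a rough Java sketch of a service applying one suspend message to several jobs with all-or-nothing semantics. It assumes an in-memory job table and a simple undo-on-failure rule; the class, the State enum and the method name are hypothetical and are not taken from WSRF or any real transaction API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory job table standing in for the service.
class AtomicSuspend {

    enum State { RUNNING, SUSPENDED, FINISHED }

    // Suspends every listed job, or none of them.
    // Returns true on success; on failure the table is left as it was found.
    static boolean suspendAll(Map<String, State> jobs, List<String> ids) {
        List<String> done = new ArrayList<>();
        for (String id : ids) {
            if (jobs.get(id) != State.RUNNING) {
                // Unknown or non-running job: undo what was already done.
                for (String undo : done) {
                    jobs.put(undo, State.RUNNING);
                }
                return false;
            }
            jobs.put(id, State.SUSPENDED);
            done.add(id);
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, State> jobs = new HashMap<>();
        jobs.put("job1", State.RUNNING);
        jobs.put("job2", State.RUNNING);
        jobs.put("job3", State.FINISHED);

        List<String> request = new ArrayList<>();
        request.add("job1");
        request.add("job3"); // cannot be suspended, so job1 is rolled back
        System.out.println(suspendAll(jobs, request) + " " + jobs.get("job1"));
    }
}
```

Because the whole request arrives in one message, the success-or-rollback decision is made in a single place on the service side. This says nothing about how a real service would make the rollback itself reliable.
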
> >
> >
> >
> > (Anyway... who wants to do atomic TXs over the Web anyway? :-)
> >
> >
> >
> > Regards,
> >
> > --
> > Savas Parastatidis
> > http://savas.parastatidis.name
> >
> >
> >
> >
> > From: Ian Foster [mailto:foster at mcs.anl.gov]
> > Sent: Tuesday, April 05, 2005 2:22 PM
> > To: Steve Loughran; Savas Parastatidis
> > Cc: Mark McKeown; Karl Czajkowski; Dennis Gannon; Samuel Meder; ogsa-wg;
> > dave.pearson at oracle.com; gray at microsoft.com; humphrey at cs.virginia.edu;
> > grimshaw at virginia.edu; aherbert at microsoft.com; gcf at indiana.edu;
> > mark.linesch at hp.com; Frank Siebenlist; Tony Hey; Dave Berry
> > Subject: Re: [ogsa-wg] RE: Modeling State: Technical Questions
> >
> >
> >
> > Steve's note raises a key point for me: do we really want to force the
> > user (as Savas seems to be advocating) to keep track of jobs running at
> > a remote site?
> >
> > I'd rather send a request "kill all my jobs" or "kill all my jobs that
> > have run for more than a day" to the factory than carefully keep track
> > of all jobs that I have active, and how long they have been running, so
> > that I can send the big document (or stream) discussed below.
> >
> > Ian.
> >
> >
> > At 02:10 PM 4/5/2005 +0100, Steve Loughran wrote:
> >
> > Savas Parastatidis wrote:
> >
> > Dear all,
> > I think something needs to be clarified with regards to handling
> > multiple jobs with one message. The beauty of document-oriented
> > interactions is that you can do things like...
> > <job-details-request>
> >   <job-id>urn:ogsa:job:guid:bla-bla-bla-001</job-id>
> >   <job-id>urn:ogsa:job:guid:bla-bla-bla-010</job-id>
> >   <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
> >   <job-id>urn:ogsa:job:guid:bla-bla-bla-029</job-id>
> > </job-details-request>
> > Or
> > <job-suspend-request>
> >   <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
> >   <job-id>urn:ogsa:job:guid:bla-bla-bla-005</job-id>
> >   <job-id>urn:ogsa:job:guid:bla-bla-bla-008</job-id>
> > </job-suspend-request>
> > The schema for the above document can allow anything from 0 to N number
> > of <job-id> elements.
> >
> >
> > the trouble with any bulk operation is you have to handle partial
> > failure. You need either atomic operations (not long lived transactions
> > over HTTP Savas, I wouldn't be that daft), or a way of indicating that
> > only a bit went wrong
> >
> > Hence the 207 Multi-Status response in WebDav, the "something failed,
> > look in the message". WebDav is still single instance (here a RESTy
> > URL), but you can set >1 property and so have partial failure.
> >
> > SOAP just has SOAPFault and extensions; no explicit multiple failure
> > response. WS-RF-ResourceProperties has a similar problem with
> > SetResourceProperties, but a different failure model in which any
> > failure to set can result in a WS-BaseFault, indicating which failed,
> > but providing no apparent information on which worked.
> >
> > It seems to me that if you want to bulk stuff, you do need ways of (a)
> > handling partial failure and (b) declaring what happens on partial
> > failure. For the curious, WebDav's failure mode on file operations
> > (MOVE, COPY) is explicitly declared to be that of failed file operations
> > of Win98 on a FAT32 filesystem  [1,2]
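
The alternative to atomicity, reporting which items worked and which did not, might look roughly like the Java below. It is a sketch in the spirit of a multi-status reply rather than an implementation of WebDAV or WS-BaseFault: the Outcome values, the in-memory job table and the method name are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: bulk suspend that reports a result per job.
class BulkSuspend {

    enum State { RUNNING, SUSPENDED, FINISHED }

    enum Outcome { OK, NOT_FOUND, NOT_RUNNING }

    // Tries every job in turn and records what happened to each one.
    // A failure on one job does not stop or undo work on the others.
    static Map<String, Outcome> suspendEach(Map<String, State> jobs, List<String> ids) {
        Map<String, Outcome> results = new LinkedHashMap<>();
        for (String id : ids) {
            State state = jobs.get(id);
            if (state == null) {
                results.put(id, Outcome.NOT_FOUND);
            } else if (state != State.RUNNING) {
                results.put(id, Outcome.NOT_RUNNING);
            } else {
                jobs.put(id, State.SUSPENDED);
                results.put(id, Outcome.OK);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, State> jobs = new HashMap<>();
        jobs.put("job1", State.RUNNING);
        jobs.put("job2", State.FINISHED);
        // The reply names every requested job, including the ones that failed.
        System.out.println(suspendEach(jobs, Arrays.asList("job1", "job2", "job9")));
    }
}
```

Here the declared behaviour on partial failure is "carry on and report per item", so the caller learns both which jobs failed and which succeeded.
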
> >
> > Alternatively, you don't go for bulk operations, neither on multiple
> > jobs nor on multiple properties of a job (remember, WS-RF doesn't
> > declare atomic/transacted property operations, so all you do here is
> > increase the window of instability, a window that already exists).
> > Instead you just stream a series of operations over the same HTTP1.1
> > connection -assuming that everything is accessible at the same far-end
> > host, and get a series of (potentially out of order, we are talking
> > HTTP1.1) responses.
> >
> > This could be efficient, and you could do better handling of failure.
> > But you do need a SOAP stack that can keep an HTTP1.1 channel open for
> > multiple requests. Axis doesn't, even if you get httpclient to do the
> > HTTP work; I don't know about .NET/WSE. You also need developers to
> > model the communication correctly. Manipulating JAXRPC proxies as if
> > they represent remote objects is *clearly* the wrong way to do it. You'd
> > almost want to model a queue of requests waiting to be POSTed, a queue
> > you can fill up then push out. Something like this, in your Java-era
> > language of choice :-
> >
> > //different queues for SOAP, REST
> > Queue q=new Soap12RequestQueue();
> >
> > q.add(new StatePut(job1.uri,Job.LIFECYCLE,Job.SUSPENDED));
> > //let the queue reorder stuff if it wants to
> > q.add(new StatePut(job2.uri,Job.LIFECYCLE,Job.SUSPENDED),Queue.POSITION_OPTIMAL);
> > q.add(new StatePut(job3.uri,Job.LIFECYCLE,Job.SUSPENDED),Queue.POSITION_LAST);
> >
> > q.setEventHandler(this);
> > q.nonBlockingSubmit();
> >
> > No, there is no code behind this example, and I am avoiding any hints as
> > to what the event handler would look like. I think the key point is that
> > once you embrace remote operations as async actions, then you can model
> > the manipulations differently.  Note also that I am representing job
> > suspension not as an explicit suspend() operation, but as a request to
> > put a job into the suspended state. This API could work with our friend
> > REST just as easily as with WS-RF...
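
For readers who want something that compiles, the following Java is one possible reading of the queue idea, written under several simplifying assumptions: it sends requests synchronously and in insertion order, it has no reordering hints, and the actual wire protocol is hidden behind a Transport interface. The interface and class names are invented here and are not the API from the snippet above or from any SOAP stack.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "a queue of requests you fill up then push out".
class RequestQueueSketch {

    // A request to put one property of one resource into a given state.
    static final class StatePut {
        final String uri;
        final String property;
        final String value;

        StatePut(String uri, String property, String value) {
            this.uri = uri;
            this.property = property;
            this.value = value;
        }
    }

    // Called once per request with its individual outcome.
    interface EventHandler {
        void completed(StatePut request);
        void failed(StatePut request, Exception cause);
    }

    // Stand-in for SOAP, REST or anything else that can carry the request.
    interface Transport {
        void send(StatePut request) throws Exception;
    }

    private final List<StatePut> pending = new ArrayList<>();
    private final Transport transport;

    RequestQueueSketch(Transport transport) {
        this.transport = transport;
    }

    void add(StatePut request) {
        pending.add(request);
    }

    // Sends everything queued so far; one failure does not stop the rest.
    void submit(EventHandler handler) {
        List<StatePut> batch = new ArrayList<>(pending);
        pending.clear();
        for (StatePut request : batch) {
            Exception error = null;
            try {
                transport.send(request);
            } catch (Exception e) {
                error = e;
            }
            if (error == null) {
                handler.completed(request);
            } else {
                handler.failed(request, error);
            }
        }
    }

    public static void main(String[] args) {
        // Fake transport that rejects one job, to show per-request reporting.
        RequestQueueSketch q = new RequestQueueSketch(request -> {
            if (request.uri.endsWith("job2")) {
                throw new IllegalStateException("not running");
            }
        });
        q.add(new StatePut("urn:job1", "lifecycle", "suspended"));
        q.add(new StatePut("urn:job2", "lifecycle", "suspended"));
        q.submit(new EventHandler() {
            public void completed(StatePut r) {
                System.out.println("ok " + r.uri);
            }
            public void failed(StatePut r, Exception cause) {
                System.out.println("failed " + r.uri + ": " + cause.getMessage());
            }
        });
    }
}
```

A non-blocking version would hand the batch to a background thread and call the handler from there; that is left out to keep the sketch short.
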
> >
> > Anyway Savas, to conclude: do you have any evidence that a single
> > document is suboptimal compared to a sequence of requests over an open
> > HTTP/1.1 connection? That is, assuming we ignore the SHOULD in the
> > HTTP1.1 specification "Clients SHOULD NOT pipeline requests using
> > non-idempotent methods or non-idempotent sequences of methods" [3]
> >
> > -Steve
> >
> >
> > [1] WebDav http://www.ietf.org/rfc/rfc2518.txt S8.9.2
> >
> > "after encountering an error moving a non-collection
> >    resource as part of an infinite depth move, the server SHOULD try to
> >    finish as much of the original move operation as possible."
> >
> > [2]
> > http://lists.w3.org/Archives/Public/w3c-dist-auth/1997JulSep/0177.html
> >
> > [3] RFC2616 HTTP1.1
> >
> > _______________________________________________________________
> > Ian Foster                    www.mcs.anl.gov/~foster
> > Math & Computer Science Div.  Dept of Computer Science
> > Argonne National Laboratory   The University of Chicago
> > Argonne, IL 60439, U.S.A.     Chicago, IL 60637, U.S.A.
> > Tel: 630 252 4619             Fax: 630 252 1997
> >         Globus Alliance, www.globus.org <http://www.globus.org/>
> >
> >

_______________________________________________________________
Ian Foster                    www.mcs.anl.gov/~foster
Math & Computer Science Div.  Dept of Computer Science
Argonne National Laboratory   The University of Chicago
Argonne, IL 60439, U.S.A.     Chicago, IL 60637, U.S.A.
Tel: 630 252 4619             Fax: 630 252 1997
         Globus Alliance, www.globus.org