[ogsa-wg] RE: Modeling State: Technical Questions

Ian Foster foster at mcs.anl.gov
Tue Apr 5 08:21:48 CDT 2005


Steve's note raises a key point for me: do we really want to force the user 
(as Savas seems to be advocating) to keep track of jobs running at a remote 
site?

I'd rather send a request "kill all my jobs" or "kill all my jobs that have 
run for more than a day" to the factory than carefully keep track of all 
jobs that I have active, and how long they have been running, so that I can 
send the big document (or stream) discussed below.
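
A minimal sketch of that selector-style request, assuming the factory already tracks each job's owner and start time. JobRecord, JobFactorySketch, killMatching and ownedAndOlderThan are hypothetical names for illustration, not OGSA or WS-RF types:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical server-side view of a job; not an OGSA or WS-RF type.
record JobRecord(String id, String owner, Instant startedAt) {}

// Hypothetical factory that resolves a selector against the jobs it tracks,
// so the client never has to enumerate job ids itself.
final class JobFactorySketch {
    private final List<JobRecord> active = new ArrayList<>();

    void register(JobRecord job) {
        active.add(job);
    }

    // Removes every active job matching the selector; returns the ids removed.
    List<String> killMatching(Predicate<JobRecord> selector) {
        List<String> killed = new ArrayList<>();
        for (Iterator<JobRecord> it = active.iterator(); it.hasNext();) {
            JobRecord job = it.next();
            if (selector.test(job)) {
                it.remove();
                killed.add(job.id());
            }
        }
        return killed;
    }

    // Selector for "all jobs of this owner that have run longer than minAge".
    static Predicate<JobRecord> ownedAndOlderThan(String owner, Duration minAge, Instant now) {
        return job -> job.owner().equals(owner)
                && Duration.between(job.startedAt(), now).compareTo(minAge) > 0;
    }
}
```

"Kill all my jobs that have run for more than a day" then becomes a single call such as killMatching(ownedAndOlderThan("ian", Duration.ofDays(1), Instant.now())), with the bookkeeping left at the site that runs the jobs.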

Ian.


At 02:10 PM 4/5/2005 +0100, Steve Loughran wrote:
>Savas Parastatidis wrote:
>>Dear all,
>>I think something needs to be clarified with regards to handling
>>multiple jobs with one message. The beauty of document-oriented
>>interactions is that you can do things like...
>><job-details-request>
>>   <job-id>urn:ogsa:job:guid:bla-bla-bla-001</job-id>
>>   <job-id>urn:ogsa:job:guid:bla-bla-bla-010</job-id>
>>   <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
>>   <job-id>urn:ogsa:job:guid:bla-bla-bla-029</job-id>
>></job-details-request>
>>Or
>><job-suspend-request>
>>   <job-id>urn:ogsa:job:guid:bla-bla-bla-002</job-id>
>>   <job-id>urn:ogsa:job:guid:bla-bla-bla-005</job-id>
>>   <job-id>urn:ogsa:job:guid:bla-bla-bla-008</job-id>
>></job-suspend-request>
>>The schema for the above document can allow anything from 0 to N number
>>of <job-id> elements.
>
>The trouble with any bulk operation is that you have to handle partial 
>failure. You need either atomic operations (not long-lived transactions 
>over HTTP, Savas, I wouldn't be that daft) or a way of indicating that 
>only a bit went wrong.
>
>Hence the 207 Multi-Status response in WebDav, the "something failed, look 
>in the message". WebDav is still single instance (here a RESTy URL), but 
>you can set >1 property and so have partial failure.
>
>SOAP just has SOAPFault and extensions; no explicit multiple failure 
>response. WS-RF-ResourceProperties has a similar problem with 
>SetResourceProperties, but a different failure model in which any failure 
>to set can result in a WS-BaseFault, indicating which failed, but 
>providing no apparent information on which worked.
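
One way to picture a multi-status style reply is a result map with one entry per job id, so the caller can see which items worked as well as which failed. This is only a sketch under assumptions: BulkResultSketch, Outcome and applyToAll are hypothetical names, and nothing here is part of SOAP, WS-RF or WebDAV:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical per-item result for a bulk request, loosely in the spirit of
// a 207 Multi-Status body: every id gets its own outcome.
final class BulkResultSketch {
    enum Status { OK, FAILED }

    record Outcome(Status status, String detail) {}

    // Applies op to each id in order and records success or failure per id.
    // A failure on one id does not stop the remaining ids from being tried.
    static Map<String, Outcome> applyToAll(List<String> jobIds, Consumer<String> op) {
        Map<String, Outcome> results = new LinkedHashMap<>();
        for (String id : jobIds) {
            try {
                op.accept(id);
                results.put(id, new Outcome(Status.OK, ""));
            } catch (RuntimeException e) {
                results.put(id, new Outcome(Status.FAILED, String.valueOf(e.getMessage())));
            }
        }
        return results;
    }

    // True when at least one item succeeded and at least one failed.
    static boolean isPartialFailure(Map<String, Outcome> results) {
        boolean anyOk = results.values().stream().anyMatch(o -> o.status() == Status.OK);
        boolean anyFailed = results.values().stream().anyMatch(o -> o.status() == Status.FAILED);
        return anyOk && anyFailed;
    }
}
```

The "carry on after an error" policy is a choice made for this sketch; a service could equally stop at the first failure, which is exactly the behaviour that would need to be declared.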
>
>It seems to me that if you want to bulk stuff, you do need ways of (a) 
>handling partial failure and (b) declaring what happens on partial 
>failure. For the curious, WebDav's failure mode on file operations (MOVE, 
>COPY) is explicitly declared to be that of failed file operations of Win98 
>on a FAT32 filesystem  [1,2]
>
>Alternatively, you don't go for bulk operations, either on multiple 
>jobs or on multiple properties of a job (remember, WS-RF doesn't declare 
>atomic/transacted property operations, so all you do here is increase the 
>window of instability, a window that already exists). Instead you just 
>stream a series of operations over the same HTTP1.1 connection -assuming 
>that everything is accessible at the same far-end host, and get a series 
>of responses (in request order: RFC 2616 S8.1.2.2 requires a server to 
>answer pipelined requests in the order it received them).
>
>This could be efficient, and you could do better handling of failure. But 
>you do need a SOAP stack that can keep an HTTP1.1 channel open for 
>multiple requests. Axis doesn't, even if you get httpclient to do the HTTP 
>work; I don't know about .NET/WSE. You also need developers to model the 
>communication correctly. Manipulating JAXRPC proxies as if they represent 
>remote objects is *clearly* the wrong way to do it. You'd almost want to 
>model a queue of requests waiting to be POSTed, a queue you can fill up 
>then push out. Something like this, in your Java-era language of choice :-
>
>//different queues for SOAP, REST
>Queue q=new Soap12RequestQueue();
>
>q.add(new StatePut(job1.uri,Job.LIFECYCLE,Job.SUSPENDED));
>//let the queue reorder stuff if it wants to
>q.add(new StatePut(job2.uri,Job.LIFECYCLE,Job.SUSPENDED),Queue.POSITION_OPTIMAL);
>q.add(new StatePut(job3.uri,Job.LIFECYCLE,Job.SUSPENDED),Queue.POSITION_LAST);
>
>q.setEventHandler(this);
>q.nonBlockingSubmit();
>
>No, there is no code behind this example, and I am avoiding any hints as 
>to what the event handler would look like. I think the key point is that 
>once you embrace remote operations as async actions, then you can model 
>the manipulations differently.  Note also that I am representing job 
>suspension not as an explicit suspend() operation, but as a request to put 
>a job into the suspended state. This API could work with our friend REST 
>just as easily as with WS-RF...
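
For readers who want something that compiles, here is one possible shape for such a queue. It is a sketch, not the design described above: every name (StatePutSketch, PutResult, QueueEventHandler, RequestQueueSketch) is hypothetical, the transport is an injected function standing in for a SOAP or REST sender, and position hints such as POSITION_OPTIMAL are left out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// One queued request: "put this property of this resource into this state".
record StatePutSketch(String uri, String property, String value) {}

// Per-request result handed to the event handler.
record PutResult(StatePutSketch request, boolean ok, String detail) {}

// Callback invoked once for every request that was sent.
interface QueueEventHandler {
    void onResult(PutResult result);
}

final class RequestQueueSketch {
    private final List<StatePutSketch> pending = new ArrayList<>();
    private final Function<StatePutSketch, PutResult> transport;
    private QueueEventHandler handler = result -> {};

    // transport is an assumption: any function that sends one request.
    RequestQueueSketch(Function<StatePutSketch, PutResult> transport) {
        this.transport = transport;
    }

    void add(StatePutSketch request) {
        pending.add(request);
    }

    void setEventHandler(QueueEventHandler handler) {
        this.handler = handler;
    }

    // Drains the queue on a background thread and returns immediately.
    // Each request is reported on its own, so one failure does not hide
    // the outcome of the others.
    CompletableFuture<Void> nonBlockingSubmit() {
        List<StatePutSketch> batch = new ArrayList<>(pending);
        pending.clear();
        QueueEventHandler target = handler;
        return CompletableFuture.runAsync(() -> {
            for (StatePutSketch request : batch) {
                PutResult result;
                try {
                    result = transport.apply(request);
                } catch (RuntimeException e) {
                    result = new PutResult(request, false, String.valueOf(e.getMessage()));
                }
                target.onResult(result);
            }
        });
    }
}
```

The returned CompletableFuture is an addition for this sketch so a caller can wait for the batch when it needs to; the handler still sees each outcome as it arrives.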
>
>Anyway Savas, to conclude: do you have any evidence that a single document 
>is suboptimal compared to a sequence of requests over an open HTTP/1.1 
>connection? That is, assuming we ignore the SHOULD in the HTTP1.1 
>specification: "Clients SHOULD NOT pipeline requests using non-idempotent 
>methods or non-idempotent sequences of methods" [3].
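
A client that wanted to honour that SHOULD could check a batch before pipelining it. The sketch below assumes the check is made on the method name alone, using the methods RFC 2616 S9.1.2 describes as idempotent; PipelineCheckSketch and its methods are hypothetical names:

```java
import java.util.List;
import java.util.Set;

final class PipelineCheckSketch {
    // Methods that RFC 2616 S9.1.2 describes as idempotent.
    private static final Set<String> IDEMPOTENT =
            Set.of("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE");

    static boolean isIdempotent(String method) {
        return IDEMPOTENT.contains(method);
    }

    // True only if every method in the batch is idempotent on its own.
    // This cannot detect a non-idempotent *sequence* of idempotent methods,
    // which the specification also warns about.
    static boolean safeToPipeline(List<String> methods) {
        return methods.stream().allMatch(PipelineCheckSketch::isIdempotent);
    }
}
```

Under this check a queue of state PUTs would pass, while SOAP requests, which are normally carried in POST, would not.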
>
>-Steve
>
>
>[1] WebDav http://www.ietf.org/rfc/rfc2518.txt S8.9.2
>
>"after encountering an error moving a non-collection
>    resource as part of an infinite depth move, the server SHOULD try to
>    finish as much of the original move operation as possible."
>
>[2] http://lists.w3.org/Archives/Public/w3c-dist-auth/1997JulSep/0177.html
>
>[3] HTTP1.1 http://www.ietf.org/rfc/rfc2616.txt S8.1.2.2

_______________________________________________________________
Ian Foster                    www.mcs.anl.gov/~foster
Math & Computer Science Div.  Dept of Computer Science
Argonne National Laboratory   The University of Chicago
Argonne, IL 60439, U.S.A.     Chicago, IL 60637, U.S.A.
Tel: 630 252 4619             Fax: 630 252 1997
         Globus Alliance, www.globus.org