[ogsa-d-wg] Draft Charter for Data Movement Interface Standardization WG

William E. Allcock allcock at mcs.anl.gov
Tue Sep 20 17:31:47 CDT 2005


Sorry, yes, on Monday :-)

Ravi Madduri wrote:
> Bill,
> It is 2:00 pm on Monday, right?
> 
> On Sep 20, 2005, at 3:21 PM, William E. Allcock wrote:
> 
>> The BOF has been approved:
>>
>> 2:00 pm - 3:30 pm
>>
>> Data Movement Interface Standardization
>> (Data Movement Interface Standardization)  Charter-Discussion BOF
>>
>> Many services need to move data (as opposed to messages
>> invoking the service). The characteristics / semantics required can vary
>> greatly. There are several existing interfaces (RFT, FTS, ...), all file
>> based, that are similar but incompatible. A working group spawned by
>> this BOF would work towards a standardized set of WSDL that could invoke
>> a service meeting the requirements of existing services as well as
>> non-file-based sources.
>>
>> Location: Imperial Ballroom
>>
>> See you there!
>>
>> Bill
>>
>> Malcolm Atkinson wrote:
>>
>>> Dave, this is a good way of posing the question.
>>> One can also relate it to the proposal a couple of years ago by Peter
>>> Kunszt for grid data handles as a generic concept for naming data.
>>> I think the problem arises when the selected data is an arbitrary subset
>>> of the stored data, or is derived from the stored data via any of
>>> a wide range of languages: XQuery, XPath, SQL, LDAP, semi-structured
>>> QLs, statistical languages, FFT, DataCutter, ...
>>> These all make sense. It is just hard to understand how to compose
>>> them, and hard to make general rules. The derived data has to be
>>> evaluated to some point where it can be moved (a RAM buffer). Can the
>>> movement be part of the standard, and the derivation processes be
>>> strictly corralled in some other specs?
>>> Malcolm
>>> Malcolm
>>>  >-----Original Message-----
>>>  >From: Dave Berry
>>>  >Sent: 14 September 2005 21:00
>>>  >To: Malcolm Atkinson; William E. Allcock; ogsa-d-wg at ggf.org;
>>>  >gsm-wg at ggf.org; byte-io-wg at ggf.org; Peter Kunszt; James Casey;
>>>  >Ravi Madduri
>>>  >Subject: RE: [ogsa-d-wg] Draft Charter for Data Movement
>>>  >Interface Standardization WG
>>>  >
>>>  >Hi Malcolm,
>>>  >
>>>  >I'd rather ask: what are the characteristics of a file that
>>>  >make these file transfer mechanisms tractable? Then we can
>>>  >ask to what extent we can generalise the mechanism.
>>>  >
>>>  >For example, if the key characteristics are that a file can
>>>  >be named and supports random access, then we might generalise
>>>  >the mechanism to include data in RAM (which would avoid
>>>  >unnecessary copying to disk). This case would be analogous
>>>  >to some operating systems which allow entities in RAM to be
>>>  >addressed as part of the file system.
>>>  >
>>>  >Conversely, if the mechanism can handle any named sequence of
>>>  >bytes, then it could presumably handle streaming data as
>>>  >well. Or if it requires other operations that are specific
>>>  >to the location of bytes on a disk (or tape), then the WG
>>>  >will restrict its attention to those cases.
>>>  >
>>>  >I would expect this group to place a strong requirement on
>>>  >the OGSA WG to provide a naming system that can specify
>>>  >whatever data sets this WG wants to move.
>>>  >
>>>  >Dave.
>>>  >
>>>  >
>>>  >-----Original Message-----
>>>  >From: owner-ogsa-d-wg at ggf.org
>>>  >[mailto:owner-ogsa-d-wg at ggf.org] On Behalf Of Malcolm Atkinson
>>>  >Sent: 14 September 2005 17:20
>>>  >To: William E. Allcock; ogsa-d-wg at ggf.org; gsm-wg at ggf.org;
>>>  >byte-io-wg at ggf.org; Peter Kunszt; James Casey; Ravi Madduri
>>>  >Subject: RE: [ogsa-d-wg] Draft Charter for Data Movement
>>>  >Interface Standardization WG
>>>  >
>>>  >
>>>  >Hi Bill
>>>  >
>>>  >I agree that such a standard interface is needed.
>>>  >When you look at files, I presume you consider files wherever
>>>  >they are: secondary or tertiary storage at least.
>>>  >
>>>  >When you say any data, then there is the possibility of
>>>  >trivial or large amounts of data moving between RAM buffers,
>>>  >as well as data from files and databases, with an enormous set
>>>  >of possibilities for the way it may be selected and identified.
>>>  >Eventually it gets close to Byte-IO, Streams
>>>  >(BoF) and InfoD etc.
>>>  >
>>>  >So I'm agreeing with you that if you go beyond files, then
>>>  >scope control is difficult.
>>>  >
>>>  >Would it be better to do the standardisation of file movement
>>>  >first and look at other forms of data movement later?
>>>  >
>>>  >Malcolm
>>>  >  >
>>>  > >-----Original Message-----
>>>  > >From: owner-ogsa-d-wg at ggf.org
>>>  > >[mailto:owner-ogsa-d-wg at ggf.org] On Behalf Of William E. Allcock
>>>  > >Sent: 14 September 2005 17:04
>>>  > >To: ogsa-d-wg at ggf.org; gsm-wg at ggf.org; byte-io-wg at ggf.org;
>>>  > >Peter Kunszt; James Casey; Ravi Madduri
>>>  > >Subject: [ogsa-d-wg] Draft Charter for Data Movement
>>>  > >Interface Standardization WG
>>>  > >
>>>  > >Sorry for the re-send; I mistyped the byte-io mailing list address.
>>>  > >
>>>  > >All,
>>>  > >
>>>  > >Sorry for the SPAM, but I sent this to the "likely suspects"
>>>  > >who might be interested. I have a proposed BOF (waiting for AD
>>>  > >approval) to discuss standardizing an interface for invoking
>>>  > >data movement. There are several of them out there already.
>>>  > >CERN has the File Transfer System (FTS), the gsm-wg has SRM
>>>  > >copy, Globus has the Reliable File Transfer (RFT) service,
>>>  > >etc. I don't think there will be any argument that there is
>>>  > >a need for such standardization; the hard part will be
>>>  > >scoping the extent of what we will work on. For instance,
>>>  > >all the examples above are file based, but ideally, this
>>>  > >interface would work for any data that can be addressed.
>>>  > >
>>>  > >I expect that the BOF will be centered around scoping the
>>>  > >working group, but I think we should get (and approval of the
>>>  > >BOF depends on getting) some initial discussion around the
>>>  > >scope. So... here it goes:
>>>  > >
>>>  > >I think the obvious thing is that it needs to have the basic
>>>  > >functionality presented by FTS, RFT, and SRM-copy. However,
>>>  > >the devil is in the details, so I will break this up into
>>>  > >"blocks of functionality":
>>>  > >
>>>  > >Let's start with naming. What will this service accept as
>>>  > >valid names for entities that it will move? URLs? EPRs? Will
>>>  > >logical file names be accepted, or should they be translated
>>>  > >outside this service?
>>>  > >
>>>  > >Related to the naming is what type of data this service will
>>>  > >move. Files? Video streams? The output of simulations? The
>>>  > >output of database queries? Can we make this a service that
>>>  > >any service that wants to move data can simply invoke? Note
>>>  > >that I am differentiating data from messages. You would not
>>>  > >use this to send the result from a service that summed a
>>>  > >bunch of numbers; that would simply be a SOAP response...
>>>  > >IMHO :-).
>>>  > >
>>>  > >Can we make a generic module that would allow this
>>>  > >functionality to be applied to any service that exposes the
>>>  > >byte-io interface? Does that affect the interface, or is it
>>>  > >just an implementation issue?
>>>  > >
>>>  > >Can we make this service transport mechanism agnostic? That
>>>  > >means both application transport (GridFTP vs HTTP vs ...) and
>>>  > >network transport (TCP vs UDP vs UDT vs ...). My concern here
>>>  > >is that I am not sure SOAP has the functionality we need. To
>>>  > >do this, I wonder if we need the equivalent of a union in C,
>>>  > >so that the parameters specified are based on the transport(s)
>>>  > >chosen. For instance, if you use TCP you need to specify a
>>>  > >buffer size, but not for UDP. GridFTP specifies streams and
>>>  > >data channel authentication, but HTTP does not.
>>>  > >
>>>  > >What about security / authorization? This is a broad category,
>>>  > >and we should push as much as possible outside of scope via
>>>  > >callouts and Policy Enforcement Points (PEPs), but what about
>>>  > >delivery guarantees such as AT MOST ONCE, AT LEAST ONCE,
>>>  > >EXACTLY ONCE, non-repudiation, etc.? I know Dieter has a set
>>>  > >of use cases that require some of this type of delivery
>>>  > >guarantee functionality.
>>>  > >
>>>  > >A potentially contentious issue is whether or not these
>>>  > >services will use WSRF and notifications to expose state
>>>  > >(push from the service) or methods to query the state (pull
>>>  > >from the service). Hopefully, we can find a way to make each
>>>  > >optional.
>>>  > >
>>>  > >If we start making many parts of the interface optional, what
>>>  > >is exposed as service metadata for brokering will become more
>>>  > >important. I would propose that, at a minimum, we make a
>>>  > >recommendation for what facts about the service should be
>>>  > >exposed.
>>>  > >
>>>  > >All of the existing services accept "bulk" inputs, i.e., move
>>>  > >these 100 files. This can be a problem when the requests
>>>  > >become very large, due to de-serialization. Should we provide
>>>  > >a "chunking" interface so that requests can be of unlimited
>>>  > >size?
>>>  > >
>>>  > >Please feel free to make comments on the above and, more
>>>  > >importantly, suggest other important issues we need to
>>>  > >address.
>>>  > >
>>>  > >btw, once we have a mail list of our own we will quit spamming
>>>  > >the other lists :-).
>>>  > >
>>>  > >Bill
>>>  > >-- 
>>>  > >William E. Allcock
>>>  > >Argonne National Laboratory
>>>  > >Bldg 221, Office C-115A
>>>  > >9700 South Cass Ave
>>>  > >Argonne, IL 60439-4844
>>>  > >Office Phone:  +1-630-252-7573
>>>  > >Office Fax:      +1-630-252-1997
>>>  > >Cell Phone:      +1-630-854-2842
>>>  > >
>>>  > >
>>>  > >
>>>  > >
>>>  >
>>>  >
>>>
>>
>> -- 
>> William E. Allcock
>> Argonne National Laboratory
>> Bldg 221, Office C-115A
>> 9700 South Cass Ave
>> Argonne, IL 60439-4844
>> Office Phone:  +1-630-252-7573
>> Office Fax:      +1-630-252-1997
>> Cell Phone:      +1-630-854-2842
>>
>>
> 
> -- 
> Ravi K Madduri
> The Globus Alliance | Argonne National Laboratory
> http://www-unix.mcs.anl.gov/~madduri
> 
> 
> 

-- 
William E. Allcock
Argonne National Laboratory
Bldg 221, Office C-115A
9700 South Cass Ave
Argonne, IL 60439-4844
Office Phone:  +1-630-252-7573
Office Fax:      +1-630-252-1997
Cell Phone:      +1-630-854-2842
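
Dave Berry's question of which file characteristics make these transfer mechanisms tractable can be made concrete. The Python sketch below assumes, as in one of his examples, that the only properties a mover relies on are a name and random access by offset. The names NamedRandomAccessSource, FileSource, RamSource and move are hypothetical; they are not taken from FTS, RFT, SRM copy or any GGF specification. Under that assumption, data already in RAM fits the same mover as a file on disk, and random access is what lets a restarted transfer resume at an offset.

```python
import os
from abc import ABC, abstractmethod


class NamedRandomAccessSource(ABC):
    """Hypothetical abstraction: anything with a name, a known size and read_at()."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def size(self) -> int: ...

    @abstractmethod
    def read_at(self, offset: int, length: int) -> bytes: ...


class FileSource(NamedRandomAccessSource):
    """A file on local disk, read by seeking to the requested offset."""

    def __init__(self, name: str, path: str) -> None:
        super().__init__(name)
        self.path = path

    def size(self) -> int:
        return os.path.getsize(self.path)

    def read_at(self, offset: int, length: int) -> bytes:
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)


class RamSource(NamedRandomAccessSource):
    """Data already in memory: nothing has to be copied to disk first."""

    def __init__(self, name: str, data: bytes) -> None:
        super().__init__(name)
        self._data = bytes(data)

    def size(self) -> int:
        return len(self._data)

    def read_at(self, offset: int, length: int) -> bytes:
        return self._data[offset:offset + length]


def move(source: NamedRandomAccessSource, sink: bytearray,
         block_size: int = 4096, start_offset: int = 0) -> int:
    """Copy source into sink block by block, starting at start_offset.

    Because any block can be read by offset, a restarted transfer only needs
    the offset it reached. Returns the offset reached (the source size).
    """
    total = source.size()
    if len(sink) < total:
        sink.extend(bytes(total - len(sink)))  # pre-size the destination
    offset = start_offset
    while offset < total:
        block = source.read_at(offset, min(block_size, total - offset))
        if not block:
            raise OSError(f"{source.name}: unexpected end of data at {offset}")
        sink[offset:offset + len(block)] = block
        offset += len(block)
    return offset
```

For example, move(RamSource("ram://buf1", b"hello grid"), bytearray(), block_size=3) copies the ten bytes in four blocks; the name "ram://buf1" is made up for illustration. A purely streaming source would lack read_at, which is exactly the boundary Dave describes.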

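Bill's "union in C" idea for transport-specific parameters can be sketched as a tagged union: one parameter set per transport, with only the chosen set's fields emitted. This is a Python sketch under assumptions; TcpParams, GridFtpParams, wire_parameters and all field names are hypothetical. In a WSDL message, XML Schema's xsd:choice compositor is one plausible counterpart, but that mapping is also an assumption here, not something the thread settles.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical network-transport parameter sets (the "union" members).
@dataclass(frozen=True)
class TcpParams:
    buffer_bytes: int  # TCP needs a buffer size...


@dataclass(frozen=True)
class UdpParams:
    pass  # ...UDP does not, in this sketch


# Hypothetical application-transport parameter sets.
@dataclass(frozen=True)
class GridFtpParams:
    parallel_streams: int = 1
    data_channel_auth: bool = True


@dataclass(frozen=True)
class HttpParams:
    pass  # no streams or data channel authentication


NetworkTransport = Union[TcpParams, UdpParams]
AppTransport = Union[GridFtpParams, HttpParams]


@dataclass(frozen=True)
class TransferRequest:
    source: str
    destination: str
    app: AppTransport
    net: NetworkTransport


def wire_parameters(req: TransferRequest) -> Dict[str, object]:
    """Flatten a request, emitting only the parameters its transports define."""
    params: Dict[str, object] = {"source": req.source, "destination": req.destination}
    if isinstance(req.app, GridFtpParams):
        params.update(app="gridftp",
                      parallel_streams=req.app.parallel_streams,
                      data_channel_auth=req.app.data_channel_auth)
    elif isinstance(req.app, HttpParams):
        params["app"] = "http"
    else:
        raise TypeError(f"unknown application transport: {req.app!r}")
    if isinstance(req.net, TcpParams):
        if req.net.buffer_bytes <= 0:
            raise ValueError("TCP buffer size must be positive")
        params.update(net="tcp", tcp_buffer_bytes=req.net.buffer_bytes)
    elif isinstance(req.net, UdpParams):
        params["net"] = "udp"
    else:
        raise TypeError(f"unknown network transport: {req.net!r}")
    return params
```

Keeping application and network transport as two independent unions mirrors the two axes Bill lists; checking which combinations make sense (HTTP over UDP, say) is left out of this sketch.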

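The delivery guarantees Bill lists (AT MOST ONCE, AT LEAST ONCE, EXACTLY ONCE) differ mainly in whether the sender retries and whether the receiver suppresses duplicates. The Python sketch below uses one common technique, at-least-once retries plus receiver-side deduplication by a transfer ID, to get effectively-once delivery; that technique is swapped in for illustration and is not proposed in the thread. Receiver, send, transfer, the outcome strings and the ID "t1" are all hypothetical, and non-repudiation is not covered.

```python
from enum import Enum
from typing import Callable, List, Set


class Guarantee(Enum):
    AT_MOST_ONCE = "at-most-once"
    AT_LEAST_ONCE = "at-least-once"
    EXACTLY_ONCE = "exactly-once"


class Receiver:
    """Destination side; optionally remembers transfer IDs to drop duplicates."""

    def __init__(self, deduplicate: bool) -> None:
        self.deduplicate = deduplicate
        self.seen: Set[str] = set()
        self.applied: List[bytes] = []  # payloads actually written

    def deliver(self, transfer_id: str, payload: bytes) -> None:
        if self.deduplicate and transfer_id in self.seen:
            return  # a retry of something already written: acknowledge, do nothing
        self.seen.add(transfer_id)
        self.applied.append(payload)


# outcome(attempt) simulates the network: "ok", "lost_request" (never arrived)
# or "lost_ack" (arrived, but the sender never heard back).
Outcome = Callable[[int], str]


def send(receiver: Receiver, transfer_id: str, payload: bytes,
         outcome: Outcome, max_attempts: int) -> bool:
    """Try up to max_attempts times; return True once an acknowledgment is seen."""
    for attempt in range(1, max_attempts + 1):
        result = outcome(attempt)
        if result != "lost_request":
            receiver.deliver(transfer_id, payload)
        if result == "ok":
            return True
    return False


def transfer(guarantee: Guarantee, payload: bytes, outcome: Outcome,
             retries: int = 5) -> List[bytes]:
    """Return what the destination ended up writing under each guarantee."""
    if guarantee is Guarantee.AT_MOST_ONCE:
        receiver = Receiver(deduplicate=False)
        send(receiver, "t1", payload, outcome, max_attempts=1)  # never retry
    elif guarantee is Guarantee.AT_LEAST_ONCE:
        receiver = Receiver(deduplicate=False)
        send(receiver, "t1", payload, outcome, max_attempts=retries)  # retry until acked
    else:
        # Exactly once = at-least-once retries + duplicate suppression, assuming
        # some attempt is acknowledged within the retry budget.
        receiver = Receiver(deduplicate=True)
        send(receiver, "t1", payload, outcome, max_attempts=retries)
    return receiver.applied
```

If the first acknowledgment is lost, AT LEAST ONCE writes the payload twice, while the deduplicating variant writes it once. If the first request itself is lost, AT MOST ONCE writes nothing.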

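Bill's question about a "chunking" interface for very large bulk requests can be sketched as follows. The sketch assumes a client splits one bulk request (a list of source/destination pairs) into numbered messages that share a request ID, with the last one flagged. Chunk, chunk_request and BulkRequestReceiver are hypothetical names; RFT, FTS and SRM copy are not claimed to work this way. Each message then stays small enough to de-serialize cheaply, however large the whole request is.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Set, Tuple

Pair = Tuple[str, str]  # (source name, destination name)


@dataclass(frozen=True)
class Chunk:
    request_id: str
    seq: int        # 0-based position of this chunk within the request
    is_last: bool   # marks the end of an otherwise unbounded request
    pairs: List[Pair]


def chunk_request(request_id: str, pairs: List[Pair], max_pairs: int) -> Iterator[Chunk]:
    """Split one bulk request into messages of at most max_pairs transfers each."""
    if max_pairs < 1:
        raise ValueError("max_pairs must be at least 1")
    # An empty request still yields one (empty, final) chunk so the service sees it end.
    starts = list(range(0, len(pairs), max_pairs)) or [0]
    for seq, start in enumerate(starts):
        yield Chunk(request_id, seq, seq == len(starts) - 1,
                    pairs[start:start + max_pairs])


class BulkRequestReceiver:
    """Service side: each chunk is de-serialized on its own, and its transfers
    can be queued before the rest of the request has arrived."""

    def __init__(self) -> None:
        self.work_queue: List[Pair] = []
        self.completed: Set[str] = set()
        self._next_seq: Dict[str, int] = {}

    def accept(self, chunk: Chunk) -> bool:
        """Queue a chunk's transfers; return True when the request is complete."""
        expected = self._next_seq.get(chunk.request_id, 0)
        if chunk.request_id in self.completed or chunk.seq != expected:
            raise ValueError(f"unexpected chunk {chunk.seq} for {chunk.request_id}")
        self.work_queue.extend(chunk.pairs)
        if chunk.is_last:
            self._next_seq.pop(chunk.request_id, None)
            self.completed.add(chunk.request_id)
            return True
        self._next_seq[chunk.request_id] = expected + 1
        return False
```

With 250 pairs and max_pairs=100, chunk_request yields three messages of 100, 100 and 50 pairs, and only the third reports the request as complete.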

