[dfdl-wg] simple way to study hard DFDL example problem - IBM Format VS records as XML

Mon Nov 22 08:51:45 CST 2004

I agree that an extensible set of black-box stream decoders (and
complementary stream encoders) handling zip, encryption, VB, VBS, VS, the
other 19 or so complicated legacy formats, and so forth is a good and
completely acceptable solution to this problem.

I think DFDL should be helpful to the person who has to write such a
stream-decoder/encoder for describing the physical stream format. 

This is a pragmatic decision, and for something like zip/unzip I think there
is little one could do that would be more efficient. The rub with VS format
is that most of the data is already sitting there in very nearly the correct
logical data layout. That makes copying it all just to remove a little bit of
excess physical structure feel unnecessary, but I agree it is probably the
right thing to do given the complexity involved in trying to avoid the copy.
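
To make the black-box decoder idea concrete, here is a minimal Python sketch
of deblocking spanned variable-length records. It is only an illustration
under stated assumptions, not a definitive reader for any particular system.
It assumes the commonly described layout: each block starts with a 4-byte
block descriptor word (2-byte big-endian length that includes the descriptor,
then 2 reserved bytes), and each segment starts with a 4-byte segment
descriptor word (2-byte big-endian length that includes the descriptor, a
1-byte control code, 1 reserved byte). The function name deblock_vbs and the
constant names are hypothetical.

```python
import struct
from typing import Iterator

# Assumed segment control codes (third byte of the segment descriptor word).
COMPLETE, FIRST, LAST, MIDDLE = 0, 1, 2, 3


def deblock_vbs(data: bytes) -> Iterator[bytes]:
    """Yield logical records from a buffer of blocked, spanned records."""
    pos = 0
    pending = bytearray()  # payload of a spanned record still being assembled
    while pos < len(data):
        if len(data) - pos < 4:
            raise ValueError(f"truncated block descriptor at offset {pos}")
        # Block descriptor: 2-byte big-endian block length, 2 reserved bytes.
        (block_len,) = struct.unpack_from(">H", data, pos)
        if block_len < 4 or pos + block_len > len(data):
            raise ValueError(f"bad block length {block_len} at offset {pos}")
        block_end = pos + block_len
        pos += 4
        while pos < block_end:
            if block_end - pos < 4:
                raise ValueError(f"truncated segment descriptor at offset {pos}")
            # Segment descriptor: 2-byte length, 1-byte control code, 1 reserved.
            seg_len, code = struct.unpack_from(">HB", data, pos)
            if seg_len < 4 or pos + seg_len > block_end:
                raise ValueError(f"bad segment length {seg_len} at offset {pos}")
            if code not in (COMPLETE, FIRST, LAST, MIDDLE):
                raise ValueError(f"unknown control code {code} at offset {pos}")
            if code in (COMPLETE, FIRST) and pending:
                raise ValueError(f"new record before previous one ended at {pos}")
            if code in (MIDDLE, LAST) and not pending:
                raise ValueError(f"continuation without a first segment at {pos}")
            # This is the copy discussed above: payload bytes are moved out of
            # the physical stream so the record becomes contiguous.
            pending += data[pos + 4 : pos + seg_len]
            pos += seg_len
            if code in (COMPLETE, LAST):
                yield bytes(pending)
                pending = bytearray()
    if pending:
        raise ValueError("input ended in the middle of a spanned record")
```

The records that come out carry no descriptor words at all, so a description
of the logical record layout could be applied to each one without knowing
that blocking or spanning ever took place.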

...mikeb

> -----Original Message-----
> From: Steve Hanson [mailto:smh at uk.ibm.com] 
> Sent: Monday, November 22, 2004 5:58 AM
> To: dfdl-wg at gridforum.org
> Subject: RE: [dfdl-wg] simple way to study hard DFDL example 
> problem - IBMFormat VS records as XML
> 
> 
> 
> 
> 
> I wrote my previous mail fairly quickly just before I left on 
> Friday to get something on the table. I've been thinking 
> about this problem over the weekend and have some more 
> thoughts which might help me get across where I am coming from.
> 
> The way I view physical rep information is as functions that 
> can be applied to types and fields. Writing the data out to a 
> blocked/segmented format does not fall into this category. It 
> is an orthogonal operation that applies to the whole data and 
> as such is much more akin to encryption and compression. For 
> example, I have a COBOL structure that ends up in an MQSeries 
> queue and in a QSAM file. It has a logical structure, it has 
> a physical representation. In the QSAM case a further 
> transform has taken place to block/segment the structure. I 
> would not expect to see the physical rep properties of the 
> types and elements change.
> 
> Mike's idea of a schema level 'stream' rep property sounds ok 
> in principle for parsing, but what other metadata is needed 
> when serialising? How are we informed of the rules for VB 
> blocking or for IMS segmentation? Are they fixed or 
> user-defined? If these rules end up requiring extra metadata 
> at the type/element level then I am not comfortable with 
> this, because we are mixing two sets of physical information.
> 
> I think that whatever principles we apply to DFDL 
> including/excluding encryption and compression we should also 
> apply to these formats.  What is the current proposal in this 
> area? The cheapest option would be to provide a flexible 
> user-defined transform capability.
> 
> We can discuss more on this week's call, but it sounds like 
> this is another of the high-level design issues to be 
> included in the F2F agenda.
> 
> Finally a correction. When I said that the broker does not 
> support these 19 or whatever formats, I should have been more 
> specific and said that the broker's message model does not 
> support these. That is, we do not provide physical rep 
> annotation support for such formats, for the reason stated 
> above. The expectation is that the 
> decryption/decompression/deblocking has all taken place as a 
> separate transformation elsewhere in the broker.
> 
> Regards, Steve
> 
> Steve Hanson
> WebSphere Business Integration Brokers,
> IBM Hursley, England
> Internet: smh at uk.ibm.com
> Phone (+44)/(0) 1962-815848
> 
> 
> From: "Myers, James D" <jim.myers at pnl.gov>
> Sent by: owner-dfdl-wg at ggf.org
> Date: 19/11/2004 17:04
> To: dfdl-wg at gridforum.org
> Subject: RE: [dfdl-wg] simple way to study hard DFDL example 
> problem - IBMFormat VS records as XML
> 
> 
> 
> 
> I think we at least agree in practice that there's a limit on 
> how complex a transform you'd want to code in DFDL logic. Not 
> sure if we agree on whether it is possible.
> 
> As for LR parsers - I'm not a parser guy, but I just looked 
> at the wikipedia entry :-) :
> 
> Seems like a simple enough concept - if you let me have 
> layers, and I can use information in those layers to select 
> choices for further processing, can you stop me from making 
> an LR parser (or doing what an LR parser does)? I've got a 
> stack, and choices let me specify an action table... In the 
> same way that if you give me layers (or variables), addition, 
> and for loops, you can't stop me from doing multiplication.
> And if you require those things for other reasons but don't 
> need multiplication, you can't really talk about excluding 
> multiplication from the language design. You can say, though, 
> that we won't worry about multiplication examples, how easy 
> it is to write them down, or what performance you'll get 
> trying to run them, and suggest that you plug something in to 
> handle them directly. This is probably what we need to do in DFDL.
> 
> I may still be missing something and there is a piece of 
> functionality that we haven't identified a need for that 
> would be needed for an LR parser/our pathological examples, 
> but I guess I'm getting more convinced that our primitives 
> are sufficiently powerful that they can be used/abused to do 
> all of the complex things that have come up. I'm not sure how 
> we can close the issue - specify the map from DFDL primitives 
> to LR parser as I started to above, or find an example known 
> to require LR parsing and work it? Or?
> 
>   Jim
> 
> 
> 
> > -----Original Message-----
> > From: owner-dfdl-wg at ggf.org [mailto:owner-dfdl-wg at ggf.org] On Behalf
> > Of mike.beckerle at ascentialsoftware.com
> > Sent: Friday, November 19, 2004 11:36 AM
> > To: smh at uk.ibm.com; dfdl-wg at gridforum.org
> > Subject: RE: [dfdl-wg] simple way to study hard DFDL example problem -
> > IBMFormat VS records as XML
> >
> >
> > I believe you and Jim are actually disagreeing. Jim is saying he's
> > still optimistic that this transformation, even though complex, can be
> > expressed directly in DFDL. You are saying this would require XSLT or
> > a Java program or whatever to do it.
> >
> > >
> > > Mike you say you are aware of 19 such legacy formats, and I bet
> > > there are more. Well IBM's broker has no specific support for any of
> > > these, nor have we been asked to incorporate them into our message
> > > model. Maybe we should play the percentages game - if we see enough
> > > different subsystems that use the same cryptic format then it
> > > becomes worth building the support into DFDL.
> > >
> >
> > Ascential supports 6 or 7 of these formats today. Batch systems will
> > encounter this more than online. You get them when a mainframe job
> > writes out a tape on a mainframe, and then you read that tape on a
> > unix tape drive either directly or first into a file. Alternatively,
> > you pick up a mainframe file via FTP or some such and directly operate
> > on it on other systems.
> > Mainframe software handles all the VS block and such stuff in the
> > lower layers as you know (not to mention the tape label); unix software
> > does none of this, you just get the raw bytes.
> >
> > My point is not as much about these 19 or more particular formats, but
> > the issue of how much complexity we go after.
> >
> > In the past we've looked at things like logical arrays with
> > run-length-encoded representations and the suggestion has been there
> > that DFDL might be able to directly express this transformation
> > without need to go outside DFDL.
> >
> > I've come to believe there are certain limits to this complexity and I
> > think perhaps tree-shape compatibility is at the core of them.
> > Building a DFDL description for data that ultimately requires an LR(k)
> > sophistication parser to correctly interpret the data is clearly a
> > non-starter it seems. Where this line is drawn is important.
> >
> > ...mikeb
> >
> >
> 
> 
>