[DFDL-WG] Unparsing - Pad to "big enough length to fit..." - can it be achieved in DFDL?

Steve Hanson smh at uk.ibm.com
Fri Nov 16 11:03:29 EST 2012


Mike

A while back we took the following erratum (in my under-construction v011 
doc), which effectively treats lengthKind 'explicit' with an expression as 
variable-length data, handled the same as 'prefixed'. In other words, 
the expression would not be used when unparsing. I think that prevents 
your scenario from working even with pre-processed lengths. 

2.100. Section 12.3.1. State that when unparsing an element with 
lengthKind ‘explicit’ and where length is an expression, then the data in 
the Infoset is treated as variable length and not fixed length. The 
behaviour is the same as lengthKind ‘prefixed’. 

At least, that was what I minuted. I'll forward the email thread that 
discussed this, as it has some examples.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     dfdl-wg at ogf.org, 
Date:   14/11/2012 18:23
Subject:        [DFDL-WG] Unparsing - Pad to "big enough length to fit..." 
- can it be achieved in DFDL?
Sent by:        dfdl-wg-bounces at ogf.org




I have a format where fields are supposed to be padded so they all have 
the length of the longest one.

I.e., something like this with fields terminated by | chars

FIELD1|FIELD2  |ST|PH          |   
ABC   |CAPE COD|MA|888-999-0000|
DEFG  |BOSTON  |MA|-           |

I believe I can warn if the fields in a column of this table don't all 
have the same length. I would set a variable to the length of the field 
in the first row; then on subsequent rows (parsed with a different 
element in the schema) I would use an assertion to verify that the length 
equals what is stored in the variable. I believe this is actually what the 
recoverableError kind is for: redundant information that you want to warn 
about.
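
To make the intended check concrete, here is a minimal sketch in Python 
(not DFDL) of the same logic done outside the schema: remember the field 
widths of the first row, then report any later row whose widths differ. 
The function name width_warnings and the list-of-lines input are 
assumptions made for illustration only. Inside DFDL the equivalent would 
be the variable-plus-assertion approach described above.

```python
def width_warnings(lines, terminator="|"):
    """Return (line_number, widths) for each row whose field widths
    differ from those of the first row. Nothing is raised, because a
    mismatch is only meant to be a warning."""
    warnings = []
    expected = None
    for line_no, line in enumerate(lines, start=1):
        # Each field is terminated by "|", so the piece after the last
        # terminator is not a field and is dropped.
        fields = line.split(terminator)[:-1]
        widths = [len(field) for field in fields]
        if expected is None:
            expected = widths          # first row sets the reference
        elif widths != expected:
            warnings.append((line_no, widths))
    return warnings
```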

The problem I have is unparsing: how, in DFDL, can I determine how wide to 
pad each field before unparsing it? The spec requires me to measure how 
long each field is (including that quasi-header row, which comes first), 
then pad them all to that maximum.

I believe there is no way to do this currently in DFDL, because there is 
no way that an expression can refer to more than one record, nor any way 
to iterate or operate over sets of data. 

Instead an application must be written which can use DFDL v1.0 but the 
application has to provide some of the smarts. 

In this case, the application would have to walk the infoset and determine 
the maximum length for each such field, then provide these length maximums 
to the DFDL schema by way of externally set variables. 
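
As a rough sketch of that application-side pre-pass, the Python below 
computes the maximum length per column and shows the padded result. It 
assumes the infoset has already been flattened into a header list and a 
list of rows of strings; the names column_widths and pad_rows are 
hypothetical. How the resulting widths are bound to externally set DFDL 
variables depends on the processor and is not shown.

```python
def column_widths(header, rows):
    """Return the maximum text length in each column, header included."""
    widths = [len(name) for name in header]
    for row in rows:
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(value))
    return widths


def pad_rows(header, rows, pad_char=" ", terminator="|"):
    """Right-pad every field to its column width and terminate it."""
    widths = column_widths(header, rows)
    return [
        "".join(value.ljust(width, pad_char) + terminator
                for value, width in zip(row, widths))
        for row in [header] + rows
    ]


# Example data taken from the table earlier in this message.
header = ["FIELD1", "FIELD2", "ST", "PH"]
rows = [
    ["ABC", "CAPE COD", "MA", "888-999-0000"],
    ["DEFG", "BOSTON", "MA", "-"],
]
widths = column_widths(header, rows)   # these would be passed to the schema
padded = pad_rows(header, rows)
```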

Is there any other way? 

Second issue: the spec of my format requires the total length of all the 
fields of a row to be at most 72 characters. When unparsing, we don't 
evaluate assertions, so how can I test this?
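
For illustration only, this is the kind of check an application could run 
itself before unparsing, again in Python rather than DFDL. It assumes the 
per-column widths are already known, and it treats whether the "|" 
terminators count toward the 72 characters as an open question, exposed 
here as a flag. The names check_row_limit and count_terminators are 
hypothetical.

```python
MAX_ROW_CHARS = 72  # limit stated in the format's spec


def check_row_limit(widths, limit=MAX_ROW_CHARS, count_terminators=False):
    """Return the padded row length, or raise ValueError if over the limit.

    widths is the list of padded column widths. Whether the terminators
    count toward the limit is an assumption left to the caller.
    """
    total = sum(widths)
    if count_terminators:
        total += len(widths)  # one terminator character after each field
    if total > limit:
        raise ValueError(
            f"padded row is {total} characters, limit is {limit}"
        )
    return total
```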


--
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel:  781-330-0412
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg

