[DFDL-WG] Fw: Recursive use of DFDL for variable markup - action 028 - updated

Mon Jun 15 06:12:40 CDT 2009

The use cases that led us to consider recursive use of DFDL to define 
markup or other DFDL properties are:

a) Case insensitivity of data (eg, true & TRUE for text boolean)
b) Case insensitivity of markup (eg, hdr & HDR for initiator)
c) Different possible values for non-whitespace markup (eg, @ and # for 
separator)
d) Different possible values for data (eg, true & yes for text boolean)
e) Encoding of markup different to encoding of data (eg, initiator and 
terminator different to data)

The proposal is to handle all of these use cases with existing mechanisms, 
which removes the need to include recursive use of DFDL in 1.0.

a) Case insensitivity of data (eg, true & TRUE for text boolean)
- Use a single flag dfdl:ignoreCase to cover all affected properties
- Properties:
        - dfdl:occursStopValue
        - dfdl:numberZeroRep **
        - dfdl:nilValues
        - dfdl:textBooleanTrue
        - dfdl:textBooleanFalse
        - dfdl:numberInfinityRep         **
        - dfdl:numberNanRep     **
        - dfdl:numberExponentCharacter **

b) Case insensitivity of markup (eg, hdr & HDR for initiator)
- Use same flag dfdl:ignoreCase to cover all affected properties
- Properties:
        - dfdl:initiator
        - dfdl:terminator
        - dfdl:separator
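
To illustrate a) and b), here is a minimal Python sketch of how a single 
flag could govern literal matching for both data values and markup. The 
function name matches_literal and the use of casefold() are assumptions 
made for illustration; the proposal does not define the comparison rules.

```python
def matches_literal(data: str, pos: int, literal: str,
                    ignore_case: bool = False) -> bool:
    """Return True if `literal` occurs in `data` at offset `pos`."""
    candidate = data[pos:pos + len(literal)]
    if ignore_case:
        # casefold() is only a stand-in for whatever case-insensitive
        # comparison the specification eventually settles on.
        return candidate.casefold() == literal.casefold()
    return candidate == literal


# Markup: initiator "hdr" also matches "HDR" when the flag is set.
print(matches_literal("HDR:payload", 0, "hdr", ignore_case=True))
# Data: "TRUE" is accepted for a text boolean declared as "true".
print(matches_literal("flag=TRUE", 5, "true", ignore_case=True))
```

The same helper serves both cases, which is the point of using one flag 
rather than one keyword per property.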

c) Different possible values for non-whitespace markup (eg, @ and # for 
separator)
- Use a multi-value property. Propose that the property name remains 
singular.
- Properties:
        - dfdl:initiator
        - dfdl:terminator
        - dfdl:separator
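
A rough Python sketch of c): a separator property holding several 
candidate values, any of which ends a field. The helper names and the 
longest-match-first rule are assumptions for illustration only; the 
proposal does not say how competing candidates are ranked.

```python
from typing import List, Optional


def match_any(data: str, pos: int, candidates: List[str]) -> Optional[str]:
    """Return the candidate found at `pos` (longest first), else None."""
    for literal in sorted(candidates, key=len, reverse=True):
        if data.startswith(literal, pos):
            return literal
    return None


def split_on_separators(data: str, separators: List[str]) -> List[str]:
    """Split `data` wherever any one of the separator values occurs."""
    fields, start, pos = [], 0, 0
    while pos < len(data):
        sep = match_any(data, pos, separators)
        if sep:
            fields.append(data[start:pos])
            pos += len(sep)
            start = pos
        else:
            pos += 1
    fields.append(data[start:])
    return fields


print(split_on_separators("a@b#c", ["@", "#"]))
```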

d) Different possible values for data (eg, true & yes for text boolean)
- Use a multi-value property. Propose that property names remain singular, 
so dfdl:nilValues would be renamed dfdl:nilValue.
- Properties:
        - dfdl:occursStopValue
        - dfdl:numberZeroRep **
        - dfdl:nilValues
        - dfdl:textBooleanTrue
        - dfdl:textBooleanFalse
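
A small Python sketch of d), combined with the flag from a): a text 
boolean whose true and false properties each carry several values. The 
function name, its parameters and the error handling are hypothetical.

```python
from typing import Iterable


def parse_text_boolean(text: str,
                       true_values: Iterable[str],
                       false_values: Iterable[str],
                       ignore_case: bool = False) -> bool:
    """Map `text` to a boolean using multi-valued true/false properties."""
    # Normalise both sides the same way so the comparison is symmetric.
    norm = (lambda s: s.casefold()) if ignore_case else (lambda s: s)
    value = norm(text)
    if value in {norm(v) for v in true_values}:
        return True
    if value in {norm(v) for v in false_values}:
        return False
    raise ValueError(f"{text!r} is not a recognised boolean value")


print(parse_text_boolean("yes", ["true", "yes"], ["false", "no"]))
print(parse_text_boolean("NO", ["true", "yes"], ["false", "no"],
                         ignore_case=True))
```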

e) Encoding of markup different to encoding of data (eg, initiator and 
terminator different to data)
- Use <xs:sequence> to wrap the element and carry the markup, for example:
    <sequence dfdl:encoding="ascii" dfdl:separator=":">
        <sequence dfdl:encoding="ebcdic" dfdl:initiator="VAL"
                  dfdl:terminator="END">
            <element name="val" type="..." dfdl:encoding="ascii" />
        </sequence>
    </sequence> 
- This should be able to handle all cases of what is a rare occurrence 
anyway, and still allows speculative parsing rules to apply.
- This technique also allows you to change the dfdl:ignoreCase property 
between markup and data.
- An alternative is to treat the markup as a value (the EDI scenario). This 
is the subject of a separate action, 026, which will be solved using 
variables or another technique, but not by using DFDL recursively.
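
To make the wrapping technique in e) concrete, here is a hedged Python 
sketch of what a parser would do with such a schema: the initiator and 
terminator are matched in one encoding and the enclosed value is decoded 
in another. The function name is hypothetical, and reading "ebcdic" as 
Python's cp037 codec is an assumption.

```python
def parse_wrapped_value(raw: bytes,
                        initiator: str = "VAL",
                        terminator: str = "END",
                        markup_encoding: str = "cp037",
                        data_encoding: str = "ascii") -> str:
    """Strip EBCDIC markup from `raw` and decode the ASCII value inside."""
    init = initiator.encode(markup_encoding)
    term = terminator.encode(markup_encoding)
    if not raw.startswith(init):
        raise ValueError("initiator not found")
    end = raw.find(term, len(init))
    if end < 0:
        raise ValueError("terminator not found")
    # Only the bytes between the markup use the data encoding.
    return raw[len(init):end].decode(data_encoding)


message = "VAL".encode("cp037") + b"12345" + "END".encode("cp037")
print(parse_wrapped_value(message))
```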

There are some other properties to which cases a), b), c), d) could apply, 
but it has been decided that the flexibility is not needed in practice. 

        - dfdl:textPadCharacter
        - dfdl:escapeCharacter 
        - dfdl:escapeForEscapeCharacter 
        - dfdl:escapeBlockStart 
        - dfdl:escapeBlockEnd 
        - dfdl:numberGroupSeparator 
        - dfdl:numberDecimalSeparator 

** ICU assumes a single char for NaN, infinity, and exponent. That's too 
restrictive for us, so the proposal is to treat the DFDL NaN, infinity and 
exponent properties like the zero rep property: they are used to 
pre-process the data for ICU on parsing, and are applied to the ICU output 
on unparsing.
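
A minimal Python sketch of that pre/post-processing idea. Python's 
float() and repr() stand in for the ICU parser and formatter, and the 
three representation strings are invented examples, so this shows only 
the shape of the approach, not ICU behaviour.

```python
import math

# Hypothetical multi-character representations standing in for the
# values of the DFDL NaN, infinity and exponent properties.
NAN_REP = "NotANumber"
INF_REP = "Infinite"
EXP_REP = "x10^"


def preprocess_for_parse(text: str) -> str:
    """Rewrite DFDL representations into tokens the library accepts."""
    if text == NAN_REP:
        return "nan"
    sign = "-" if text.startswith("-") else ""
    if text.lstrip("+-") == INF_REP:
        return sign + "inf"
    return text.replace(EXP_REP, "e")


def parse_number(text: str) -> float:
    # float() is the stand-in for the underlying number parser.
    return float(preprocess_for_parse(text))


def unparse_number(value: float) -> str:
    """Format with the stand-in formatter, then apply the DFDL reps."""
    if math.isnan(value):
        return NAN_REP
    if math.isinf(value):
        return ("-" if value < 0 else "") + INF_REP
    return repr(value).replace("e", EXP_REP)


print(parse_number("1.5x10^3"))
print(unparse_number(float("-inf")))
```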

For date/time support, the comparisons made by ICU when checking days, 
months, etc are case-insensitive, so DFDL does not need to provide any 
extra behaviour.

Regards

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 09/06/2009 13:27 -----

Steve Hanson/UK/IBM at IBMGB 
Sent by: dfdl-wg-bounces at ogf.org
15/04/2009 13:47

To: dfdl-wg at ogf.org
Subject: [DFDL-WG] Recursive use of DFDL for variable markup - use case

From last week's call: 

7. Recursive use of DFDL for variable markup 
Use of a DFDL annotated element/type to describe an initiator, length 
prefix, terminator, separator, etc. Steve suggested that the most 
important use of a "variable markup-like mechanism" in IBM's WTX product 
is to reference a location earlier in the bit stream where a delimiter 
value is found. We handle this already by use of a path expression. The 
additional variable markup mechanism was intended to avoid a proliferation 
of keywords for various corner cases on initiator, terminator and 
separator. E.g., what if you want the initiator to be "Name" or "name" 
only, not "NAME", "nAmE", etc.? Case insensitivity alone is not expressive 
enough for that. This can always be modeled, just not as an initiator tag. 
The feeling was to leave out variable markup (other than for prefix 
lengths) for v1.0, and to propose the minimum set of extra properties that 
can be used to address the common use cases, but that IBM needed to see 
whether this satisfied all WTX use cases.
