[DFDL-WG] Action 261

Tim Kimber KIMBERT at uk.ibm.com
Wed Jun 11 08:57:38 EDT 2014


Comments are in <tk> tags.
regards,

Tim Kimber, 
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742




From:   Steve Hanson/UK/IBM
To:     Tim Kimber/UK/IBM at IBMGB, 
Cc:     dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
Date:   11/06/2014 10:47
Subject:        Re: [DFDL-WG] Action 261


Some thoughts on this...

I agree that the definition of positional sequence in the spec needs 
tightening as it is ambiguous as it stands and could be interpreted as a) 
or b).  If we adopted b) then that would appear to allow 'expression' to 
appear in a positional sequence, but wouldn't it also allow 'stopValue'? 
<tk>Yes - according to definition b) stopValue would be allowable in a 
positional sequence. We could still disallow it if we do not believe there 
is any benefit in allowing it. I don't believe it introduces any 
particular complexities for an implementer.</tk>

occursCountKind 'expression' is analogous to lengthKind 'explicit' with an 
expression and to lengthKind 'prefixed'. Both these lengthKinds are 
classified as 'specified length' when parsing but 'variable length' when 
unparsing. We are observing that occursCountKind 'expression' is like 
'fixed' when parsing but not quite so like 'fixed' when unparsing - which 
is why section 16 groups 'expression' with 'parsed' for unparsing.
<tk>Yes - we took a decision that the unparser should ignore the 
expression in lengthKind/occursCountKind, and just output whatever data 
happens to be in the info set.
I'm not sure that it saves a lot of effort in the implementation and it 
certainly is not easy to justify as a consistent behaviour. For me, the 
unparser should treat lengthKind='explicit' the same way whether the value 
is static or calculated. And the unparser should treat 
occursCountKind='expression' the same way as occursCountKind='fixed'.
</tk>

When unparsing occursCountKind 'expression' you don't always have the 
calculated array length N. If the infoset was derived from XML, there is 
likely no 'count' element, just a bunch of elements with the same name 
that make up the 'array'. DFDL gives you the choice whether to manually 
set the count element, or to have the unparser set it automatically via 
outputValueCalc. In the former case, you can create a document that 
cannot be parsed;
<tk>You can with the current rules too. In fact, you can parse a document 
with trailing optional empty array occurrences and when it is unparsed the 
trailing empty occurrences will have been discarded.</tk>

the unparser could check the 'count' element matches the infoset, but that 
would involve reverse engineering an arbitrarily complex expression and is 
why the specification does not say that. 
<tk>It would involve evaluating the expression. In most cases, that will 
not require any lookahead because the Length/Count field will precede the 
array or element. Not sure where the reverse engineering comes in?</tk>

Here's a real example of such an expression (albeit with lengthKind 
'explicit' but the principle is the same):

        dfdl:length="{xs:nonNegativeInteger(fn:floor((../Length + 1) div 2))}"
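
As a rough illustration of the arithmetic in that expression, here is a 
small Python sketch (not DFDL). The function name explicit_length is 
hypothetical, and Length is assumed to be a non-negative integer. It shows 
that the mapping is many-to-one: the original Length cannot be derived from 
the calculated value, although checking a given Length against it only 
needs the expression to be evaluated forwards.

```python
def explicit_length(length_field: int) -> int:
    """Mirror xs:nonNegativeInteger(fn:floor((../Length + 1) div 2)).

    Assumption: length_field is a non-negative integer, so integer floor
    division gives the same result as floor() applied to the quotient.
    """
    if length_field < 0:
        raise ValueError("Length is assumed to be non-negative")
    return (length_field + 1) // 2


if __name__ == "__main__":
    # Adjacent Length values can share a calculated length.
    for length_field in range(5):
        print(length_field, "->", explicit_length(length_field))
```

For example, Length values of 3 and 4 both give a calculated length of 2.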

Alex brought up the case where the expression evaluates to 0. In a 
positional sequence, would you still expect a delimiter for this case? 
<tk>Yes, unless it is in the trailing optional region of the group and 
SSP='trailingEmpty'. In a positional sequence, every delimiter must be 
present until suppression begins ( if allowed )</tk>

If 'yes' then the resultant zero length string must be treated as the 
'absent representation' and ignored. If 'no' then is the sequence still 
positional?
<tk>I don't understand the point. Why would it not be the 'empty 
representation'? Why must it be 'ignored' if it does happen to be the 
'absent representation'? What does 'ignored' mean?</tk>

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848




From:   Tim Kimber/UK/IBM at IBMGB
To:     dfdl-wg at ogf.org, 
Date:   10/06/2014 21:22
Subject:        [DFDL-WG] Action 261
Sent by:        dfdl-wg-bounces at ogf.org



Implied separatorSuppressionPolicy for occursCountKind 'expression' (All) 
10/6: Spec says it is 'never' (positional sequence) but you have to parse 
to identify the position, so isn't that non-positional? 

I think there are two alternative definitions of 'positional': 
a) the identity of every delimited field is known before parsing of the 
sequence group begins 
b) the identity of every delimited field is known before parsing of the 
field begins 

As an implementer, b) is sufficient because it means that the parser never 
needs to backtrack while parsing the group. 
a) allows the field identities to be statically known, but that is less 
important - it does not allow optimised extraction of a particular field 
as would be the case for a fixed-length group ( the possibility of escaped 
separators/terminators means that every character will need to be scanned 
anyway ). 
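
To make definition b) concrete, here is a minimal Python sketch under 
stated assumptions. It uses a toy comma-separated record laid out as a 
count field, then that many array occurrences, then one trailer field. The 
layout, the function name parse_positional and the field names are all 
hypothetical, and escaped separators are ignored. Once the count has been 
read, the identity of every later field follows from its position alone, so 
nothing has to be re-read.

```python
def parse_positional(record: str, sep: str = ",") -> dict:
    """Parse 'count, item * count, trailer' from a toy delimited record.

    Assumptions: the count field precedes the array, and no separator is
    escaped inside a field (a real parser would have to scan for escapes).
    """
    fields = record.split(sep)
    count = int(fields[0])      # the expression operand is already known
    expected = 1 + count + 1    # count field + N occurrences + trailer
    if len(fields) != expected:
        raise ValueError(
            f"expected {expected} delimited fields, found {len(fields)}"
        )
    return {
        "count": count,
        "item": fields[1:1 + count],   # positions 1..N are array occurrences
        "trailer": fields[1 + count],  # position N+1 is the trailer
    }


if __name__ == "__main__":
    print(parse_positional("2,a,b,z"))
    print(parse_positional("0,z"))
```

An empty occurrence still occupies its position in this sketch, so 
"2,a,,z" yields the two occurrences "a" and "".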

It may sound like a small point, but it affects two decisions 
1. whether ock='expression' should be allowed within a positional sequence 
group ( action 261 ) 
2. what the behaviour of the unparser should be w.r.t. ock='expression'. 

My own feeling is that ock='expression' should be treated almost exactly 
like ock='fixed', except that the calculated array length N is used 
instead of maxOccurs. 
- When parsing a positional sequence group it should cause N delimiters to 
be expected for the array. 
- When unparsing a positional sequence group it should cause N delimiters 
to be written. 
These rules are consistent and straightforward to describe and implement. 
The current rule ( unparser outputs the occurrences that are in the info 
set only ) allows the unparser to write a document that cannot be parsed 
using the same schema. 
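
The difference between the two unparser rules can be sketched in Python 
with a toy comma-separated layout of a count field, the array occurrences 
and one trailer field. This is an illustration only: the layout and both 
function names are hypothetical, and padding with empty strings is an 
assumption about what writing N delimiters would amount to.

```python
def unparse_infoset_only(count: int, items: list, trailer: str,
                         sep: str = ",") -> str:
    """Current rule as described above: write only what is in the infoset."""
    return sep.join([str(count), *items, trailer])


def unparse_n_delimiters(count: int, items: list, trailer: str,
                         sep: str = ",") -> str:
    """Proposed rule: always write N positions for the array.

    Assumption: missing occurrences are written as empty strings.
    """
    if len(items) > count:
        raise ValueError("infoset has more occurrences than the count")
    padded = list(items) + [""] * (count - len(items))
    return sep.join([str(count), *padded, trailer])


if __name__ == "__main__":
    # The count says 3, but the infoset holds only two occurrences.
    print(unparse_infoset_only(3, ["a", "b"], "z"))
    print(unparse_n_delimiters(3, ["a", "b"], "z"))
```

With a count of 3 and two occurrences in the infoset, the first function 
writes "3,a,b,z", where a parser expecting three occurrences would consume 
the trailer as the third one. The second writes "3,a,b,,z", which keeps 
every position in place.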

regards,

Tim Kimber, 

----- Forwarded by Tim Kimber/UK/IBM on 10/06/2014 20:34 ----- 

From:        Steve Hanson/UK/IBM at IBMGB 
To:        dfdl-wg at ogf.org, 
Date:        10/06/2014 17:57 
Subject:        [DFDL-WG] OGF DFDL WG Call Minutes 2014-06-10 
Sent by:        dfdl-wg-bounces at ogf.org 



Please find minutes from the above call at 
http://redmine.ogf.org/dmsf_files/13263?download= 

Regards

Steve Hanson
Architect, IBM DFDL,
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
--
 dfdl-wg mailing list
 dfdl-wg at ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU