[DFDL-WG] Fw: Omitted array occurrences

Alan Powell alan_powell at uk.ibm.com
Thu Nov 26 10:56:48 CST 2009


Steve

2)

"implies the children of the sequence must have dfdl:initiator specified."
is wrong. 
It should be "It must be possible for speculative parsing to identify 
which elements are present." as in the table in the next section.

I have changed it.

Alan Powell

 MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
 Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com 
 Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898




From: Steve Hanson/UK/IBM at IBMGB
To: dfdl-wg at ogf.org
Date: 25/11/2009 16:40
Subject: [DFDL-WG] Fw:  Omitted array occurrences




As discussed on the call, here is some more in-line below in green and 
tagged SMH+. 

A couple of other things when writing this up. 

1) I think separatorPolicy="required" is misleading and I'm sure 
contributed to Tim's questions about behaviour.  Here we are using 
"required" to mean that all delimiters are needed, even when the data 
itself is not required.  I think we should use "always". 

2) I'd forgotten that there is also separatorPolicy="suppress". In this 
case, any missing element does not get a separator. The spec states this 
"implies the children of the sequence must have dfdl:initiator specified." 
but it does not say whether the omission of an initiator is a schema 
definition error. Should it be? 
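
A small sketch of why separatorPolicy="suppress" raises this question. It is not taken from the specification: the function name unparse_suppress, the comma separator and the "A:"/"B:"/"C:" initiators are all assumptions made up for illustration. Without something like an initiator on each child, two different infosets can unparse to identical data, so a parser could not tell which element was omitted.

```python
from typing import List, Optional


def unparse_suppress(values: List[Optional[str]],
                     initiators: Optional[List[str]] = None,
                     sep: str = ",") -> str:
    """Hypothetical unparser for a sequence with separatorPolicy="suppress".

    A value of None means the element is missing from the infoset; a missing
    element produces neither content nor a separator.
    """
    parts = []
    for i, value in enumerate(values):
        if value is None:
            continue  # suppressed: no content, no separator
        prefix = initiators[i] if initiators else ""
        parts.append(prefix + value)
    return sep.join(parts)


# Two different infosets for a three-child sequence.
second_missing = ["x", None, "z"]
third_missing = ["x", "z", None]

# Without initiators the two outputs are indistinguishable.
print(unparse_suppress(second_missing))
print(unparse_suppress(third_missing))

# With (assumed) initiators each output identifies its own elements.
tags = ["A:", "B:", "C:"]
print(unparse_suppress(second_missing, tags))
print(unparse_suppress(third_missing, tags))
```

Initiators are only one way to make the data self-identifying; anything that lets the parser work out which elements are present would serve the same purpose.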

Regards

Steve Hanson
Programming Model Architect, WebSphere Message  Brokers,
OGF DFDL WG Co-Chair,
Hursley, UK,
Internet: smh at uk.ibm.com,
Phone (+44)/(0) 1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 25/11/2009 16:11 ----- 
From: Steve Hanson/UK/IBM
To: dfdl-wg at ogf.org
Date: 25/11/2009 12:37
Subject: Re: [DFDL-WG] Omitted array occurrences



Tim, Alan - my thoughts on this in blue (SMH). 

Regards

Steve Hanson


From: Alan Powell/UK/IBM at IBMGB
To: Tim Kimber/UK/IBM at IBMGB
Cc: dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
Date: 19/11/2009 17:30
Subject: Re: [DFDL-WG] Omitted array occurrences
Sent by: dfdl-wg-bounces at ogf.org





Tim 

Comments below 
Need more discussion on this 

Alan Powell



From: Tim Kimber/UK/IBM at IBMGB
To: dfdl-wg at ogf.org
Date: 19/11/2009 12:07
Subject: [DFDL-WG] Omitted array occurrences






What should the DFDL unparser do when some or all of the elements of an 
array are missing? 

I have found the following statements in v0.36 which seem relevant: 

Section 5.2.1 
The minOccurs value is used: 
·        to determine if an element declaration or reference is scalar or 
array 
·        to determine the required minimum number of occurrences of an 
array both when parsing and unparsing 

Section 16.13 
( Note : this definition of 'required' is a repeat of the definition in 
section 3 ) 
Definition: 'required' 
We define the term 'required' as follows: 
·        A scalar element is required. 
·        An element of a fixed-occurrence array is required. 
·        An element of a variable-occurrence array is required if its 
index is less than or equal to the value of minOccurs. 
All other elements are not required. 
... 
On unparsing, if an element is required, and is not part of the logical 
data and the element has a default value specified then it is used, 
otherwise it is a processing error. 
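
The quoted definition of 'required' and the unparsing rule that follows it can be sketched in a few lines. This is an illustration only, not text from the specification: the names is_required, value_to_unparse and ProcessingError are hypothetical, occurrence indexes are assumed to be 1-based, and maxOccurs="unbounded" is modelled as None.

```python
from typing import Optional


class ProcessingError(Exception):
    """Stands in for a DFDL processing error (hypothetical name)."""


def is_required(index: int, min_occurs: int, max_occurs: Optional[int]) -> bool:
    """Return True if occurrence `index` (1-based) is 'required'.

    The three cases in the quoted definition collapse to one comparison:
      scalar          (minOccurs = maxOccurs = 1): index 1 <= 1
      fixed array     (minOccurs = maxOccurs = n): every index <= n
      variable array  (minOccurs < maxOccurs):     index <= minOccurs
    max_occurs is accepted only to mirror the schema declaration.
    """
    if index < 1:
        raise ValueError("occurrence index is 1-based")
    return index <= min_occurs


def value_to_unparse(value: Optional[str], required: bool,
                     default: Optional[str]) -> Optional[str]:
    """Apply the quoted unparsing rule to one element occurrence.

    `value` is None when the occurrence is not part of the logical data.
    """
    if value is not None:
        return value
    if not required:
        return None  # missing and not required: nothing to output
    if default is not None:
        return default  # missing but required: the default is used
    raise ProcessingError("required element is missing and has no default")


# A variable-occurrence array with minOccurs=2, maxOccurs=unbounded.
print([is_required(i, 2, None) for i in (1, 2, 3)])
```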

Section 17.3.1 : Sequence groups and separators 

re: the combination of separatorPolicy="suppressAtEnd" and 
sequenceKind="ordered": 
All separators must be found in the data except that when the sequence has 
trailing optional items, the separators are suppressed for any final 
missing items. Note suppressAtEnd can only be used when there is no clash 
with delimiters from the containing structure. 

My interpretation of the specification is: 
a) if separatorPolicy="required" then the unparser should output a 
separator for all missing required elements ( whether array members or not ) 
Is this an additional definition of a 'required' element? In which case 
the default value should be output. (interestingly because default is a 
schema property rather than a dfdl property you cannot set a default 
default.) 
SMH: The definition of 'required' relates to the data. Here we are talking 
about whether to output syntax. Strike 'required' from Tim's 
interpretation and you have the correct interpretation. 
b) if separatorPolicy="suppressAtEnd" then the unparser should output a 
separator for all non-trailing missing required elements 
Should set the default for any required element so it won't be missing.
"On unparsing, if an element is required, and is not part of the logical 
data and the element has a default value specified then it is used, 
otherwise it is a processing error. " 
SMH: Tim's interpretation is not complete.  The correct interpretation is 
"...then the unparser should output a separator for all missing elements 
in the sequence up to and including the last required element.".    It is 
only optional elements beyond the last required element that benefit from 
this property. 
c) separators for missing elements must be output regardless of whether 
the element is required/optional, simple/complex, does/does not have a 
default value etc. I assume this because the term 'missing' is used rather 
than the very clearly-defined term 'required'. 
Missing just means not in the infoset and is orthogonal to 
optional/required. If you accept this is an additional definition of 
required then no. But it then forces you to set defaults for minOccurs=0 
elements which will only be used in this circumstance. I'm not sure what 
the default for complex elements would be: all the children must have a 
default?
SMH: If c) is trying to say that once you have decided, via a) and b), 
that a separator is needed, then whether it is simple/complex, does/does 
not have a default, is irrelevant, then I agree. 
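
Steve's reading of a), b) and c) can be sketched as follows. This is a hedged illustration, not the specification's algorithm: the function name unparse_sequence is hypothetical, an infix comma separator is assumed, and each child is modelled as a (value, required) pair with None for a missing value. The sketch also assumes that a missing optional element followed by a present one still gets its separator, since it is not trailing. A missing required element is shown as empty content purely to make the separators visible; under the quoted rule it would take its default or cause a processing error.

```python
from typing import List, Optional, Tuple

Child = Tuple[Optional[str], bool]  # (value or None if missing, required?)


def unparse_sequence(children: List[Child], policy: str, sep: str = ",") -> str:
    """Unparse an ordered sequence, emitting separators per separatorPolicy."""
    if policy == "required":
        # a) every child position is delimited, missing or not.
        kept = children
    elif policy == "suppressAtEnd":
        # b) keep positions up to and including the last one that is either
        #    required or actually present; drop trailing missing optionals.
        last = 0
        for position, (value, required) in enumerate(children, start=1):
            if required or value is not None:
                last = position
        kept = children[:last]
    else:
        raise ValueError("policy not covered by this sketch: " + policy)
    # c) whether a kept child is simple/complex or has a default plays no
    #    part in deciding that its separator is output.
    return sep.join("" if value is None else value for value, _ in kept)


children = [("a", True), (None, False), ("c", True), (None, False), (None, False)]
print(unparse_sequence(children, "required"))
print(unparse_sequence(children, "suppressAtEnd"))
```

With the sample children, only the two optional elements after the last required element lose their separators under "suppressAtEnd"; the missing optional element in second position keeps its separator under both policies.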
Reading between the lines, I also infer the following rules: 
d) if an array has maxOccurs="unbounded" and it is missing from the 
infoset then the unparser will not output any separators for the array 
e) if an array has maxOccurs!="unbounded" and it is missing from the 
infoset then  the unparser will output a separator for each missing 
occurrence ( so it will output maxOccurs separators ). 
If minOccurs > 0  then use default. If minOccurs= 0 then output nothing. I 
don't think maxOccurs has any effect. 
SMH+: The behaviour when dealing with a repeating element (minOccurs, 
maxOccurs) is analogous to dealing with a sequence.  You treat up to and 
including minOccurs as 'required', and anything beyond as 'optional'. Then 
you apply the separatorPolicy property.  So "suppressAtEnd" means you only 
output delimiters up to and including minOccurs, and "required" means you 
output delimiters up to and including maxOccurs.  There's clearly a 
problem with the combination of maxOccurs="unbounded" and 
separatorPolicy="required" - this should be a schema definition error. 
SMH+: It is possible that some models are pretty ambiguous, and that we 
could be outputting something that is very difficult to parse. If it is 
possible to use the full DFDL armoury of parsing techniques (speculation, 
backtracking, data patterns, remodelling as choice and discriminators, 
etc) then that is a 1.0 limitation. 
f) if an element contains a child group, and none of the group members are 
present in the infoset, then the group is 'missing' and the unparser will 
output a separator for it. 
Not sure
SMH: This is establishing 'missing' for a local group. Sounds right to me. 
The separator will be output according to a) and b). But because a local 
group is (1:1) in DFDL, in practice you will always get a separator. 
SMH+: If a local group needs to be optional it must be wrapped in a 
complex element. 
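
The SMH+ proposal for repeating elements can be sketched as a count of how many occurrence positions the unparser delimits. This reflects the proposal made in this thread, not settled specification text. The names occurrence_slots and SchemaDefinitionError are hypothetical, maxOccurs="unbounded" is modelled as None, and the sketch assumes that occurrences actually present in the infoset are always delimited.

```python
from typing import Optional


class SchemaDefinitionError(Exception):
    """Stands in for a DFDL schema definition error (hypothetical name)."""


def occurrence_slots(present: int, min_occurs: int,
                     max_occurs: Optional[int], policy: str) -> int:
    """Number of occurrence positions that get delimiters on unparsing.

    present    -- occurrences found in the infoset
    max_occurs -- None models maxOccurs="unbounded"
    """
    if max_occurs is not None and present > max_occurs:
        raise ValueError("infoset has more occurrences than maxOccurs")
    if policy == "required":
        if max_occurs is None:
            # Proposed in the thread: this combination cannot be honoured.
            raise SchemaDefinitionError(
                'maxOccurs="unbounded" with separatorPolicy="required"')
        return max_occurs  # delimit every possible occurrence
    if policy == "suppressAtEnd":
        # Occurrences up to minOccurs are treated as required, the rest as
        # optional; trailing missing optional occurrences are suppressed.
        return max(present, min_occurs)
    raise ValueError("policy not covered by this sketch: " + policy)


# minOccurs=2, maxOccurs=5, one occurrence in the infoset.
print(occurrence_slots(1, 2, 5, "suppressAtEnd"))
print(occurrence_slots(1, 2, 5, "required"))
```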
Suggested changes to the specification: 
- As a minimum, I think it would be useful for the specification to 
include a definition of 'missing'.
'Not in the infoset'
SMH: That's fine for unparsing only.
- DFDL does not allow min/maxOccurs on groups, so they implicitly have 
cardinality 1:1. Specification should specify the behaviour of the 
unparser when none of a group's members are present in the infoset.
Agree. 

- The wording in 17.3.1 could be more accurate. I don't think the word 
'optional' should be there ( if validation is off then the unparser will 
tolerate missing required elements ).
No. 'required' is not part of validation.
I think the words 'trailing' and 'final' are intended to mean 
the same - we should standardize on 'trailing'.
SMH: I agree the words could be improved. See my b) words above for example. 
regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 246742




Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 





--
dfdl-wg mailing list
dfdl-wg at ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg 









