[DFDL-WG] How to determine the length of an element which has text representation

Tue Nov 17 05:48:00 CST 2009

A small correction:

the parsing rules I propose, and I think what is currently in the spec, 
are

- for fixed length 'text' elements (lengthKind is 'implicit' or 
'explicit') that also have terminating markup (terminator or in-scope 
separator or terminator), the parser should scan for the markup and then 
check the length
- for fixed length 'text' elements (lengthKind is 'implicit' or 
'explicit') with no terminating markup the length is used

- for fixed length 'binary' fields, which are not scannable, with 
terminating markup then the length should be used to extract the field 
then scan for markup. (I'm not sure this is a realistic scenario but it is 
allowed.)
- for fixed length 'binary' fields without terminating markup, the 
length should be used

- for fixed length complex elements with terminating markup, each child is 
treated as above. When the end of the complex element is found, the actual 
length is compared to the fixed length
- for fixed length complex elements without terminating markup the length 
is used to extract the element and that 'buffer' is parsed for the 
children.

 - I was not suggesting that dfdl:length should be examined for any 
lengthKind other than explicit
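
As a rough illustration of the rules above for simple elements, here is a 
minimal Python sketch. It is not taken from the spec or from any 
implementation: the function name, its parameters and the ProcessingError 
class are hypothetical, a single terminator string stands in for all in-scope 
terminating markup, and for the binary case the markup is assumed to follow 
the field immediately.

```python
class ProcessingError(Exception):
    """Recoverable parse failure; a real parser could backtrack on it."""


def extract_fixed_length(data, pos, length, representation, terminator=None):
    """Return (value, new_pos) for one fixed-length simple element.

    `terminator` stands in for any in-scope terminating markup (assumption).
    """
    if representation == "text" and terminator is not None:
        # Text with terminating markup: scan first, then check the length.
        end = data.find(terminator, pos)
        if end == -1:
            raise ProcessingError("terminating markup not found")
        if end - pos != length:
            raise ProcessingError(
                f"found length {end - pos}, expected {length}")
        return data[pos:end], end + len(terminator)

    # Text without markup, or binary: the length alone extracts the field.
    end = pos + length
    if end > len(data):
        raise ProcessingError("not enough data")
    value = data[pos:end]
    if terminator is not None:
        # Binary with terminating markup: extract by length, then look for
        # the markup (assumed here to follow the field immediately).
        if not data.startswith(terminator, end):
            raise ProcessingError("terminating markup not found after field")
        end += len(terminator)
    return value, end
```

With this sketch, extract_fixed_length("ABC,DEF", 0, 3, "text", ",") returns 
("ABC", 4), while the same call on "ABCD,DEF" raises ProcessingError because 
the scanned length is 4. A binary field may contain bytes that look like 
markup, which is why it is extracted by length before the markup is checked.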

Notes:
Because lengthKind 'explicit' is used to specify either a fixed length or a 
reference to a length field, it isn't possible to tell the two apart, so we 
have to treat them the same way. However, if the found length doesn't match 
the 'fixed' length it should be a processing error and cause backtracking, 
but if it doesn't match the referenced length it should be a hard error. 
Perhaps we need a way to distinguish between these cases.
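
To make the distinction concrete, here is a small Python sketch of how the two 
outcomes might behave inside a choice. Everything in it is an assumption made 
for illustration: the length_is_reference flag is exactly the kind of 
distinguishing information that does not exist today, and the two exception 
classes are hypothetical names.

```python
class ProcessingError(Exception):
    """Recoverable failure: the parser may backtrack and try an alternative."""


class HardError(Exception):
    """Unrecoverable failure: parsing stops, no backtracking."""


def check_explicit_length(found, expected, length_is_reference):
    """Compare a scanned length with the explicit length.

    `length_is_reference` is a hypothetical flag saying whether the expected
    length came from a length field rather than a fixed value.
    """
    if found == expected:
        return
    message = f"found length {found}, expected {expected}"
    if length_is_reference:
        raise HardError(message)
    raise ProcessingError(message)


def parse_choice(branches, data):
    """Try each branch in order, backtracking only on ProcessingError."""
    for branch in branches:
        try:
            return branch(data)
        except ProcessingError:
            continue  # backtrack and try the next branch
    raise ProcessingError("no branch of the choice matched")
```

A HardError is deliberately not caught by parse_choice, so a mismatch against 
a referenced length ends the parse instead of moving on to the next branch.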

There need to be similar rules for the other lengthKinds, e.g. prefixed, 
with terminating markup.

I will put this on the agenda for this week's call.

Alan Powell

 MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
 Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com 
 Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898

From:
Tim Kimber/UK/IBM at IBMGB
To:
dfdl-wg at ogf.org
Date:
17/11/2009 01:42
Subject:
[DFDL-WG] How to determine the length of an element which has text 
representation

The current version of the specification ( v0.36) does not clearly specify 
how an element which has a specified length should be parsed. 
- Section 14.3, when describing dfdl:length, says "Only used when lengthKind 
is 'explicit'" 
- The precedence rules say that when lengthKind="delimited", no other 
properties are consulted 
- Section 17.3.2 has a comment saying that it is incorrect. The comment 
contains a couple of rather ambiguous statements about what the behaviour 
should be. 

Alan proposes that the behaviour should be as follows: 
- When dfdl:length has a value, the length of the field must always conform 
to that value. 
- When there is terminating markup in scope ( terminators or separators ) 
the parser always uses them. 
- If a text field has a defined dfdl:length AND there is terminating 
markup in scope, then the parser should first scan to find the actual 
length, then check the actual length against dfdl:length and raise a 
processing error if they do not match. 

I favour the following alternative rules 
- dfdl:lengthKind always determines the method that the parser will use to 
find the length of the element 
- if lengthKind='explicit' or 'implicit' or 'prefixed' then the length is 
extracted without scanning. 
- if lengthKind='delimited' then the length is extracted by scanning and 
no check is performed against dfdl:length 
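
The alternative rules amount to a single dispatch on lengthKind. The Python 
sketch below is only an illustration under stated assumptions: the function 
and parameter names are hypothetical, the length for 'implicit' is assumed to 
have been resolved from the schema already, and the prefix is assumed to be a 
run of decimal text digits of a known size.

```python
def extract_by_length_kind(data, pos, length_kind, length=None,
                           terminator=None, prefix_size=1):
    """Return (value, new_pos); lengthKind alone selects the method."""
    if length_kind in ("explicit", "implicit"):
        # No scanning: any markup inside the span is treated as data.
        end = pos + length
        return data[pos:end], end
    if length_kind == "prefixed":
        # No scanning: read the (assumed decimal text) prefix, then the value.
        count = int(data[pos:pos + prefix_size])
        start = pos + prefix_size
        return data[start:start + count], start + count
    if length_kind == "delimited":
        # Scanning only: `length` is ignored, not even used as a check.
        end = data.find(terminator, pos)
        if end == -1:
            raise ValueError("terminating markup not found")
        return data[pos:end], end + len(terminator)
    raise ValueError(f"unsupported lengthKind: {length_kind}")
```

Note that extract_by_length_kind("A,C,DEF", 0, "explicit", length=3) returns 
("A,C", 3): with scanning switched off, the comma is just data, which is the 
first advantage claimed for these rules.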

The alternative rules have the following advantages: 
- they provide a way of switching off scanning within the scope of a 
delimited structure. The proposed rules do not. 
- they are easier to implement ( parser doesn't have to keep track of 
whether there is any terminating markup in scope - lengthKind always 
provides the rule ) 
- they are slightly easier to explain to users for the same reason 

They do have the following drawbacks: 
- dfdl:length is completely ignored when lengthKind='delimited'. It is not 
even used to validate the extracted length. Some users might not like 
this. 
- there are known scenarios ( e.g. SWIFT 52B ) where it is necessary to 
check the length of a delimited field in order to choose the correct 
branch of a choice. Checking dfdl:length would make it easy to do that. 

re: the ignoring of dfdl:length, we *could* make a rule that the length is 
checked after the delimited scan has been performed. But then it would be 
necessary to ensure that dfdl:length was un-set for the far more usual 
case where the length is not important. 
I think the control of backtracking in the 52B scenario is an edge case. 
In most cases where delimited fields have a known length we can safely 
leave the length checking to the schema validator, or perhaps to a more 
functional complex validation layer. For 52B, the user will have to create 
a dfdl:assert to trigger the required processing error when the length is 
incorrect. 
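
The assert-based workaround can be sketched in Python as follows. This is a 
hedged illustration only: the helper names are hypothetical, the branch names 
and the lengths 8 and 11 are arbitrary examples rather than the actual 52B 
layout, and the lambda merely plays the role that a user-written dfdl:assert 
would play in the schema.

```python
class ProcessingError(Exception):
    """Recoverable failure that lets a choice try its next branch."""


def delimited_with_assert(terminator, assertion):
    """Build a branch: scan to `terminator`, then apply a user-written check."""
    def branch(data, pos):
        end = data.find(terminator, pos)
        if end == -1:
            raise ProcessingError("terminator not found")
        value = data[pos:end]
        if not assertion(value):
            # Plays the role of a failed dfdl:assert.
            raise ProcessingError("assertion failed")
        return value, end + len(terminator)
    return branch


def parse_choice(branches, data, pos=0):
    """Return (branch_name, value, new_pos) for the first branch that parses."""
    for name, branch in branches:
        try:
            value, new_pos = branch(data, pos)
            return name, value, new_pos
        except ProcessingError:
            continue  # backtrack to `pos` and try the next branch
    raise ProcessingError("no branch of the choice matched")


# Hypothetical branches that differ only in the length they accept.
branches = [
    ("short_form", delimited_with_assert("\n", lambda v: len(v) == 8)),
    ("long_form", delimited_with_assert("\n", lambda v: len(v) == 11)),
]
```

Here parse_choice(branches, "ABCDEFGHIJK\n") returns 
("long_form", "ABCDEFGHIJK", 12): the first branch scans the same 11 
characters, fails its length assertion, and the choice backtracks.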

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 246742

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 

--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  http://www.ogf.org/mailman/listinfo/dfdl-wg

