[DFDL-WG] How to determine the length of an element which has text representation
Tim Kimber
KIMBERT at uk.ibm.com
Mon Nov 16 19:41:50 CST 2009
The current version of the specification (v0.36) does not clearly specify
how an element which has a specified length should be parsed.
- Section 14.3, when describing dfdl:length, says "Only used when lengthKind
is 'explicit'"
- The precedence rules say that when lengthKind="delimited", no other
properties are consulted
- Section 17.3.2 has a comment saying that it is incorrect. The comment
contains a couple of rather ambiguous statements about what the behaviour
should be.
Alan proposes that the behaviour should be as follows:
- When dfdl:length has a value, the length of the field must always conform
to that value.
- When there is terminating markup in scope (terminators or separators)
the parser always uses it.
- If a text field has a defined dfdl:length AND there is terminating
markup in scope, then the parser should first scan to find the actual
length, then check the actual length against dfdl:length and raise a
processing error if they do not match.
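
As a rough illustration of the proposed behaviour, the sketch below scans
for terminating markup and then checks the scanned length against the
declared length. The names (parse_text_field, ProcessingError, the
declared_length and terminators arguments) are hypothetical and are not
taken from the specification; treat it as one possible reading of the
rules, not a definitive implementation.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def parse_text_field(data, pos, declared_length=None, terminators=()):
    """Sketch of the proposed rules; returns (value, new_position).

    declared_length models dfdl:length (None when it is not set).
    terminators models the terminating markup currently in scope.
    """
    if terminators:
        # Terminating markup in scope: always scan for the nearest one.
        hits = [i for i in (data.find(t, pos) for t in terminators) if i != -1]
        if not hits:
            raise ProcessingError("no terminating markup found")
        end = min(hits)
        # Then check the actual length against dfdl:length, if it is set.
        if declared_length is not None and end - pos != declared_length:
            raise ProcessingError(
                f"scanned length {end - pos} != dfdl:length {declared_length}")
        return data[pos:end], end
    if declared_length is None:
        raise ProcessingError("no dfdl:length and no terminating markup")
    # No markup in scope: take exactly dfdl:length characters.
    if pos + declared_length > len(data):
        raise ProcessingError("not enough data for dfdl:length")
    return data[pos:pos + declared_length], pos + declared_length
```

Note that under this reading the parser needs to know which terminating
markup is in scope before it can decide how to extract the field.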
I favour the following alternative rules
- dfdl:lengthKind always determines the method that the parser will use to
find the length of the element
- if lengthKind='explicit' or 'implicit' or 'prefixed' then the length is
extracted without scanning.
- if lengthKind='delimited' then the length is extracted by scanning and
no check is performed against dfdl:length
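
A minimal sketch of these alternative rules, in which lengthKind alone
selects the method, might look like the following. The function name
locate_element and its arguments are hypothetical, and the 'prefixed' case
assumes, purely for illustration, a fixed-width decimal text prefix; the
specification may define prefixes differently.

```python
def locate_element(data, pos, length_kind, length=None, terminators=(),
                   prefix_width=2):
    """Return (content_start, content_length), driven only by lengthKind.

    length models dfdl:length for 'explicit', or the length implied by the
    schema type for 'implicit'. prefix_width is an assumption of this sketch.
    """
    if length_kind in ("explicit", "implicit"):
        # No scanning: any markup characters inside the field are just data.
        if length is None:
            raise ValueError("a length is required for " + length_kind)
        return pos, length
    if length_kind == "prefixed":
        # No scanning: the length is read from the prefix before the content.
        return pos + prefix_width, int(data[pos:pos + prefix_width])
    if length_kind == "delimited":
        # Scan for the nearest terminating markup; 'length' is ignored.
        hits = [i for i in (data.find(t, pos) for t in terminators) if i != -1]
        if not hits:
            raise ValueError("no terminating markup found")
        return pos, min(hits) - pos
    raise ValueError("unsupported lengthKind: " + length_kind)
```

For example, with the data "ABCD,EF" an 'explicit' element of length 6
takes "ABCD,E" even though a comma separator is in scope, whereas a
'delimited' element stops at the comma regardless of any length value.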
The alternative rules have the following advantages:
- they provide a way of switching off scanning within the scope of a
delimited structure. The proposed rules do not.
- they are easier to implement (the parser doesn't have to keep track of
whether there is any terminating markup in scope - lengthKind always
provides the rule)
- they are slightly easier to explain to users for the same reason
They do have the following drawbacks:
- dfdl:length is completely ignored when lengthKind='delimited'. It is not
even used to validate the extracted length. Some users might not like
this.
- there are known scenarios (e.g. SWIFT 52B) where it is necessary to
check the length of a delimited field in order to choose the correct
branch of a choice. Checking dfdl:length would make it easy to do that.
Regarding the ignoring of dfdl:length: we *could* make a rule that the
length is checked after the delimited scan has been performed. But then it
would be necessary to ensure that dfdl:length was unset in the far more
usual case where the length is not important.
I think the control of backtracking in the 52B scenario is an edge case.
In most cases where delimited fields have a known length we can safely
leave the length checking to the schema validator, or perhaps to a more
functional complex validation layer. For 52B, the user will have to create
a dfdl:assert to trigger the required processing error when the length is
incorrect.
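
The effect of such an assert on a choice can be sketched roughly as below.
Everything here is hypothetical: the branch names and lengths are made up
and do not reflect the real SWIFT 52B layout, and the length test merely
plays the role that a dfdl:assert would play in a schema.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def scan(data, pos, terminator):
    """Delimited scan; dfdl:length plays no part here."""
    end = data.find(terminator, pos)
    if end == -1:
        raise ProcessingError("terminator not found")
    return data[pos:end], end


def branch_with_assert(expected_length):
    """Build a branch that scans, then asserts on the scanned length."""
    def parse(data, pos):
        value, end = scan(data, pos, "\n")
        if len(value) != expected_length:  # stands in for the dfdl:assert
            raise ProcessingError("length assertion failed")
        return value, end
    return parse


def parse_choice(data, pos, branches):
    """Try each branch in order, backtracking on a processing error."""
    for name, parse in branches:
        try:
            value, end = parse(data, pos)
            return name, value, end
        except ProcessingError:
            continue  # backtrack to pos and try the next branch
    raise ProcessingError("no branch of the choice matched")
```

With branches [("short", branch_with_assert(4)), ("long",
branch_with_assert(8))], the input "ABCDEFGH\n" fails the first branch on
its length assertion and is then accepted by the second.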
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU