[DFDL-WG] How to determine the length of an element which has text representation
Alan Powell
alan_powell at uk.ibm.com
Tue Nov 17 05:48:00 CST 2009
A small correction:
the parsing rules I propose, and I think what is currently in the spec,
are
- for fixed length 'text' elements (lengthKind is 'implicit' or
'explicit') that also have terminating markup (a terminator, or an in-scope
separator or terminator), the parser should scan for the markup and then
check the length
- for fixed length 'text' elements (lengthKind is 'implicit' or
'explicit') with no terminating markup the length is used
- for fixed length 'binary' fields, which are not scannable, with
terminating markup then the length should be used to extract the field
then scan for markup. (I'm not sure this is a realistic scenario but it is
allowed.)
- for fixed length 'binary' fields without terminating markup then the
length should be used
- for fixed length complex elements with terminating markup each child is
treated as above. When the end of the complex element is found, its actual
length is compared to the fixed length
- for fixed length complex elements without terminating markup the length
is used to extract the element and that 'buffer' is parsed for the
children.
- I was not suggesting that dfdl:length should be examined for any
lengthKind other than explicit
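
The rules above for simple elements can be sketched roughly as follows.
This is an illustration only, not spec text: the function and exception
names are hypothetical, data is modelled as a Python string with lengths
counted in characters, and the binary-with-markup case assumes the
terminator immediately follows the extracted field.

```python
class ProcessingError(Exception):
    """Recoverable parse failure; a real parser would backtrack on this."""


def parse_fixed_length(data, pos, length, representation, terminator=None):
    """Return (value, new_pos) for a fixed-length simple element.

    Hypothetical helper: 'representation' is "text" or "binary", and
    'terminator' stands for whatever terminating markup is in scope.
    """
    if representation == "text" and terminator is not None:
        # Text with terminating markup: scan for the markup, then check the length.
        end = data.find(terminator, pos)
        if end == -1:
            raise ProcessingError("terminating markup not found")
        if end - pos != length:
            raise ProcessingError(
                f"expected length {length}, found {end - pos}")
        return data[pos:end], end + len(terminator)

    # Text without markup, or binary: the length alone extracts the field.
    end = pos + length
    if end > len(data):
        raise ProcessingError("not enough data for the fixed length")
    value = data[pos:end]
    if terminator is not None:
        # Binary with terminating markup: extract by length, then look for
        # the markup (assumed here to follow the field directly).
        if not data.startswith(terminator, end):
            raise ProcessingError("terminating markup not found after field")
        end += len(terminator)
    return value, end
```

For example, parse_fixed_length("ABC,DEF", 0, 3, "text", ",") scans to the
comma and accepts the field, while the same call on "ABCD,EF" raises a
processing error because the scanned length is 4 rather than 3.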
Notes:
Because lengthKind 'explicit' is used to specify either a fixed length or a
reference to a length field, we have to treat the two cases the same way.
However, if the found length doesn't match the 'fixed' length it should be a
processing error and cause backtracking, but if it doesn't match the
referenced length it should be a hard error. Perhaps we need
a way to distinguish between these cases.
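
To make the distinction concrete, here is a minimal sketch of the two
outcomes. The exception names and the from_reference flag are assumptions
made for illustration; the spec has no such flag, which is exactly the gap
noted above.

```python
class ProcessingError(Exception):
    """Recoverable: the parser may backtrack and try another alternative."""


class HardError(Exception):
    """Not recoverable: parsing stops."""


def check_length(found, expected, from_reference):
    """Compare the scanned length with the value of dfdl:length.

    'from_reference' is a hypothetical flag saying whether dfdl:length
    came from a reference to a length field rather than a fixed value.
    """
    if found == expected:
        return
    if from_reference:
        # The data contradicts its own length field.
        raise HardError(f"length field says {expected}, found {found}")
    # A fixed length mismatch only rules out this alternative.
    raise ProcessingError(f"fixed length is {expected}, found {found}")
```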
There need to be similar rules for the other lengthKinds, e.g. prefixed,
with terminating markup.
I will put this on the agenda for this week's call.
Alan Powell
MP 211, IBM UK Labs, Hursley, Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM email: alan_powell at uk.ibm.com
Tel: +44 (0)1962 815073 Fax: +44 (0)1962 816898
From: Tim Kimber/UK/IBM at IBMGB
To: dfdl-wg at ogf.org
Date: 17/11/2009 01:42
Subject: [DFDL-WG] How to determine the length of an element which has text
representation
The current version of the specification (v0.36) does not clearly specify
how an element which has a specified length should be parsed.
- Section 14.3, when describing dfdl:length, says "Only used when lengthKind
is 'explicit'"
- The precedence rules say that when lengthKind="delimited", no other
properties are consulted
- Section 17.3.2 has a comment saying that it is incorrect. The comment
contains a couple of rather ambiguous statements about what the behaviour
should be.
Alan proposes that the behaviour should be as follows:
- When dfdl:length has a value, the length of the field must always conform
to that value.
- When there is terminating markup in scope ( terminators or separators )
the parser always uses them.
- If a text field has a defined dfdl:length AND there is terminating
markup in scope, then the parser should first scan to find the actual
length, then check the actual length against dfdl:length and raise a
processing error if they do not match.
I favour the following alternative rules
- dfdl:lengthKind always determines the method that the parser will use to
find the length of the element
- if lengthKind='explicit' or 'implicit' or 'prefixed' then the length is
extracted without scanning.
- if lengthKind='delimited' then the length is extracted by scanning and
no check is performed against dfdl:length
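
A rough sketch of these alternative rules, where lengthKind alone selects
the extraction method. Everything here is illustrative: the function name
and parameters are hypothetical, data is a Python string, and the prefix
for 'prefixed' is assumed to be a run of decimal text digits of known
width, which is only one of the forms a prefix could take.

```python
def extract(data, pos, length_kind, length=None, terminator=None,
            prefix_width=1):
    """Return (value, new_pos); the method depends only on length_kind."""
    if length_kind in ("explicit", "implicit"):
        # No scanning: in-scope markup inside the field is not looked at.
        end = pos + length
        return data[pos:end], end
    if length_kind == "prefixed":
        # No scanning: read the prefix, then take that many characters.
        start = pos + prefix_width
        count = int(data[pos:start])
        return data[start:start + count], start + count
    if length_kind == "delimited":
        # Scanning only: 'length' is ignored, even when it is set.
        end = data.find(terminator, pos)
        if end == -1:
            raise ValueError("terminating markup not found")
        return data[pos:end], end + len(terminator)
    raise ValueError(f"unknown lengthKind: {length_kind}")
```

With these rules extract("A,B,", 0, "explicit", length=3) takes "A,B"
without treating the comma as markup, which is the "switching off
scanning" behaviour, and a 'delimited' call with length=3 on "ABCD,EF"
still accepts the four-character field.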
The alternative rules have the following advantages:
- they provide a way of switching off scanning within the scope of a
delimited structure. The proposed rules do not.
- they are easier to implement ( parser doesn't have to keep track of
whether there is any terminating markup in scope - lengthKind always
provides the rule )
- they are slightly easier to explain to users for the same reason
They do have the following drawbacks:
- dfdl:length is completely ignored when lengthKind='delimited'. It is not
even used to validate the extracted length. Some users might not like
this.
- there are known scenarios ( e.g. SWIFT 52B ) where it is necessary to
check the length of a delimited field in order to choose the correct
branch of a choice. Checking dfdl:length would make it easy to do that.
re: the ignoring of dfdl:length, we *could* make a rule that the length is
checked after the delimited scan has been performed. But then it would be
necessary to ensure that dfdl:length was un-set for the far more usual
case where the length is not important.
I think the control of backtracking in the 52B scenario is an edge case.
In most cases where delimited fields have a known length we can safely
leave the length checking to the schema validator, or perhaps to a more
functional complex validation layer. For 52B, the user will have to create
a dfdl:assert to trigger the required processing error when the length is
incorrect.
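
The effect of such an assert on a choice can be sketched as below. This
is not DFDL syntax and not the real SWIFT 52B layout: the branch names
and the asserted lengths are made up, and the assert is modelled as a
simple length test that raises a processing error so that the parser
moves on to the next branch.

```python
class ProcessingError(Exception):
    """Recoverable failure that lets a choice try its next branch."""


def parse_delimited(data, pos, terminator, assert_length=None):
    """Scan to the terminator, then apply an optional length assert."""
    end = data.find(terminator, pos)
    if end == -1:
        raise ProcessingError("terminating markup not found")
    value = data[pos:end]
    if assert_length is not None and len(value) != assert_length:
        # Stands in for a user-written assert on the element.
        raise ProcessingError("length assert failed")
    return value, end + len(terminator)


def parse_choice(data, pos, terminator, branches):
    """Try each (name, asserted_length) branch in order, backtracking on failure."""
    for name, assert_length in branches:
        try:
            value, new_pos = parse_delimited(
                data, pos, terminator, assert_length)
            return name, value, new_pos
        except ProcessingError:
            continue  # backtrack to 'pos' and try the next branch
    raise ProcessingError("no branch of the choice matched")
```

With hypothetical branches [("short", 4), ("long", 8)], the input
"ABCDEFGH\n" fails the assert on the first branch and is accepted by the
second.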
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg at ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg