[DFDL-WG] How to determine the length of an element which has text representation

Tim Kimber KIMBERT at uk.ibm.com
Wed Nov 18 11:05:20 CST 2009


I'd like to record what was discussed and raise another point, which Alan 
pointed out after the meeting.

Discussions in the meeting
- dfdl:lengthKind applies only to the element on which it is specified. It 
has no effect whatever on the parsing of child elements/groups.
- There may be some value in tolerating simple elements of type xs:string 
with dfdl:representation="binary". This might be useful for schemas that 
specify dfdl:representation="binary" throughout.
- Currently, the position of the WG is that parsers should *always* scan 
to extract the text representation if there is any terminating markup in 
scope, even if lengthKind="explicit".
- TK proposed the scheme outlined in his previous email, in which 
dfdl:lengthKind alone specifies how the parser should extract the text 
representation. If lengthKind="explicit", scanning is switched off and 
dfdl:length is used. If lengthKind="delimited", the text representation is 
extracted by scanning and dfdl:length is ignored.
- A refinement was discussed whereby dfdl:length would be checked after a 
scan has been performed if dfdl:lengthKind="delimited". This would make 
the modeling of some common formats simpler, and avoid the need for a 
dfdl:assert to enforce the length constraint.
- MB raised the possibility that we could actually disallow dfdl:length if 
lengthKind="delimited". This is the most conservative position, but the 
general opinion was that it would be too restrictive. There might still be 
some value in disallowing dfdl:length for other lengthKinds.
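
To make TK's proposed scheme and the length-check refinement concrete, here 
is a minimal Python sketch. It is not taken from the DFDL specification or 
from any implementation. The function name extract_text, its parameters, 
and the check_length flag are all hypothetical, lengths are counted in 
characters, and only the "explicit" and "delimited" lengthKinds are handled.

```python
def extract_text(data, pos, length_kind, length=None, terminator=None,
                 check_length=False):
    """Return (text, end), where end is the index just past the text.

    Hypothetical helper: lengths are in characters and the terminator
    itself is never consumed here.
    """
    if length_kind == "explicit":
        # Proposed scheme: scanning is switched off, only the length is used.
        if length is None:
            raise ValueError("lengthKind='explicit' needs a length")
        end = pos + length
        if end > len(data):
            raise ValueError("not enough data for the stated length")
        return data[pos:end], end

    if length_kind == "delimited":
        # Proposed scheme: the text is found by scanning for the terminator.
        if not terminator:
            raise ValueError("lengthKind='delimited' needs a terminator")
        end = data.find(terminator, pos)
        if end == -1:
            raise ValueError("terminator not found")
        text = data[pos:end]
        # Refinement discussed in the meeting: optionally check the length
        # after the scan instead of ignoring it.
        if check_length and length is not None and len(text) != length:
            raise ValueError("scanned length does not match the stated length")
        return text, end

    raise NotImplementedError(length_kind)
```

With the data "ab;cd;ef" and a terminator of ";", the "explicit" branch with 
length 5 yields "ab;cd", while the "delimited" branch yields "ab". The 
property values are the same in both cases; only lengthKind differs.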

Discussions after the meeting
- Alan pointed out that lengthKind="explicit" does not necessarily mean 
that the length of the field is fixed. dfdl:length might be specified as a 
DFDL expression. A common reason for doing that would be to obtain the 
element's length from an earlier integer field. As currently specified, if 
there is any markup in scope, the text representation would still be 
extracted by scanning rather than by using that length.
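
Alan's case can be illustrated with a small Python sketch. The record 
layout is an assumption made up for this example: a two-digit decimal 
length field followed by the text field, with ";" as the terminating markup 
in scope. The function parse_record and its scan_when_markup_in_scope flag 
are hypothetical and only contrast the two behaviours described above.

```python
def parse_record(data, terminator=";", scan_when_markup_in_scope=True):
    """Parse an assumed layout: 2-digit length, then the text field.

    Returns (length_field, text).
    """
    length = int(data[:2])      # earlier integer field holding the length
    start = 2
    if scan_when_markup_in_scope:
        # Behaviour as described for the current WG position:
        # the terminator decides where the text ends.
        end = data.find(terminator, start)
        if end == -1:
            end = len(data)
    else:
        # Behaviour under the proposal: the length field decides.
        end = start + length
    return length, data[start:end]
```

For the input "05ab;cd;", scanning returns the text "ab" even though the 
length field says 5, whereas using the length field returns "ab;cd".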

Restatement of my position after today's meeting:
I'm now even more convinced that dfdl:lengthKind="explicit" should switch 
off scanning. Here's why:
a) The enumerations of lengthKind are explicit, implicit, prefixed, 
delimited, pattern, endOfParent. The presence of 'delimited' in that list 
means that in some users' minds, the other enumerations are going to be 
interpreted as *alternatives* to 'delimited'.
b) Under the current position, if there's markup in scope, scanning cannot 
be switched off by any means, not even by setting lengthKind="explicit" AND 
obtaining dfdl:length from a previous integer field. I think that's very 
counter-intuitive.

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 246742










