[DFDL-WG] dfdl-wg Digest, Vol 43, Issue 2
Tim Kimber
KIMBERT at uk.ibm.com
Wed Mar 3 06:52:09 CST 2010
I need to attend the sprint planning meeting from 2pm GMT, so I may not be
around when these items are discussed. Comments added below:
12.2 Delimiters: Text Markup
- The term 'Delimiters' is not accurate. Most readers will not think of
an initiator as a 'delimiter'.
- It's not 'Text' markup any more - especially since v0.39 has allowed
lengthKind="delimited" for elements with binary representation.
Title should be 'Markup' and explanation can then deal with what it really
is, rather than justifying the inaccurate title :-)
I dislike the use of the term "markup" for something not written by
people, and most data formats of the DFDL kind are written by computers,
so nothing is getting "marked up" by anyone.
Initiators certainly are delimiters in the situations where they are not
tags. I.e., initiator="[" terminator="]". Only tags will not be thought of
as delimiters. Even then I think it is a stretch to say that nobody will
think of an introductory tag as a delimiter.
These definitions found online:
de·lim·it·er (dĭ-lĭm′ĭ-tər)
n. Computer Science
A character or sequence of characters marking the beginning or end of a
unit of data.
Computing Dictionary
delimiter character
A character or string used to separate, or mark the start and end of,
items of data in, e.g., a database, source code, or text file.
See also: record.
(2001-03-16)
These definitions are consistent with our usage of the term.
I suggest no change in our terminology here.
<TK>
Point taken re: the term 'delimiter'. I still have reservations about
calling it 'Text Markup' in the title, though. I think the intro paragraph
should explain the common usage (initiators, separators, terminators for
text formats) and the exceptional usage (handling delimited binary data
and other non-text markup).
</TK>
Syntax for specifying markup:
It's not clear from this description that each item in the space-separated
list is a DFDL string literal.
These have always bugged me. Any better solution is welcome. XML/XSD does
tend to make space separated the standard way to specify more than one
thing.
<TK>
In a future revision of the spec we need a list of property value types
which can then be used consistently in the
tables which describe properties.
- Enumeration
- DFDL string literal
- List of DFDL string literals
- DFDL expression
- DFDL regular expression
- Boolean
- Non-negative integer
- any more?
In some cases it will be necessary to place restrictions on the type of
content allowed in the string literal
( disallow raw byte values / raw byte values must represent a character
/ etc )
</TK>
initiator ( and all other space-separated properties )
It is not clear whether the order of the space-separated properties
matters. Must the parser test them in the order in which they are
specified?
( Q: What if %ES; is the first in the list? )
I think the order should not matter, and it should test them longest
first.
<TK>
Good idea.
I have another related suggestion below.
</TK>
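
A minimal sketch of the longest-first idea, in Python. The function name
match_delimiter and its signature are hypothetical, not taken from the
spec, and the values are assumed to be already-decoded literal strings
rather than DFDL string literals containing entities.

```python
from typing import List, Optional


def match_delimiter(data: str, pos: int, delimiters: List[str]) -> Optional[str]:
    """Return the longest delimiter that matches data at pos, or None.

    Hypothetical helper: the candidates are tried longest first, so the
    order in which they were written in the property value does not matter.
    """
    for delim in sorted(delimiters, key=len, reverse=True):
        # Skip empty strings; a zero-length value would match everywhere.
        if delim and data.startswith(delim, pos):
            return delim
    return None
```

With separator values ":" and "::", the data "a::b" yields "::" at
position 1 whichever way round the two values are listed.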
terminator:
Is it OK if the final terminator is missing within the scope of a
known-length parent? Seems like a reasonable extension of the rule ( in
all other scenarios, the end of a known-length parent acts like the end of
the data stream for items within its scope ).
I believe this should be true. "Final" is relative in my mind.
<TK>
Good - it's much easier to implement if end of known length parent is
always equivalent to end of data stream, from the point of view of
enclosed elements.
But see next point...
</TK>
documentFinalTerminatorCanBeMissing:
Let's try to avoid creating another property for the postfix separator
scenario. I think this property provides a way of modelling the data
naturally.
We can recommend use of infix-with-a-terminator rather than 'postfix' if
the final terminator can be missing.
Copasetic.
<TK>
Had to look up 'copasetic'. I'm amazed that my Mum never came out with
that one - she's a walking dictionary.
This property has caused problems with naming and interpretation all along
the line. Last time we discussed it, I don't
think we considered this option ( we did talk about something like it ):
- If %ES; is included in the list of values for separator or terminator
then
a) The parser ignores it while performing ordinary scanning ( otherwise it
would always cause a zero-length string to be scanned ).
b) The parser accepts 'end of data stream' as a match for the %ES;
mnemonic. That makes this property ( and the equivalent one for separators
) redundant.
c) Other usages of %ES; remain unchanged.
</TK>
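
Points (a) and (b) above could be sketched roughly as follows. This is an
illustration only: match_terminator is a hypothetical name, the %ES;
mnemonic is compared as a plain string, and the other values are assumed
to be already-decoded literals.

```python
from typing import List, Optional

ES = "%ES;"  # the empty-string mnemonic, kept as an undecoded marker here


def match_terminator(data: str, pos: int, terminators: List[str]) -> Optional[str]:
    """Return the matched terminator text, "" for end of stream, or None."""
    # (a) %ES; is ignored during ordinary scanning, so it can never
    # produce a zero-length match in the middle of the data.
    literals = [t for t in terminators if t != ES and t]
    for term in sorted(literals, key=len, reverse=True):
        if data.startswith(term, pos):
            return term
    # (b) %ES; is satisfied only by the end of the data stream.
    if ES in terminators and pos >= len(data):
        return ""
    return None
```

If end of a known-length parent is treated as end of data stream, the same
function applies unchanged to the slice of data covered by that parent.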
outputNewLine
Should we validate that the 'characterOrCharacters' are all newline
characters from the set described by the %NL; mnemonic? Otherwise the DFDL
serializer will output data which cannot be parsed by the DFDL parser.
Nice catch.
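
A sketch of what such a validation might look like. The set below ( LF,
CR, CRLF, NEL, LS ) is my assumption about what %NL; covers and should be
checked against the spec; the function name is hypothetical. It accepts
only a value that %NL; would match as a single newline.

```python
# Assumed contents of the %NL; set: LF, CR, CRLF, NEL (U+0085), LS (U+2028).
NL_SEQUENCES = {"\n", "\r", "\r\n", "\u0085", "\u2028"}


def is_valid_output_newline(value: str) -> bool:
    """True if value is exactly one of the assumed %NL; sequences."""
    return value in NL_SEQUENCES
```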
dfdl:lengthKind endOfParent
'endOfParent' has almost the same meaning as 'delimited' so should have
the same semantics.
· the item's terminator (if specified)
· an enclosing construct's separator or terminator
· the end of an enclosing construct designated by its known length
· the end of the data stream
The effect would be that the element could be ended by the nearest
known-length parent, not just the immediate parent. Also, the immediate
parent could have lengthKind 'implicit'.
Agreed.
choiceKind 'Fixed'
When lengthKind='implicit' all alternative branches of the choice are
padded to the fixed length of the largest one so that overall the entire
choice construct is fixed length
There must be a restriction that the length of at least one choice must be
statically defined.
Also good catch.
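
A rough sketch of the restriction, assuming each branch reports either a
statically known length or None. The names and the zero fill byte are
hypothetical; the spec text quoted above does not say what the padding is.

```python
from typing import List, Optional


def fixed_choice_length(branch_lengths: List[Optional[int]]) -> int:
    """Length of the whole choice: the largest statically known branch."""
    known = [n for n in branch_lengths if n is not None]
    if not known:
        # The restriction discussed above: nothing to pad to otherwise.
        raise ValueError("at least one branch needs a statically defined length")
    return max(known)


def pad_branch(data: bytes, target: int, fill: bytes = b"\x00") -> bytes:
    """Pad one branch's data up to the fixed length of the choice."""
    if len(data) > target:
        raise ValueError("branch data is longer than the fixed choice length")
    return data + fill * (target - len(data))
```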
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU