[DFDL-WG] dfdl-wg Digest, Vol 43, Issue 2

Tim Kimber KIMBERT at uk.ibm.com
Wed Mar 3 06:52:09 CST 2010


I need to attend the sprint planning meeting from 2pm GMT, so I may not be 
around when these items are discussed. Comments added below:

12.2 Delimiters: Text Markup 
- The term 'Delimiters' is  not accurate. Most readers will not think of 
an initiator as a 'delimiter'. 
- It's not 'Text' markup any more - especially since v0.39 has allowed 
lengthKind="delimited" for elements with binary representation. 
Title should be 'Markup' and explanation can then deal with what it really 
is, rather than justifying the inaccurate title :-) 


I dislike the use of the term "markup" for something not written by 
people, and most data formats of the DFDL kind are written by computers, 
so nothing is getting "marked up" by anyone.

Initiators certainly are delimiters in the situations where they are not 
tags. I.e., initiator="[" terminator="]". Only tags will not be thought of 
as delimiters. Even then I think it is a stretch to say that nobody will 
think of an introductory tag as a delimiter. 

These definitions found online: 

de·lim·it·er   (dĭ-lĭm'ĭ-tər)
n.   Computer Science
A character or sequence of characters marking the beginning or end of a 
unit of data.

Computing Dictionary
delimiter character
A character or string used to separate, or mark the start and end of, 
items of data in, e.g., a database, source code, or text file.
See also: record.
(2001-03-16)
These definitions are consistent with our usage of the term. 

I suggest no change in our terminology here. 
 
<TK>
Point taken re: the term 'delimiter'. I still have reservations about 
calling it 'Text Markup' in the title, though. I think the intro paragraph 
should explain the common usage ( initiators, separators, terminators for 
text formats ) and the exceptional usage ( handling delimited binary data 
and other non-text markup )
</TK>

Syntax for specifying markup: 
It's not clear from this description that each item in the space-separated 
list is a DFDL string literal. 

These have always bugged me. Any better solution is welcome. XML/XSD does 
tend to make space separated the standard way to specify more than one 
thing.
 
<TK>
  In a future revision of the spec we need a list of property value types 
  which can then be used consistently in the tables which describe properties.
  - Enumeration
  - DFDL string literal
  - List of DFDL string literals
  - DFDL expression
  - DFDL regular expression
  - Boolean
  - Non-negative integer
  - any more?
 
  In some cases it will be necessary to place restrictions on the type of 
content allowed in the string literal
  ( disallow raw byte values / raw byte values must represent a character 
/ etc )
</TK>

initiator ( and all other space-separated properties ) 
It is not clear whether the order of the space-separated properties 
matters. Must the parser test them in the order in which they are 
specified? 
( Q: What if %ES; is the first in the list? ) 

I think the order should not matter, and it should test them longest 
first.

<TK>
Good idea. 
I have another related suggestion below.
</TK> 
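
A rough sketch of the longest-first idea, for illustration only. It assumes 
the space-separated list has already been split and each literal resolved to 
a plain string; the name match_delimiter is hypothetical and not from the spec.

```python
def match_delimiter(data, pos, delimiters):
    """Return the longest delimiter that matches data at pos, or None."""
    # Sort longest first so that the order in the schema does not matter
    # and a short delimiter never shadows a longer one sharing its prefix.
    for delim in sorted(delimiters, key=len, reverse=True):
        # Skip empty strings: they would match everywhere.
        if delim and data.startswith(delim, pos):
            return delim
    return None
```

With the candidates "|" and "||", the data "a||b" yields "||" at position 1 
whichever order the candidates are listed in.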

terminator: 
is it OK if the final terminator is missing within the scope of a 
known-length parent? Seems like a reasonable extension of the rule ( in 
all other scenarios, the end of a known-length parent acts like the end of 
the data stream for items within its scope ). 

I believe this should be true. "Final" is relative in my mind. 
 
<TK>
Good - it's much easier to implement if end of known length parent is 
always equivalent to end of data stream, from the point of view of 
enclosed elements.
But see next point...
</TK>
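
One way to picture "end of known-length parent is equivalent to end of data 
stream" is to give the scanner a limit. This is only a sketch: the function 
scan_to_terminator and its parent_end parameter are hypothetical, and the 
data is treated as a plain string.

```python
def scan_to_terminator(data, start, terminator, parent_end=None):
    """Return (value, next_pos) for a delimited element starting at start."""
    # The end of the nearest known-length parent acts like the end of the
    # data stream for anything scanned inside it.
    limit = len(data) if parent_end is None else min(parent_end, len(data))
    idx = data.find(terminator, start, limit)
    if idx == -1:
        # Terminator missing: the element simply runs up to the limit.
        return data[start:limit], limit
    return data[start:idx], idx + len(terminator)
```

The same code path handles both cases, which is why the rule is easy to 
implement: only the value of the limit differs.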

documentFinalTerminatorCanBeMissing: 
Let's try to avoid creating another property for the postfix separator 
scenario. I think this property provides a way of modelling the data 
naturally. 
We can recommend use of infix-with-a-terminator rather than 'postfix' if 
the final terminator can be missing. 

Copasetic. 
 
<TK>
Had to look up 'copasetic'. I'm amazed that my Mum never came out with 
that one - she's a walking dictionary.

This property has caused problems with naming and interpretation all along 
the line. Last time we discussed it, I don't
think we considered this option ( we did talk about something like it ):

- If %ES; is included in the list of values for separator or terminator 
then 
a) The parser ignores it while performing ordinary scanning ( otherwise it 
would always cause a zero-length string to be scanned ). 
b) The parser accepts 'end of data stream' as a match for the %ES; 
mnemonic. That makes this property ( and the equivalent one for separators 
) redundant.
c) Other usages of %ES; remain unchanged.
</TK>
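
To make points a) and b) concrete, here is a minimal sketch. It assumes the 
terminator list has been split into items and that every item other than 
%ES; is already a plain string; match_terminator is a hypothetical name.

```python
ES = "%ES;"  # DFDL mnemonic for the empty string


def match_terminator(data, pos, terminators):
    """Return the matched terminator text, "" for end of data, or None."""
    # a) Ignore %ES; during ordinary scanning, otherwise it would match
    #    a zero-length string at every position.
    real = [t for t in terminators if t and t != ES]
    for term in sorted(real, key=len, reverse=True):
        if data.startswith(term, pos):
            return term
    # b) %ES; is accepted only as a match for 'end of data stream'.
    if ES in terminators and pos == len(data):
        return ""
    return None
```

Without %ES; in the list, reaching the end of the data with no terminator 
returns None, i.e. the terminator is still required.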

outputNewLine 
Should we validate that the 'characterOrCharacters' are all newline 
characters from the set described by the %NL; mnemonic? Otherwise the DFDL 
serializer will output data which cannot be parsed by the DFDL parser. 


Nice catch.
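
A sketch of what such a check might look like. The set of sequences below 
is an assumption (LF, CR, CRLF, NEL, LS); the real set is whatever the spec 
defines for %NL;. The function name is hypothetical.

```python
# Assumed %NL; alternatives; replace with the set the spec actually defines.
NL_SEQUENCES = {"\n", "\r", "\r\n", "\u0085", "\u2028"}


def is_valid_output_newline(value):
    """True if value is one of the sequences the parser accepts for %NL;."""
    return value in NL_SEQUENCES
```

Rejecting anything outside the set at schema-validation time keeps the 
serializer from writing a newline the parser cannot read back.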
 
dfdl:lengthKind endOfParent 
'endOfParent' has almost the same meaning as 'delimited' so should have 
the same semantics. 
·        the item's terminator (if specified) 
·        an enclosing construct's separator or terminator 
·        the end of an enclosing construct designated by its known length 
·        the end of the data stream 
The effect would be that the element could be ended by the nearest known 
length parent, not just the immediate parent. Also, the immediate parent 
could have lengthKind 'implicit'. 


Agreed.
 
choiceKind 'Fixed' 
When lengthKind='implicit' all alternative branches of the choice are 
padded to the fixed length of the largest one so that overall the entire 
choice construct is fixed length 

There must be a restriction that the length of at least one choice branch 
must be statically defined. 

Also good catch.
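
The restriction can be sketched as follows. This is an illustration under 
assumptions: branch lengths are given as integers, with None for a branch 
whose length is not statically known, and the fill byte is arbitrary. Both 
function names are hypothetical.

```python
def fixed_choice_length(branch_lengths):
    """Length of the whole choice: the largest statically known branch."""
    known = [n for n in branch_lengths if n is not None]
    if not known:
        # The restriction discussed above: nothing to pad to otherwise.
        raise ValueError("at least one branch needs a statically defined length")
    return max(known)


def pad_branch(data, total, fill=b"\x00"):
    """Pad one branch's bytes up to the fixed length of the choice."""
    if len(data) > total:
        raise ValueError("branch is longer than the fixed choice length")
    return data + fill * (total - len(data))
```

For branches of length 4, unknown, and 10, the choice occupies 10 bytes and 
the shorter branches are padded up to that size.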
 

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 246742






Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU




