[dfdl-wg] DFDL properties - points to discuss at F2F

Steve Hanson smh at uk.ibm.com
Sun Feb 26 13:27:58 CST 2006


I've been progressing the DFDL Properties tracker and several issues have
come up that need resolving before I can suggest a complete list of DFDL
properties.  Please have a think about these and we can discuss at the F2F.

1) When parsing, we want to be flexible and tolerate data appearing in
different forms. For example, allow initiators and terminators to be in
upper or lower case, allow UTC timezone to be +00:00 or Z, allow thousand
separators for decimal numbers, etc. But on output we must make a choice as
to which form we use. We need some principles that we apply to all
properties that exhibit this behaviour.
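
To make point 1 concrete, here is a minimal Python sketch of "lenient on
parse, single form on output". All function names are hypothetical, and the
choice of 'Z' as the output form for UTC is only an assumption made for the
sake of the example; nothing here is taken from the DFDL draft.

```python
from decimal import Decimal

# Hypothetical sketch: several physical forms map to one logical value on
# parse, and exactly one physical form is chosen on output.
UTC_FORMS = {"Z", "+00:00"}   # forms tolerated when parsing
CANONICAL_UTC = "Z"           # assumed output choice, not from the spec


def parse_timezone(text):
    """Accept any tolerated spelling of UTC and return one logical value."""
    if text in UTC_FORMS:
        return "UTC"
    raise ValueError("unsupported timezone form: %r" % text)


def unparse_timezone(value):
    """Always write the single canonical form."""
    if value == "UTC":
        return CANONICAL_UTC
    raise ValueError("unsupported timezone value: %r" % value)


def parse_decimal(text, thousands=","):
    """Tolerate thousand separators on input by stripping them."""
    return Decimal(text.replace(thousands, ""))


def unparse_decimal(value):
    """Write the number without thousand separators."""
    return str(value)
```

Note that the round trip is not byte-preserving: "+00:00" comes back as "Z"
and "1,234.50" comes back as "1234.50", which is exactly the behaviour the
principles would need to cover.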

2) It looks like there are real-life examples of initiators and terminators
being in a different character encoding to the accompanying data. Implies
use of initiatorEncoding etc properties. Need to agree that this is the
correct way to proceed.

3) Property precedence and the use of 'notApplicable' enums to say that a
property is not to be interpreted on this object. Without this it is not
possible to override a dfdl:default setting from higher up the scope. For
example, how do I switch off the use of separator for a group when a
dfdl:default has set it higher up? A value of notApplicable would solve
that. Another example, I have a sequence where variable length fields have
a terminator but fixed length fields do not. My terminator and most of my
lengths are common, so I set both length and terminator at the group level
using dfdl:default. When property scoping rules are applied, I will have a
length and a terminator for each field, which do I use? I can't just say
length takes precedence because every field has been given a length. I need
a way of explicitly saying which to use. One approach is to use a special
value for length - 'notApplicable' - which means ignore the property, and I
would use that explicitly on all the variable length fields. Alternatively
I can be more explicit and have a separate property to control which is
used.
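
The first approach in point 3 can be sketched in Python as follows. This is
only an illustration under stated assumptions: properties are modelled as
plain dictionaries, inner scopes override outer ones, and the helper names
resolve and boundary_rule are hypothetical rather than part of any DFDL
implementation.

```python
NOT_APPLICABLE = "notApplicable"  # the special enum value discussed in point 3


def resolve(scopes):
    """Merge property dicts from outermost to innermost scope.

    Inner scopes override outer ones. Any property whose final value is
    notApplicable is dropped, i.e. it is not interpreted on this object.
    """
    effective = {}
    for scope in scopes:
        effective.update(scope)
    return {name: value for name, value in effective.items()
            if value != NOT_APPLICABLE}


def boundary_rule(effective):
    """Length takes precedence, but only if it survived scoping."""
    if "length" in effective:
        return "length"
    if "terminator" in effective:
        return "terminator"
    return None


# Group-level dfdl:default settings shared by most fields (made-up values).
group_default = {"length": 10, "terminator": ";", "separator": ","}

fixed_field = resolve([group_default, {}])
variable_field = resolve([group_default, {"length": NOT_APPLICABLE}])
no_separator_group = resolve([group_default, {"separator": NOT_APPLICABLE}])
```

Here boundary_rule(fixed_field) gives "length" and
boundary_rule(variable_field) gives "terminator", and the separator is
absent from no_separator_group. The alternative of a separate controlling
property would replace boundary_rule with an explicit lookup.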

4) Use of 'Native' enum meaning use the locally defined value. Used for
character encodings, time zones, endian-ness.  The DFDL spec draft says:
"There should be no ‘platform varying defaults’. For example, byteOrder
should default to bigEndian, or littleEndian, or have no default at all (in
which case leaving it unspecified will often cause an error except in
all-text situations with non-endian character sets). What’s not acceptable
is for byteOrder to default to some value based on the current platform or
other environmental constraint. Similarly for locale-sensitive things".
Does this imply that 'Native' can't be the default?  I can see why we would
not want something to default silently based on platform/locale, but
being clear that the default is 'Native' avoids that.

5) Agree on what offset facilities are to be offered for establishing the
position of an item. Absolute and/or relative. The problem with absolute
offsets is their fragility: a single variable length field breaks the
scheme unless the offset is given by an expression. Are relative offsets
all we really need? And what are they relative to? Last field?
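
The fragility in point 5 can be shown with a small Python sketch. It assumes
one possible answer to "relative to what?", namely that each relative offset
is a gap measured from the end of the previous field; the function name and
the numbers are made up for illustration.

```python
def absolute_offsets(lengths, gaps):
    """Compute each field's absolute start position.

    lengths[i] is the length of field i, and gaps[i] is its relative
    offset, assumed here to be measured from the end of the previous field.
    """
    offsets = []
    position = 0
    for length, gap in zip(lengths, gaps):
        position += gap           # skip the gap before this field
        offsets.append(position)  # field starts here
        position += length        # move to the end of this field
    return offsets


gaps = [0, 2, 0]                                    # relative offsets
short_first = absolute_offsets([4, 3, 5], gaps)     # first field is 4 long
long_first = absolute_offsets([7, 3, 5], gaps)      # first field is 7 long
```

When the first field grows from 4 to 7, the relative offsets are unchanged
but every later absolute offset moves, so a fixed absolute offset would only
survive if it were computed by an expression like the one above.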

6) Binary versus text. At the moment the Schema for DFDL has the text model
inheriting from the binary model. So text isn't really text, it's text or
binary. I think this is wrong and that repType=text and repType=binary
really imply separate semantics. It still means properties can be shared,
but it clarifies things like the behavior of built-in prefixed lengths - a
length prefix for a binary string would be a physical integer in the data,
a length prefix for a text string would be a physical numeric string in the
data.
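
The difference between the two kinds of length prefix in point 6 can be
sketched in Python. The prefix widths (a 2-byte big-endian integer for
binary, two ASCII digits for text) are assumptions chosen only for the
example, and the function names are hypothetical.

```python
import struct


def parse_binary_prefixed(data):
    """Binary case: the prefix is a physical integer in the data.

    Assumes a 2-byte big-endian unsigned integer prefix.
    """
    (length,) = struct.unpack_from(">H", data, 0)
    return data[2:2 + length]


def parse_text_prefixed(data, digits=2, encoding="ascii"):
    """Text case: the prefix is a physical numeric string in the data.

    Assumes a fixed number of digits and a single-byte encoding, so the
    length counts bytes and characters alike.
    """
    length = int(data[:digits].decode(encoding))
    return data[digits:digits + length].decode(encoding)
```

For the same logical string "hello", the binary form starts with the bytes
00 05 while the text form starts with the characters "05".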

7) Padding character - does setting this property imply trim on parse as
well as pad on output? Do we need a control for this (pad/trim/both)?
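
A minimal Python sketch of what a pad/trim/both control might look like. The
mode values mirror the ones suggested in point 7, while the function names
and the assumption of left-justified data padded on the right are
illustrative only.

```python
def on_parse(text, pad=" ", mode="both"):
    """Trim trailing pad characters only if the mode asks for it."""
    if mode in ("trim", "both"):
        return text.rstrip(pad)
    return text


def on_output(value, width, pad=" ", mode="both"):
    """Pad on the right up to the width only if the mode asks for it."""
    if mode in ("pad", "both"):
        return value.ljust(width, pad)
    return value
```

With mode "pad", data read as "ab   " stays "ab   " on parse, whereas with
"both" it becomes "ab" and is padded again on output.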

8) Justification, date/time format, and some other properties have a
default that varies depending on the logical type of the element. How does
this default interact with setting a default via scoping rules? We have
said that there are no hard-coded defaults, implying a dfdl:defineFormat
block must exist and must define values for all properties. This
contradicts deriving the default from the logical type. For such
properties, we could have an accompanying dfdl:defineFormat-only property
that says how it defaults. Or we could exempt these from having to have a
default set in dfdl:defineFormat.

9) There are many properties that are inherited from the CAM binary model.
Suman is establishing with IBM compiler people exactly which ones are needed
and what their semantics are. I find some of the names unhelpful and
suggest that renaming such properties is a good idea.

10) Decimal properties. There are a host of properties for controlling
decimal formats, and it's not clear to me what some of them mean. I'd like us
to agree on a finished set.

11) Group level properties. We don't have many properties that define how
the members of a group behave. We have separator, for instance, which says
all child fields will be separated. Is there benefit in having other
properties like this, for example 'initiated' meaning all child fields must
have a unique initiator, or 'fixed' meaning all child fields must have a
pre-computable length? Such properties have secondary benefits - they
communicate the nature of the group without recourse to examination of all
children, and they enable DFDL editors to validate that all intended
properties are present.
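
The editor-validation benefit in point 11 could look roughly like the Python
sketch below. The group and its children are modelled as plain dictionaries,
'fixed' is approximated as "length is a literal integer", and validate_group
is a hypothetical helper, not part of any existing DFDL tooling.

```python
def validate_group(group, children):
    """Check children against the group-level 'initiated' and 'fixed' flags.

    Returns a list of error messages; an empty list means the group is
    consistent with what it declares.
    """
    errors = []
    if group.get("initiated"):
        initiators = [child.get("initiator") for child in children]
        if any(initiator is None for initiator in initiators):
            errors.append("every child must have an initiator")
        present = [i for i in initiators if i is not None]
        if len(set(present)) != len(present):
            errors.append("child initiators must be unique")
    if group.get("fixed"):
        for child in children:
            # Assumption: a literal integer length counts as pre-computable.
            if not isinstance(child.get("length"), int):
                errors.append("child %r has no pre-computable length"
                              % child.get("name"))
    return errors
```

For example, a group marked 'initiated' whose two children both use the
initiator "A:" would be reported as an error without the editor having to
guess the intent from the children alone.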

Regards, Steve

Steve Hanson
WebSphere Message Brokers,
IBM United Kingdom Ltd, Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848

