[dfdl-wg] Draft of Properties document

Steve Hanson smh at uk.ibm.com
Thu Mar 23 10:21:49 CST 2006


The attached document is the first draft of the revised set of DFDL
properties. The following should be borne in mind when reviewing:

- I have divided the set of properties into two: those that are concerned
with the physical representation of the data (rep properties), and those
that aren't (non-rep properties).

- A number of physical rep types have been identified (text, binaryStream,
binaryInteger, binaryFloat, zonedDecimal, packedDecimal,
binaryCodedDecimal, xml).

- XML Schema logical types have been gathered into type groups (number,
string, binary, boolean, calendar).  For each logical type group, only
certain physical rep types are allowed (eg, for string, only text & xml).

- There is a group of rep properties that are common to many physical rep
types (eg, length). The majority of rep properties are specific to a
physical rep type, and I have decorated such property names accordingly
(eg, integerSigned). Some of these are further specific to a logical type
group, and I have further decorated such property names accordingly (eg,
textCalendarScheme).

- I have assumed for now that all physical rep types are part of the core
standard. We need to revisit this, as some of the decimal reps are clearly
not universally used.

- I have not (yet) attempted to assign properties to conversions. I'd
rather we reviewed the properties first in terms of the general approach to
organisation, naming, and so on, before starting this exercise.

- The issue of defaults poses some questions. We have agreed that
hard-wired model defaults are not desirable, because they mean the absence
of a property implies a behaviour. If we later wish to change that
behaviour we are stuck, because existing DFDL Schemas would then behave
differently. So the
proposal in the scoping document is that all properties that are used
during a parse/serialise must have an explicit value defined somewhere in
the DFDL Schema, typically in a dfdl:defineFormat annotation. However,
there are properties where we don't want a one-value-fits-all value, but at
the same time do not want to specify a value on every element. Example:
justification, where typically all strings are left justified and all
numbers are right justified. Example: calendarPattern, where the patterns
for a date, dateTime, time, monthDay, etc. are different. One approach is to
duplicate the properties, so we would have textStringJustification,
textNumberJustification, and so on. You will see that this is what I have
done in the document for justification.  An alternative approach is to have
one property, but to add an additional enum value 'schema', which means
derive the default from the logical type. You will see that this is what I have
done for calendarPattern. So calendarPatternKind set to 'schema' for
xsd:date would yield "yyyy-MM-dd" and for xsd:time would yield
"hh:mm:ss.sss". Of course, 'schema' can be considered a form of
hard-wiring, so maybe we also need some properties that define what these
defaults are (they'd only ever be set at dfdl:defineFormat level)?  We need
to decide which of these approaches is preferred, whether they both make
sense, or if there is a better way.

- I have not attempted to define properties for multi-dimensional arrays;
I am leaving this until the tracker item for it is resolved.

- I have not attempted to define properties for physical rep type xml. I
think we need a broader discussion on how XML is handled within DFDL first.

- Certain properties come with a whole bunch of related properties.
Examples are patterns for text numbers, patterns for text calendars,
separators, initiators, terminators, occurs. For the patterns I have
created schemes under which to group the related properties, as has been
proposed for escapes/quotes. I have not done this for separators etc.,
partly so you can see the two different approaches. We need to come up with
a consistent design for when to use schemes and when not to use schemes.

- For each property I've suggested which DFDL annotations are applicable;
this also requires checking.

- My escape scheme is deliberately simple. It may well need to be improved,
and we need to decide whether alternate and/or nested schemes are
required.

- I've tried to be consistent with property names. For example, properties
that control how another property is interpreted are suffixed with 'kind'
(eg, length and lengthKind). Similarly, enum values that have the same
meaning across properties use the same names (eg, always, never, output,
input for properties that say whether something applies to input and/or
output).

- Mike, Geoff, Suman and I spent some time prior to the F2F
designing some of the properties. They may notice that I have deviated in
places from what was proposed. This is invariably due to discovery of some
scenario not covered by our previous discussion. For example, it turns out
we need to have a separate control for trimming fixed length text, instead
of deducing a behaviour from justification (because MRM has a little-known
but useful property that controls this). They may also notice that not all
property behaviour has been stated. This is simply down to a desire to keep
the property descriptions concise at this stage.

- I've incorporated Geoff's latest thinking on default value and null value
properties, although the document describing the theory behind these has
not yet been reviewed by anyone other than Geoff and myself, and there are some
open questions on things like null indicator fields. I expect there to be
considerable discussion in this area.

- For some properties I've indicated that they can take either a literal
value or an XPath. The list of such properties requires revision as I have
not been exhaustive in this, mainly allowing XPath where I know that IBM's
models allow it.

- Although I have tried to make sure that DFDL properties encompass IBM's
models, I cannot guarantee this at this point in time. Review is required
by IBMers beyond the DFDL WG IBMers. I have initiated this process.
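To make the two defaulting approaches from the bullet on defaults concrete, here is a minimal sketch in Python. It is not a DFDL processor and not the draft's specification: the dict standing in for a dfdl:defineFormat block, the helper names (require, justification, calendar_pattern) and the justification values are all hypothetical. Only the property names textStringJustification, textNumberJustification, calendarPatternKind and calendarPattern, and the two 'schema' patterns for xsd:date and xsd:time, come from the text above.

```python
"""Hypothetical sketch of the two defaulting approaches for DFDL properties."""
from typing import Dict

Props = Dict[str, str]

# Hypothetical stand-in for a dfdl:defineFormat annotation (values illustrative).
DEFINE_FORMAT: Props = {
    # Approach 1: duplicate the property per logical type group.
    "textStringJustification": "left",
    "textNumberJustification": "right",
    # Approach 2: one property with an enum value 'schema'.
    "calendarPatternKind": "schema",
}

# Defaults implied by 'schema'; only the two patterns quoted in the email.
SCHEMA_CALENDAR_PATTERNS: Dict[str, str] = {
    "xsd:date": "yyyy-MM-dd",
    "xsd:time": "hh:mm:ss.sss",
}


def require(props: Props, name: str) -> str:
    """Return an explicitly set property; there is no hard-wired fallback."""
    if name not in props:
        raise ValueError(f"property {name!r} has no explicit value")
    return props[name]


def justification(type_group: str, props: Props) -> str:
    # Approach 1: select the duplicated property for this logical type group.
    names = {"string": "textStringJustification",
             "number": "textNumberJustification"}
    return require(props, names[type_group])


def calendar_pattern(logical_type: str, props: Props) -> str:
    # Approach 2: 'schema' derives the pattern from the XML Schema type.
    if require(props, "calendarPatternKind") == "schema":
        return SCHEMA_CALENDAR_PATTERNS[logical_type]
    # Other calendarPatternKind values are not named in the email; this
    # sketch simply reads an explicit calendarPattern property instead.
    return require(props, "calendarPattern")


if __name__ == "__main__":
    print(justification("string", DEFINE_FORMAT))       # left
    print(justification("number", DEFINE_FORMAT))       # right
    print(calendar_pattern("xsd:date", DEFINE_FORMAT))  # yyyy-MM-dd
```

The trade-off raised above shows up directly: approach 1 keeps every value explicit but multiplies property names, while approach 2 keeps a single name but SCHEMA_CALENDAR_PATTERNS is itself a hard-wired table, which is why the question arises of whether those defaults should also be settable properties.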

(See attached file: DFDL_Properties_v004.doc)

Regards, Steve

Steve Hanson
WebSphere Message Brokers,
IBM United Kingdom Ltd, Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DFDL_Properties_v004.doc
Type: application/msword
Size: 413696 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/dfdl-wg/attachments/20060323/f255858d/attachment.doc 

