[DFDL-WG] Latest OGF DFDL WG Call Minutes
Steve Hanson
smh at uk.ibm.com
Wed Apr 3 10:29:25 EDT 2019
Hi Bradd
I have a few questions please ... inline below.
I'm still going to need a real worked example, starting with some actual
data and its schema, what it looks like after parsing in the infoset, and
how unparsing lays it back out again.
Are you able to make the rescheduled call this Friday?
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Bradd Kadlecik/Poughkeepsie/IBM
To: Steve Hanson/UK/IBM at IBMGB
Cc: dfdl-wg at ogf.org, "Mike Beckerle" <mbeckerle at tresys.com>, "Michele
Zundo" <michele.zundo at esa.int>
Date: 27/02/2019 22:22
Subject: Re: Latest OGF DFDL WG Call Minutes
Regarding proposal for offsets and pointers:
The following are the properties to be defined:
SMH: I was expecting to see these properties only on dfdl:element,
especially as you say '...the element contents...'?
indirectKind Enum
Valid values 'pointer', 'offset' (there is also a thought
of objectId or refId for handling BSON but not at this time)
Specifies the type of indirection used to access the
element contents in the data stream.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice,
dfdl:sequence, dfdl:group
SMH: I am missing the distinction between offset and pointer. Is one
relative to current position and the other relative to start of bitstream?
SMH: In earlier DFDL proposals for offset support, we had used the term to
refer to a property to be used to establish position of the current
element instead of assuming the current element followed straight after
the previous one. It would allow sparse modelling of fixed structures. The
offset could be relative to start of bitstream or some other point. I
don't think that's what you mean when you say 'offset' so I will refer to
your new concept as 'pointer'.
SMH: Assuming that indirectKind is a normal DFDL property, it can be in
scope. It would therefore need to have an enum 'None' which would be the
default used in most schemas.
indirectLength Non-negative Integer or DFDL expression
Specifies the length of the indirection in units according
to the indirectUnits property.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice,
dfdl:sequence, dfdl:group
indirectUnits Enum
Valid values 'bytes','bits'
Specifies the units to be used for reading or writing the
indirection according to indirectLength.
The default value is 'bytes'.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice,
dfdl:sequence, dfdl:group
SMH: I think a better approach is to provide a property dfdl:indirectType,
instead of indirectLength/indirectUnits, which refers to a simple type
(not element) that carries its own lengthKind, length & lengthUnits
properties. Similar idea to dfdl:prefixLengthType. That allows a lot of
flexibility on how the pointer can appear.
offsetBase non-empty string containing an absolute or relative XPath
expression for the base element.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice,
dfdl:sequence, dfdl:group
The proposal would be to have the contents of the indirection be after the
LeadingAlignment and before the TrailingAlignment. This would mean the
alignment and skip factors apply to the indirection values in the data
stream instead of the contents of the indirection.
SMH: Agree.
This also then means in an array element, each element has its own
indirection value (pointer or offset) and the alignment and skip factors
then apply to each of these indirection values.
SMH: Do you mean '...each occurrence...' ?
It would be thought that the indirection values apply only to the data
stream and not the infoset. During parse when the infoset is populated
from the data stream, the indirection values are replaced by the contents.
During unparse, the indirection values don't exist in the infoset and are
created during the writing to/creation of the data stream.
SMH: I agree that the indirection should be a purely physical thing, but I
am not clear how the value is filled in when unparsing. Where does the
value come from? outputValueCalc? Or maybe it's not needed when
unparsing, and the data is always contiguous?
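To make the parse/unparse question concrete, here is a minimal Python
sketch. It is not part of the proposal. It assumes that a 'pointer' is an
absolute byte position from the start of the data stream, that it is
stored big-endian, and that on unparse the contents are written
immediately after the pointer with the pointer value calculated by the
unparser. The function names are hypothetical.

```python
def parse_pointer_string(data: bytes, pos: int, ptr_len: int = 4):
    """Read a ptr_len-byte pointer at pos and return (string, next_pos).

    Assumption: the pointer is an absolute byte position in the stream,
    and the target is a NUL-terminated UTF-8 string.
    """
    target = int.from_bytes(data[pos:pos + ptr_len], "big")
    end = data.index(b"\x00", target)  # locate the %NUL; terminator
    # Only the contents reach the infoset; the pointer value is dropped.
    return data[target:end].decode("utf-8"), pos + ptr_len


def unparse_pointer_string(value: str, ptr_len: int = 4) -> bytes:
    """Write a pointer followed directly by the NUL-terminated contents.

    Assumption: the unparser chooses the layout, so the pointer value is
    computed here rather than taken from the infoset.
    """
    content = value.encode("utf-8") + b"\x00"
    target = ptr_len  # contents start right after the pointer field
    return target.to_bytes(ptr_len, "big") + content


stream = unparse_pointer_string("hello")
value, next_pos = parse_pointer_string(stream, 0)
```

Under these assumptions the infoset holds only the string "hello", and
the position after the element advances past the pointer, not past the
contents.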
For pointers, a null pointer creates the scenario of either nil
representation or empty representation depending on whether or not
nillable is defined as true. Unless default values (or 0 occurrence) are
defined for all underlying content, then this is a processing error.
During unparse, the only scenario in which a null pointer would be created
is for a nil representation.
SMH: This needs more thought. The nil & default properties apply to the
contents of the indirection, not to the pointer. If you want to give a nil
semantic to the pointer value itself, then that would require a new enum
for dfdl:nilKind. I don't see why a pointer value 0 can't be treated like
any other indirection value. A missing pointer is an error - it must be
present - there is no way to control optionality because
minOccurs/maxOccurs apply to the contents. (Alternatively, if you want the
concepts of nil, default, occurs to apply to the indirect value, then
dfdl:indirectType could point at an element instead of a simple type - but
that seems way too over-engineered).
Examples:
The following is the definition for the address of a null-terminated
string in which the string address may be NULL as indicated by a nillable
value of true:
<xs:element name="myString" type="xs:string" dfdl:lengthKind="delimited"
dfdl:encoding="UTF-8" dfdl:terminator="%NUL;" dfdl:indirectKind="pointer"
dfdl:indirectLength="8" dfdl:indirectUnits="bytes" nillable="true" />
The following is the definition for an array of three 4 byte addresses of
a complex element defined by ns0:myStruct:
<xs:element name="myArray" type="ns0:myStruct" dfdl:lengthKind="implicit"
dfdl:indirectKind="pointer" dfdl:indirectLength="4"
dfdl:indirectUnits="bytes" minOccurs="3" maxOccurs="3"
dfdl:occursCountKind="fixed" />
The following is the definition for a 4 byte offset to a 100 byte
hexBinary value from the start of the parent element definition:
<xs:element name="myData" type="xs:hexBinary" dfdl:lengthKind="explicit"
dfdl:length="100" dfdl:lengthUnits="bytes" dfdl:indirectKind="offset"
dfdl:indirectLength="4" dfdl:indirectUnits="bytes" dfdl:offsetBase=".." />
SMH: I don't see how unparsing works. What provides the value?
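For the offset case, a minimal Python sketch of the parse side follows. It
is illustrative only and assumes the offset is a big-endian byte count
measured from the start of the base element (the parent, as in
dfdl:offsetBase=".."). The function name, the filler bytes and the sample
layout are hypothetical.

```python
def parse_offset_binary(data: bytes, parent_start: int, field_pos: int,
                        length: int, off_len: int = 4) -> bytes:
    """Follow an off_len-byte offset stored at field_pos.

    Assumption: the offset is relative to parent_start, the position at
    which the base element begins in the stream.
    """
    offset = int.from_bytes(data[field_pos:field_pos + off_len], "big")
    start = parent_start + offset
    if start + length > len(data):
        raise ValueError("indirection target lies outside the data stream")
    return data[start:start + length]


# Sample stream: 8 unrelated bytes, then the parent element, which holds
# a 4-byte offset, 8 bytes of other fields, and the 100-byte payload.
header = b"\xAA" * 8
payload = bytes(range(100))
parent = (12).to_bytes(4, "big") + b"\xBB" * 8 + payload
stream = header + parent

my_data = parse_offset_binary(stream, parent_start=8, field_pos=8, length=100)
```

Here the stored value 12 is added to the parent's start position (8), so
the 100 bytes are read from position 20. The sketch does not cover
unparsing, where the source of the offset value is still an open question.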
The proposal would also allow for the following optional item but I don't
currently see a need for this:
dfdl:offsetKind with values "startToStart" or "endToStart" - indicates
if the offset is from the start of the base element or the end of the base
element.
I tried to get this out before my vacation, so it might take me a little
while to respond to any issues. Thank you for your time.
Regards,
Bradd Kadlecik
z/TPF Development
Phone: 1-845-433-1573
E-mail: braddk at us.ibm.com
2455 South Rd
Poughkeepsie, NY 12601-5400
United States
From: Steve Hanson/UK/IBM
To: dfdl-wg at ogf.org
Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo"
<michele.zundo at esa.int>, Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
Date: 02/07/2019 12:32 PM
Subject: Latest OGF DFDL WG Call Minutes
Please find minutes from the latest call at
https://redmine.ogf.org/projects/dfdl-wg/news
Regards
Steve Hanson
IBM Hybrid Integration
Architect, IBM DFDL,
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890