[DFDL-WG] DFDL pointer & offset proposal

Steve Hanson smh at uk.ibm.com
Thu Apr 30 10:46:28 EDT 2020


Comments for call today (yellow)



Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 



From:   "Bradd Kadlecik" <braddk at us.ibm.com>
To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc:     DFDL-WG <dfdl-wg at ogf.org>
Date:   12/12/2019 15:42
Subject:        [EXTERNAL] Re: [DFDL-WG] DFDL pointer & offset proposal
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>



Updated with comments:

(See attached file: DFDL_Indirection_v2.docx)


Regards,

Bradd Kadlecik
z/TPF Development


Phone: 1-845-433-1573
E-mail: braddk at us.ibm.com
2455 South Rd
Poughkeepsie, NY 12601-5400
United States



From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Bradd Kadlecik <braddk at us.ibm.com>
Cc: DFDL-WG <dfdl-wg at ogf.org>
Date: 10/16/2019 05:43 PM
Subject: [EXTERNAL] Re: [DFDL-WG] DFDL pointer & offset proposal



 I added some comments to your original document. Attached. 

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy



On Wed, Sep 25, 2019 at 12:55 PM Bradd Kadlecik <braddk at us.ibm.com> wrote: 

Here's what I've put together regarding the pointer & offset proposal for 
the next meeting's review.

(See attached file: DFDL_Indirection.docx)

Regards,

Bradd Kadlecik



From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
michele.zundo at esa.int>
Date: 04/05/2019 12:42 PM
Subject: Re: Latest OGF DFDL WG Call Minutes

No   Action
309  Create example scenarios to illustrate offset & pointer requirements
     (Bradd)
     5/4/19: Daffodil have a draft proposal for offset support, TPF have
     experimental implementation for pointer support. Need examples to show
     the requirement, especially unparsing.

Regards
 
Steve Hanson



From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
michele.zundo at esa.int>
Date: 05/04/2019 10:29
Subject: Re: Latest OGF DFDL WG Call Minutes


I can see a difference between offsets and pointers. If I follow an offset
and parse an element x, I won't automatically jump back to where I was:
the next element y I parse will continue from location offset + length(x)
unless I use offset again to jump back to the original location. If I
follow a pointer and parse an element x, the next element y I parse
will continue from original location + length(pointer).
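
The difference described above can be sketched with a toy cursor model.
This is only an illustration of the two behaviours, not of any DFDL
processor: the function names, the big-endian pointer encoding and the
sample byte layout are all assumptions made for the example.

```python
def parse_via_offset(data: bytes, pos: int, offset: int, length_x: int):
    """Jump to `offset` and parse x there; the old position `pos` is abandoned."""
    x = data[offset:offset + length_x]
    next_pos = offset + length_x  # y continues right after x
    return x, next_pos


def parse_via_pointer(data: bytes, pos: int, pointer_len: int, length_x: int):
    """Read a pointer at `pos`, parse x at its target, then resume after the pointer."""
    target = int.from_bytes(data[pos:pos + pointer_len], "big")  # assumed big-endian
    x = data[target:target + length_x]
    next_pos = pos + pointer_len  # y continues right after the pointer field
    return x, next_pos


# Hypothetical layout: a 2-byte pointer holding 6 at positions 0-1,
# "YY" at 2-3, filler at 4-5, element x ("XXX") at 6-8, "ZZ" at 9-10.
data = bytes([0x00, 0x06]) + b"YY" + b".." + b"XXX" + b"ZZ"

x_off, after_offset = parse_via_offset(data, pos=0, offset=6, length_x=3)
x_ptr, after_pointer = parse_via_pointer(data, pos=0, pointer_len=2, length_x=3)
```

Both calls read the same x, but the offset version leaves the cursor after
x while the pointer version leaves it just past the pointer field.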

Regards
 
Steve Hanson



From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
michele.zundo at esa.int>
Date: 04/04/2019 09:42
Subject: Re: Latest OGF DFDL WG Call Minutes


This is a proposal from the Daffodil team for offset support, needed for 
formats like TIFF and if we ever want to be able to handle zip files.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74687382

I think this proposal can implement your requirement - the dfdl:offset 
property can be an expression that refers to another element (your pointer 
element), which can be hidden so as not to appear in the infoset. I think 
what you are proposing is a more convenient way of handling offsets that 
are defined dynamically in the data, as opposed to defined statically with 
fixed values (though as I said below I need unparsing explained). But I 
may be misunderstanding your use cases.

Regards
 
Steve Hanson



From: Bradd Kadlecik/Poughkeepsie/IBM
To: Steve Hanson/UK/IBM at IBMGB
Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
michele.zundo at esa.int>
Date: 03/04/2019 16:50
Subject: Re: Latest OGF DFDL WG Call Minutes


Ok, I'll work on putting a scenario together for various pointer setups 
(array, complex element, string) and show how it looks for both the JSON 
and the binary.

I'm currently at a conference this week but will be returning late 
Thursday so expect to be available for the call Friday.

I presented the TPF specific pointer implementation at the conference this 
week and there are some that will be trying to use it soon.
Regards,

Bradd Kadlecik




From: Steve Hanson/UK/IBM
To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
Cc: dfdl-wg at ogf.org, "Mike Beckerle" <mbeckerle at tresys.com>, "Michele 
Zundo" <michele.zundo at esa.int>
Date: 04/03/2019 10:29 AM
Subject: Re: Latest OGF DFDL WG Call Minutes


Hi Bradd

I have a few questions please ... inline below.

I'm still going to need a real worked example, starting with some actual 
data and its schema, what it looks like in the infoset after parsing, and 
how unparsing lays it back out again.

Are you able to make the rescheduled call this Friday?

Regards
 
Steve Hanson



From: Bradd Kadlecik/Poughkeepsie/IBM
To: Steve Hanson/UK/IBM at IBMGB
Cc: dfdl-wg at ogf.org, "Mike Beckerle" <mbeckerle at tresys.com>, "Michele 
Zundo" <michele.zundo at esa.int>
Date: 27/02/2019 22:22
Subject: Re: Latest OGF DFDL WG Call Minutes


Regarding proposal for offsets and pointers:

The following are the properties to be defined:

SMH: I was expecting to see these properties only on dfdl:element, 
especially as you say '...the element contents...'?

indirectKind Enum
Valid values 'pointer', 'offset' (there is also a thought of objectId or 
refId for handling BSON but not at this time)
Specifies the type of indirection used to access the element contents in 
the data stream.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence, 
dfdl:group

SMH: I am missing the distinction between offset and pointer. Is one 
relative to current position and the other relative to start of bitstream? 


SMH: In earlier DFDL proposals for offset support, we had used the term to 
refer to a property to be used to establish position of the current 
element instead of assuming the current element followed straight after 
the previous one. It would allow sparse modelling of fixed structures. The 
offset could be relative to start of bitstream or some other point. I 
don't think that's what you mean when you say 'offset' so I will refer to 
your new concept as 'pointer'.

SMH: Assuming that indirectKind is a normal DFDL property, it can be in 
scope. It would therefore need to have an enum 'None' which would be the 
default used in most schemas.

indirectLength Non-negative Integer or DFDL expression
Specifies the length of the indirection in units according to the 
indirectUnits property.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence, 
dfdl:group

indirectUnits Enum
Valid values 'bytes', 'bits'
Specifies the units to be used for reading or writing the indirection 
according to indirectLength.
The default value is 'bytes'.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence, 
dfdl:group

SMH: I think a better approach is to provide a property dfdl:indirectType, 
instead of indirectLength/indirectUnits, which refers to a simple type 
(not element) that carries its own lengthKind, length & lengthUnits 
properties. Similar idea to dfdl:prefixLengthType. That allows a lot of 
flexibility on how the pointer can appear.

offsetBase non-empty string containing an absolute or relative XPath 
expression for the base element.
Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence, 
dfdl:group

The proposal would be to have the contents of the indirection be after the 
LeadingAlignment and before the TrailingAlignment. This would mean the 
alignment and skip factors apply to the indirection values in the data 
stream instead of the contents of the indirection. 

SMH: Agree.

This also means that in an array element, each element has its own 
indirection value (pointer or offset), and the alignment and skip factors 
then apply to each of these indirection values. 

SMH: Do you mean '...each occurrence...' ?

The intent is that the indirection values apply only to the data 
stream and not the infoset. During parse, when the infoset is populated 
from the data stream, the indirection values are replaced by the contents. 
During unparse, the indirection values don't exist in the infoset and are 
created when the data stream is written.

SMH: I agree that the indirection should be a purely physical thing, but I 
am not clear how the value is filled in when unparsing. Where does the 
value come from? outputValueCalc? Or maybe it's not needed when unparsing, 
and the data is always contiguous? 

For pointers, a null pointer results in either a nil representation or an 
empty representation, depending on whether or not nillable is defined as 
true. Unless default values (or 0 occurrences) are defined for all 
underlying content, this is a processing error. During unparse, the only 
scenario in which a null pointer would be created is for a nil 
representation.

SMH: This needs more thought. The nil & default properties apply to the 
contents of the indirection, not to the pointer. If you want to give a nil 
semantic to the pointer value itself, then that would require a new enum 
for dfdl:nilKind. I don't see why a pointer value 0 can't be treated like 
any other indirection value. A missing pointer is an error - it must be 
present - there is no way to control optionality because 
minOccurs/maxOccurs apply to the contents. (Alternatively, if you want the 
concepts of nil, default, occurs to apply to the indirect value, then 
dfdl:indirectType could point at an element instead of a simple type - but 
that seems far too over-engineered). 

Examples:
The following is the definition for the address of a null-terminated 
string in which the string address may be NULL as indicated by a nillable 
value of true:
<xs:element name="myString" type="xs:string" dfdl:lengthKind="delimited" 
dfdl:encoding="UTF-8" dfdl:terminator="%NUL;" dfdl:indirectKind="pointer" 
dfdl:indirectLength="8" dfdl:indirectUnits="bytes" nillable="true" />
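
As a rough sketch of how a parser might treat this definition: read the
8-byte pointer, follow it to a NUL-terminated UTF-8 string, and resume
after the pointer field. The function name is hypothetical, and the
sketch assumes a big-endian pointer that is an index into the same
buffer and that a pointer value of 0 means nil; none of that is fixed by
the proposal.

```python
from typing import Optional, Tuple


def parse_pointer_string(
    data: bytes, pos: int, pointer_len: int = 8
) -> Tuple[Optional[str], int]:
    """Return (string or None for nil, position after the pointer field)."""
    pointer = int.from_bytes(data[pos:pos + pointer_len], "big")  # assumed big-endian
    next_pos = pos + pointer_len
    if pointer == 0:
        # Assumption: a null pointer maps to a nil value (nillable="true").
        return None, next_pos
    end = data.index(b"\x00", pointer)  # dfdl:terminator="%NUL;"
    return data[pointer:end].decode("utf-8"), next_pos


# Hypothetical data: pointer value 8 followed directly by the string.
sample = (8).to_bytes(8, "big") + b"hello\x00"
value, after = parse_pointer_string(sample, 0)

# A null pointer: the trailing bytes are never read.
nil_value, nil_after = parse_pointer_string(bytes(8) + b"x\x00", 0)
```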

The following is the definition for an array of three 4-byte addresses of 
a complex element defined by ns0:myStruct:
<xs:element name="myArray" type="ns0:myStruct" dfdl:lengthKind="implicit" 
dfdl:indirectKind="pointer" dfdl:indirectLength="4" 
dfdl:indirectUnits="bytes" minOccurs="3" maxOccurs="3" 
dfdl:occursCountKind="fixed" />
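
A sketch of the array case, where each occurrence has its own pointer
field: three consecutive 4-byte pointers are read and each is followed to
its contents. The contents of ns0:myStruct are not given here, so the
sketch assumes a hypothetical struct of two big-endian unsigned 16-bit
integers; alignment and skip handling are left out.

```python
import struct
from typing import List, Tuple


def parse_pointer_array(
    data: bytes, pos: int, count: int = 3, pointer_len: int = 4
) -> Tuple[List[Tuple[int, int]], int]:
    """Follow `count` consecutive pointers; resume after the last pointer field."""
    items = []
    for i in range(count):
        field = pos + i * pointer_len
        target = int.from_bytes(data[field:field + pointer_len], "big")
        # Hypothetical myStruct layout: two unsigned 16-bit big-endian fields.
        items.append(struct.unpack_from(">HH", data, target))
    return items, pos + count * pointer_len


# Hypothetical data: pointers to 12, 16 and 20, then the three structs.
pointers = b"".join(p.to_bytes(4, "big") for p in (12, 16, 20))
structs = struct.pack(">HHHHHH", 1, 2, 3, 4, 5, 6)
items, after_array = parse_pointer_array(pointers + structs, 0)
```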

The following is the definition for a 4-byte offset to a 100-byte 
hexBinary value from the start of the parent element definition:
<xs:element name="myData" type="xs:hexBinary" dfdl:lengthKind="explicit" 
dfdl:length="100" dfdl:lengthUnits="bytes" dfdl:indirectKind="offset" 
dfdl:indirectLength="4" dfdl:indirectUnits="bytes" dfdl:offsetBase=".." />

SMH: I don't see how unparsing works. What provides the value? 

The proposal would also allow for the following optional item but I don't 
currently see a need for this:
dfdl:offsetKind with values "startToStart" or "endToStart" - indicates if 
the offset is from the start of the base element or the end of the base 
element.

I tried to get this out before my vacation, so it might take me a little 
while to respond to any issues. Thank you for your time.

Regards,

Bradd Kadlecik




From: Steve Hanson/UK/IBM
To: dfdl-wg at ogf.org
Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
michele.zundo at esa.int>, Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
Date: 02/07/2019 12:32 PM
Subject: Latest OGF DFDL WG Call Minutes


Please find minutes from the latest call at 
https://redmine.ogf.org/projects/dfdl-wg/news

Regards

Steve Hanson





Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU



--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg

[attachment "DFDL_Indirection-mikeb-comments.docx" deleted by Bradd 
Kadlecik/Poughkeepsie/IBM] 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: DFDL_Indirection_v2.docx
Type: application/octet-stream
Size: 25909 bytes
Desc: not available
URL: <http://www.ogf.org/pipermail/dfdl-wg/attachments/20200430/7e92bfc8/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DFDL_Indirection_v2- smh-comments.odt
Type: application/octet-stream
Size: 24546 bytes
Desc: not available
URL: <http://www.ogf.org/pipermail/dfdl-wg/attachments/20200430/7e92bfc8/attachment-0003.obj>

