[DFDL-WG] DFDL pointer & offset proposal

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Oct 16 17:42:17 EDT 2019


 I added some comments to your original document. Attached.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>



On Wed, Sep 25, 2019 at 12:55 PM Bradd Kadlecik <braddk at us.ibm.com> wrote:

> Here's what I've put together regarding the pointer & offset proposal for
> the next meeting's review.
>
> *(See attached file: DFDL_Indirection.docx)*
>
> Regards,
>
> *Bradd Kadlecik*
> z/TPF Development
> ------------------------------
> *Phone:* 1-845-433-1573
> *E-mail:* *braddk at us.ibm.com* <braddk at us.ibm.com>
> 2455 South Rd
> Poughkeepsie, NY 12601-5400
> United States
>
>
> From: Steve Hanson/UK/IBM
> To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
> Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
> michele.zundo at esa.int>
> Date: 04/05/2019 12:42 PM
> Subject: Re: Latest OGF DFDL WG Call Minutes
>
> ------------------------------
>
>
> Action 309: Create example scenarios to illustrate offset & pointer
> requirements (Bradd)
> 5/4/19: Daffodil have a draft proposal for offset support, TPF have an
> experimental implementation for pointer support. Need examples to show the
> requirement, especially unparsing.
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
>
>
> From: Steve Hanson/UK/IBM
> To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
> Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
> michele.zundo at esa.int>
> Date: 05/04/2019 10:29
> Subject: Re: Latest OGF DFDL WG Call Minutes
> ------------------------------
>
>
> I can see a difference between offsets and pointers. If I follow an offset
> and parse an element x then I won't automatically jump back to where I was
> - the next element y I parse will continue from location offset + length x
> unless I use offset again to jump back to the original location. If I
> follow a pointer and parse an element x, then the next element y I parse
> will continue from original location + length(pointer).
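
A minimal sketch of the distinction described above, written in Python rather than DFDL. The function names, the 4-byte big-endian pointer, and the use of absolute buffer indexes are assumptions made for illustration; the thread does not fix any of them.

```python
def follow_offset(data: bytes, target: int, x_len: int):
    """Offset semantics: jump to `target`, parse x, and stay there.

    The returned position is target + length(x), so the next element y
    continues after x unless another offset moves the position again.
    """
    x = data[target:target + x_len]
    return x, target + x_len


def follow_pointer(data: bytes, pos: int, ptr_len: int, x_len: int):
    """Pointer semantics: read a pointer at `pos`, parse x where it points,
    then resume at pos + length(pointer)."""
    # Assumption: the pointer is an unsigned big-endian index into `data`.
    target = int.from_bytes(data[pos:pos + ptr_len], "big")
    x = data[target:target + x_len]
    return x, pos + ptr_len


# Layout: 4-byte pointer (value 8), then "yyyy", then "xxxx" at index 8.
data = (8).to_bytes(4, "big") + b"yyyy" + b"xxxx"
x_off, next_off = follow_offset(data, target=8, x_len=4)
x_ptr, next_ptr = follow_pointer(data, pos=0, ptr_len=4, x_len=4)
```

Both calls parse the same x, but the offset version leaves the position at the end of x while the pointer version leaves it just past the pointer field, where y ("yyyy") begins.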
>
> Regards
>
> Steve Hanson
>
>
>
> From: Steve Hanson/UK/IBM
> To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
> Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
> michele.zundo at esa.int>
> Date: 04/04/2019 09:42
> Subject: Re: Latest OGF DFDL WG Call Minutes
> ------------------------------
>
>
> This is a proposal from the Daffodil team for offset support, needed for
> formats like TIFF and if we ever want to be able to handle zip files.
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74687382
>
> I think this proposal can implement your requirement - the dfdl:offset
> property can be an expression that refers to another element (your pointer
> element), which can be hidden so as not to appear in the infoset. I think
> what you are proposing is a more convenient way of handling offsets that
> are defined dynamically in the data, as opposed to defined statically with
> fixed values (though as I said below I need unparsing explained). But I may
> be misunderstanding your use cases.
>
> Regards
>
> Steve Hanson
>
>
>
> From: Bradd Kadlecik/Poughkeepsie/IBM
> To: Steve Hanson/UK/IBM at IBMGB
> Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
> michele.zundo at esa.int>
> Date: 03/04/2019 16:50
> Subject: Re: Latest OGF DFDL WG Call Minutes
> ------------------------------
>
>
> Ok, I'll work on putting a scenario together for various pointer setups
> (array, complex element, string) and show how it looks for both the JSON
> and the binary.
>
> I'm currently at a conference this week but will be returning late
> Thursday so expect to be available for the call Friday.
>
> I presented the TPF specific pointer implementation at the conference this
> week and there are some that will be trying to use it soon.
>
> Regards,
>
> *Bradd Kadlecik*
>
>
>
> From: Steve Hanson/UK/IBM
> To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
> Cc: dfdl-wg at ogf.org, "Mike Beckerle" <mbeckerle at tresys.com>, "Michele
> Zundo" <michele.zundo at esa.int>
> Date: 04/03/2019 10:29 AM
> Subject: Re: Latest OGF DFDL WG Call Minutes
> ------------------------------
>
>
> Hi Bradd
>
> I have a few questions please ... inline below.
>
> I'm still going to need a real worked example, starting with some actual
> data and its schema, what it looks like in the infoset after parsing, and
> how unparsing lays it back out again.
>
> Are you able to make the rescheduled call this Friday?
>
> Regards
>
> Steve Hanson
>
>
>
> From: Bradd Kadlecik/Poughkeepsie/IBM
> To: Steve Hanson/UK/IBM at IBMGB
> Cc: dfdl-wg at ogf.org, "Mike Beckerle" <mbeckerle at tresys.com>, "Michele
> Zundo" <michele.zundo at esa.int>
> Date: 27/02/2019 22:22
> Subject: Re: Latest OGF DFDL WG Call Minutes
> ------------------------------
>
>
> Regarding proposal for offsets and pointers:
>
> The following are the properties to be defined:
>
> SMH: I was expecting to see these properties only on dfdl:element,
> especially as you say '...the element contents...'?
>
> indirectKind Enum
> Valid values 'pointer', 'offset' (there is also a thought of objectId or
> refId for handling BSON but not at this time)
> Specifies the type of indirection used to access the element contents in
> the data stream.
> Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
> dfdl:group
>
> SMH: I am missing the distinction between offset and pointer. Is one
> relative to current position and the other relative to start of bitstream?
>
> SMH: In earlier DFDL proposals for offset support, we had used the term to
> refer to a property to be used to establish position of the *current*
> element instead of assuming the current element followed straight after the
> previous one. It would allow sparse modelling of fixed structures. The
> offset could be relative to start of bitstream or some other point. I don't
> think that's what you mean when you say 'offset' so I will refer to your
> new concept as 'pointer'.
>
> SMH: Assuming that indirectKind is a normal DFDL property, it can be in
> scope. It would therefore need to have an enum 'None' which would be the
> default used in most schemas.
>
> indirectLength Non-negative Integer or DFDL expression
> Specifies the length of the indirection in units according to the
> indirectUnits property.
> Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
> dfdl:group
>
> indirectUnits Enum
> Valid values 'bytes','bits'
> Specifies the units to be used for reading or writing the indirection
> according to indirectLength.
> The default value is 'bytes'.
> Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
> dfdl:group
>
> SMH: I think a better approach is to provide a property dfdl:indirectType,
> instead of indirectLength/indirectUnits, which refers to a simple type (not
> element) that carries its own lengthKind, length & lengthUnits properties.
> Similar idea to dfdl:prefixLengthType. That allows a lot of flexibility on
> how the pointer can appear.
>
> offsetBase non-empty string containing an absolute or relative XPath
> expression for the base element.
> Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
> dfdl:group
>
> The proposal would be to have the contents of the indirection be after the
> LeadingAlignment and before the TrailingAlignment. This would mean the
> alignment and skip factors apply to the indirection values in the data
> stream instead of the contents of the indirection.
>
> SMH: Agree.
>
> This also means that in an array element, each element has its own
> indirection value (pointer or offset), and the alignment and skip factors
> then apply to each of these indirection values.
>
> SMH: Do you mean '...each occurrence...' ?
>
> The intent is that the indirection values apply only to the data stream
> and not to the infoset. During parse, when the infoset is populated from
> the data stream, the indirection values are replaced by the contents.
> During unparse, the indirection values don't exist in the infoset and are
> created while writing the data stream.
>
> SMH: I agree that the indirection should be a purely physical thing, but I
> am not clear how the value is filled in when unparsing. Where does the
> value come from? outputValueCalc? Or maybe it's not needed when unparsing,
> and the data is always contiguous?
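
To make the unparsing question concrete, here is one possible layout strategy sketched in Python. It is purely illustrative and not something the thread specifies: the unparser writes a block of pointers first, appends each string's contents afterwards, and computes each pointer value from where the contents landed. The function name, the 4-byte big-endian pointers, the buffer-relative addressing, and the use of 0 as a null pointer are all assumptions.

```python
from typing import List, Optional


def unparse_with_pointers(strings: List[Optional[str]], ptr_len: int = 4) -> bytes:
    """Lay out a pointer table followed by NUL-terminated UTF-8 strings.

    Pointer values are not taken from the infoset; they are derived from
    the position at which each string's contents end up in the output.
    """
    header_len = ptr_len * len(strings)
    header = bytearray()
    body = bytearray()
    for s in strings:
        if s is None:
            # Assumption: a nil item is written as a null (zero) pointer.
            header += (0).to_bytes(ptr_len, "big")
            continue
        # The contents will start right after everything written so far.
        header += (header_len + len(body)).to_bytes(ptr_len, "big")
        body += s.encode("utf-8") + b"\x00"
    return bytes(header + body)


out = unparse_with_pointers(["ab", None, "c"])
```

With three 4-byte pointers the contents start at index 12, so the pointers come out as 12, 0 (nil), and 15.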
>
> For pointers, a null pointer creates the scenario of either nil
> representation or empty representation depending on whether or not nillable
> is defined as true. Unless default values (or 0 occurrence) are defined for
> all underlying content, then this is a processing error. During unparse,
> the only scenario in which a null pointer would be created is for a nil
> representation.
>
> SMH: This needs more thought. The nil & default properties apply to the
> *contents* of the indirection, not to the pointer. If you want to give a
> nil semantic to the pointer value itself, then that would require a new
> enum for dfdl:nilKind. I don't see why a pointer value 0 can't be treated
> like any other indirection value. A missing pointer is an error - it must
> be present - there is no way to control optionality because
> minOccurs/maxOccurs apply to the contents. (Alternatively, if you want the
> concepts of nil, default, occurs to apply to the indirect value, then
> dfdl:indirectType could point at an element instead of a simple type - but
> that seems way too over-engineered).
>
> Examples:
> The following is the definition for the address of a null-terminated
> string in which the string address may be NULL as indicated by a nillable
> value of true:
> <xs:element name="myString" type="xs:string" dfdl:lengthKind="delimited"
> dfdl:encoding="UTF-8" dfdl:terminator="%NUL;" dfdl:indirectKind="pointer"
> dfdl:indirectLength="8" dfdl:indirectUnits="bytes" nillable="true" />
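
As a rough illustration of how a parser might treat the myString definition above, the following Python sketch reads an 8-byte pointer, follows it to a NUL-terminated UTF-8 string, and maps a null pointer to a nil value. The function name, big-endian byte order, and the treatment of the pointer as an index into the same buffer are assumptions; a real z/TPF pointer would be a memory address.

```python
from typing import Optional, Tuple


def parse_my_string(data: bytes, pos: int) -> Tuple[Optional[str], int]:
    """Parse an 8-byte pointer at `pos` and return (string or None, next position)."""
    ptr = int.from_bytes(data[pos:pos + 8], "big")
    next_pos = pos + 8  # parsing resumes just past the pointer field
    if ptr == 0:
        # Null pointer: nil representation, since the element is nillable.
        return None, next_pos
    end = data.index(b"\x00", ptr)  # raises ValueError if no terminator is found
    return data[ptr:end].decode("utf-8"), next_pos


buf = (8).to_bytes(8, "big") + b"hello\x00"
value, after = parse_my_string(buf, 0)
nil_value, nil_after = parse_my_string(bytes(8), 0)
```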
>
> The following is the definition for an array of three 4 byte addresses of
> a complex element defined by ns0:myStruct:
> <xs:element name="myArray" type="ns0:myStruct" dfdl:lengthKind="implicit"
> dfdl:indirectKind="pointer" dfdl:indirectLength="4"
> dfdl:indirectUnits="bytes" minOccurs="3" maxOccurs="3"
> dfdl:occursCountKind="fixed" />
>
> The following is the definition for a 4 byte offset to a 100 byte
> hexBinary value from the start of the parent element definition:
> <xs:element name="myData" type="xs:hexBinary" dfdl:lengthKind="explicit"
> dfdl:length="100" dfdl:lengthUnits="bytes" dfdl:indirectKind="offset"
> dfdl:indirectLength="4" dfdl:indirectUnits="bytes" dfdl:offsetBase=".." />
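
The myData definition above could be read as follows. This Python sketch is a hedged illustration only: the function name, big-endian byte order, and the sample layout are assumptions. It resolves the 4-byte offset against the start of the parent element (the `..` in dfdl:offsetBase) and extracts the 100-byte value found there.

```python
def parse_my_data(data: bytes, parent_start: int, pos: int) -> bytes:
    """Read a 4-byte offset at `pos`, relative to `parent_start`,
    and return the 100-byte value it refers to."""
    offset = int.from_bytes(data[pos:pos + 4], "big")
    start = parent_start + offset
    value = data[start:start + 100]
    if len(value) != 100:
        raise ValueError("fewer than 100 bytes available at the offset target")
    return value


payload = bytes(range(100))
# Layout: 2 bytes before the parent, then the parent element:
# a 4-byte offset field (value 16), 12 filler bytes, then the payload.
sample = b"\xAA\xBB" + (16).to_bytes(4, "big") + bytes(12) + payload
my_data = parse_my_data(sample, parent_start=2, pos=2)
```

Here the parent starts at index 2, so an offset of 16 resolves to index 18, which is where the payload begins.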
>
> SMH: I don't see how unparsing works. What provides the value?
>
> The proposal would also allow for the following optional item but I don't
> currently see a need for this:
> dfdl:offsetKind with values "startToStart" or "endToStart" - indicates if
> the offset is from the start of the base element or the end of the base
> element.
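
The two dfdl:offsetKind values could be resolved as in the short Python sketch below. The function and parameter names are hypothetical; only the "startToStart" and "endToStart" values come from the proposal.

```python
def resolve_offset(base_start: int, base_length: int, offset: int,
                   offset_kind: str = "startToStart") -> int:
    """Return the absolute start position of the target element."""
    if offset_kind == "startToStart":
        # Measured from the start of the base element.
        return base_start + offset
    if offset_kind == "endToStart":
        # Measured from the end of the base element.
        return base_start + base_length + offset
    raise ValueError(f"unknown offsetKind: {offset_kind!r}")


from_start = resolve_offset(10, 20, 4)
from_end = resolve_offset(10, 20, 4, "endToStart")
```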
>
> I tried to get this out before my vacation, so it might take me a little
> while to respond to any issues. Thank you for your time.
>
>
> Regards,
>
> *Bradd Kadlecik*
>
>
>
> From: Steve Hanson/UK/IBM
> To: dfdl-wg at ogf.org
> Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
> michele.zundo at esa.int>, Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
> Date: 02/07/2019 12:32 PM
> Subject: Latest OGF DFDL WG Call Minutes
> ------------------------------
>
>
> Please find minutes from the latest call at
> *https://redmine.ogf.org/projects/dfdl-wg/news*
> <https://redmine.ogf.org/projects/dfdl-wg/news>
>
> Regards
>
> Steve Hanson
>
>
>
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>
> --
>   dfdl-wg mailing list
>   dfdl-wg at ogf.org
>   https://www.ogf.org/mailman/listinfo/dfdl-wg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DFDL_Indirection-mikeb-comments.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 15623 bytes
Desc: not available
URL: <http://www.ogf.org/pipermail/dfdl-wg/attachments/20191016/1b107906/attachment-0001.docx>

