[DFDL-WG] DFDL pointer & offset proposal

Bradd Kadlecik braddk at us.ibm.com
Thu Dec 12 10:40:28 EST 2019


Updated with comments:

(See attached file: DFDL_Indirection_v2.docx)


                                                                                   
 Regards,

 Bradd Kadlecik
 z/TPF Development

 Phone: 1-845-433-1573
 E-mail: braddk at us.ibm.com
 2455 South Rd
 Poughkeepsie, NY 12601-5400
 United States







From:	Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:	Bradd Kadlecik <braddk at us.ibm.com>
Cc:	DFDL-WG <dfdl-wg at ogf.org>
Date:	10/16/2019 05:43 PM
Subject:	[EXTERNAL] Re: [DFDL-WG] DFDL pointer & offset proposal



 I added some comments to your original document. Attached.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy



On Wed, Sep 25, 2019 at 12:55 PM Bradd Kadlecik <braddk at us.ibm.com> wrote:
  Here's what I've put together regarding the pointer & offset proposal for
  the next meeting's review.

  (See attached file: DFDL_Indirection.docx)



                                                                                   
 Regards,                                                                          
                                                                                   
 Bradd Kadlecik                                                                    
 z/TPF Development                                                                 
                                                                                   


                                                                                    
                                                                                    
                                                                                    





  From: Steve Hanson/UK/IBM
  To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
  Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
  michele.zundo at esa.int>
  Date: 04/05/2019 12:42 PM
  Subject: Re: Latest OGF DFDL WG Call Minutes



|---+-----------------------------------------------------------------------|
|No |Action                                                                 |
|---+-----------------------------------------------------------------------|
|309|Create example scenarios to illustrate offset & pointer requirements   |
|   |(Bradd)                                                                |
|   |5/4/19: Daffodil have a draft proposal for offset support, TPF have    |
|   |experimental implementation for pointer support. Need examples to show |
|   |the requirement, especially unparsing.                                 |
|---+-----------------------------------------------------------------------|



  Regards

  Steve Hanson


  IBM Hybrid Integration, Hursley, UK
  Architect, IBM DFDL
  Co-Chair, OGF DFDL Working Group
  smh at uk.ibm.com
  tel:+44-1962-815848
  mob:+44-7717-378890
  Note: I work Tuesday to Friday



  From: Steve Hanson/UK/IBM
  To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
  Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
  michele.zundo at esa.int>
  Date: 05/04/2019 10:29
  Subject: Re: Latest OGF DFDL WG Call Minutes




  I can see a difference between offsets and pointers. If I follow an
  offset and parse an element x then I won't automatically jump back to
  where I was - the next element y I parse will continue from location
  offset + length x unless I use offset again to jump back to the original
  location. If I follow a pointer and parse an element x, then the next
  element y I parse will continue from original location + length(pointer).
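  A minimal Python sketch of the cursor behaviour described above, assuming
  byte-aligned data and a 4-byte unsigned big-endian pointer holding an
  absolute position. The function names are hypothetical and are not part of
  DFDL or of any existing implementation.

```python
def parse_via_offset(data: bytes, offset: int, length_x: int) -> tuple:
    """Jump to `offset` and parse element x; the cursor is left after x."""
    x = data[offset:offset + length_x]
    return x, offset + length_x  # next element y starts at offset + length(x)


def parse_via_pointer(data: bytes, pos: int, length_x: int, ptr_len: int = 4) -> tuple:
    """Read a pointer at `pos`, parse element x at its target, then resume after the pointer."""
    target = int.from_bytes(data[pos:pos + ptr_len], "big")
    x = data[target:target + length_x]
    return x, pos + ptr_len  # next element y starts at original location + length(pointer)


# The pointer at position 0 holds 8; element x (2 bytes) lives at position 8.
data = bytes([0, 0, 0, 8, 0xAA, 0xBB, 0xCC, 0xDD, 0x11, 0x12])
print(parse_via_pointer(data, 0, 2))  # (b'\x11\x12', 4)
print(parse_via_offset(data, 8, 2))   # (b'\x11\x12', 10)
```

  Both calls return the same element x; only the position from which the next
  element y would be parsed differs.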

  Regards

  Steve Hanson





  From: Steve Hanson/UK/IBM
  To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
  Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
  michele.zundo at esa.int>
  Date: 04/04/2019 09:42
  Subject: Re: Latest OGF DFDL WG Call Minutes




  This is a proposal from the Daffodil team for offset support, needed for
  formats like TIFF and if we ever want to be able to handle zip files.

  https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74687382

  I think this proposal can implement your requirement - the dfdl:offset
  property can be an expression that refers to another element (your
  pointer element), which can be hidden so as not to appear in the infoset.
  I think what you are proposing is a more convenient way of handling
  offsets that are defined dynamically in the data, as opposed to defined
  statically with fixed values (though, as I said below, I need unparsing
  explained). But I may be misunderstanding your use cases.

  Regards

  Steve Hanson





  From: Bradd Kadlecik/Poughkeepsie/IBM
  To: Steve Hanson/UK/IBM at IBMGB
  Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
  michele.zundo at esa.int>
  Date: 03/04/2019 16:50
  Subject: Re: Latest OGF DFDL WG Call Minutes




  Ok, I'll work on putting a scenario together for various pointer setups
  (array, complex element, string) and show how it looks for both the JSON
  and the binary.

  I'm at a conference this week but will be back late Thursday, so I
  expect to be available for the call on Friday.

  I presented the TPF-specific pointer implementation at the conference
  this week, and some attendees will be trying to use it soon.
                                                                                   
 Regards,                                                                          
                                                                                   
 Bradd Kadlecik                                                                    
 z/TPF Development                                                                 
                                                                                   


                                                                                    
                                                                                    
                                                                                    






  From: Steve Hanson/UK/IBM
  To: Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
  Cc: dfdl-wg at ogf.org, "Mike Beckerle" <mbeckerle at tresys.com>, "Michele
  Zundo" <michele.zundo at esa.int>
  Date: 04/03/2019 10:29 AM
  Subject: Re: Latest OGF DFDL WG Call Minutes


  Hi Bradd

  I have a few questions please ... inline below.

  I'm still going to need a real worked example, starting with some actual
  data and its schema, what it looks like in the infoset after parsing,
  and how unparsing lays it back out again.

  Are you able to make the rescheduled call this Friday?

  Regards

  Steve Hanson





  From: Bradd Kadlecik/Poughkeepsie/IBM
  To: Steve Hanson/UK/IBM at IBMGB
  Cc: dfdl-wg at ogf.org, "Mike Beckerle" <mbeckerle at tresys.com>, "Michele
  Zundo" <michele.zundo at esa.int>
  Date: 27/02/2019 22:22
  Subject: Re: Latest OGF DFDL WG Call Minutes




  Regarding proposal for offsets and pointers:

  The following are the properties to be defined:

  SMH: I was expecting to see these properties only on dfdl:element,
  especially as you say '...the element contents...'?

  indirectKind Enum
  Valid values 'pointer', 'offset' (objectId or refId have also been
  considered for handling BSON, but are not proposed at this time)
  Specifies the type of indirection used to access the element contents in
  the data stream.
  Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
  dfdl:group

  SMH: I am missing the distinction between offset and pointer. Is one
  relative to current position and the other relative to start of
  bitstream?

  SMH: In earlier DFDL proposals for offset support, we had used the term
  to refer to a property to be used to establish position of the current
  element instead of assuming the current element followed straight after
  the previous one. It would allow sparse modelling of fixed structures.
  The offset could be relative to start of bitstream or some other point. I
  don't think that's what you mean when you say 'offset' so I will refer to
  your new concept as 'pointer'.

  SMH: Assuming that indirectKind is a normal DFDL property, it can be in
  scope. It would therefore need to have an enum 'None' which would be the
  default used in most schemas.

  indirectLength Non-negative Integer or DFDL expression
  Specifies the length of the indirection in units according to the
  indirectUnits property.
  Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
  dfdl:group

  indirectUnits Enum
  Valid values 'bytes','bits'
  Specifies the units to be used for reading or writing the indirection
  according to indirectLength.
  The default value is 'bytes'.
  Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
  dfdl:group
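  A Python sketch of how a parser might read the indirection value governed
  by indirectKind, indirectLength and indirectUnits. The proposal does not say
  how the value itself is encoded, so an unsigned, big-endian,
  most-significant-bit-first representation is assumed. The NONE member
  reflects SMH's suggestion above that the property needs a default when in
  scope; all class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class IndirectKind(Enum):
    NONE = "none"        # assumed default, per SMH's comment, so the property can be in scope
    POINTER = "pointer"
    OFFSET = "offset"


class IndirectUnits(Enum):
    BYTES = "bytes"
    BITS = "bits"


@dataclass
class IndirectProps:
    kind: IndirectKind = IndirectKind.NONE
    length: int = 0
    units: IndirectUnits = IndirectUnits.BYTES  # the proposal's stated default


def read_indirection(data: bytes, bit_pos: int, props: IndirectProps) -> tuple:
    """Read an unsigned, MSB-first indirection value starting at bit_pos.

    Returns (value, bit position just after the indirection value)."""
    if props.kind is IndirectKind.NONE:
        raise ValueError("element has no indirection")
    nbits = props.length * 8 if props.units is IndirectUnits.BYTES else props.length
    value = 0
    for i in range(nbits):
        p = bit_pos + i
        bit = (data[p // 8] >> (7 - p % 8)) & 1  # bit p, counting from the MSB of byte 0
        value = (value << 1) | bit
    return value, bit_pos + nbits


print(read_indirection(bytes([0, 0, 0, 0x10]), 0, IndirectProps(IndirectKind.POINTER, 4)))  # (16, 32)
```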

  SMH: I think a better approach is to provide a property
  dfdl:indirectType, instead of indirectLength/indirectUnits, which refers
  to a simple type (not element) that carries its own lengthKind, length &
  lengthUnits properties. Similar idea to dfdl:prefixLengthType. That
  allows a lot of flexibility on how the pointer can appear.

  offsetBase non-empty string containing an absolute or relative XPath
  expression for the base element.
  Annotation: dfdl:element, dfdl:simpleType, dfdl:choice, dfdl:sequence,
  dfdl:group

  The proposal would be to have the indirection value appear after the
  LeadingAlignment and before the TrailingAlignment. This would mean the
  alignment and skip factors apply to the indirection values in the data
  stream instead of to the contents of the indirection.

  SMH: Agree.

  This also means that, in an array element, each element has its own
  indirection value (pointer or offset), and the alignment and skip factors
  apply to each of these indirection values.

  SMH: Do you mean '...each occurrence...' ?

  The intent is that the indirection values apply only to the data stream
  and not to the infoset. During parse, when the infoset is populated from
  the data stream, the indirection values are replaced by the contents.
  During unparse, the indirection values don't exist in the infoset and are
  created while writing the data stream.

  SMH: I agree that the indirection should be a purely physical thing, but
  I am not clear how the value is filled in when unparsing. Where does the
  value come from? outputValueCalc? Or maybe it's not needed when
  unparsing, and the data is always contiguous?
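  One possible answer to the unparsing question, sketched in Python purely as
  an assumption (the proposal does not specify this): the unparser writes a
  placeholder for each pointer, appends the target contents after the fixed
  part of the data, and then backpatches each placeholder with the absolute
  position of its contents. The 4-byte big-endian pointer format and the
  function name are also hypothetical.

```python
def unparse_with_pointers(fixed_prefix: bytes, contents: list, ptr_len: int = 4) -> bytes:
    """Write one pointer per content item after fixed_prefix, append the contents,
    and backpatch each pointer with the absolute position of its content."""
    out = bytearray(fixed_prefix)
    slots = []
    for _ in contents:
        slots.append(len(out))
        out += bytes(ptr_len)  # zero placeholder, filled in below
    for slot, item in zip(slots, contents):
        target = len(out)      # content is appended at the current end of the stream
        out += item
        out[slot:slot + ptr_len] = target.to_bytes(ptr_len, "big")
    return bytes(out)


print(unparse_with_pointers(b"", [b"AB", b"C"]).hex())  # 000000080000000a414243
```

  This layout is only one choice; contents could equally be placed elsewhere,
  which is why the question of what supplies the pointer value matters.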

  For pointers, a null pointer results in either the nil representation or
  the empty representation, depending on whether nillable is true. Unless
  default values (or zero occurrences) are defined for all underlying
  content, this is a processing error. During unparse, the only scenario in
  which a null pointer would be created is for a nil representation.

  SMH: This needs more thought. The nil & default properties apply to the
  contents of the indirection, not to the pointer. If you want to give a
  nil semantic to the pointer value itself, then that would require a new
  enum for dfdl:nilKind. I don't see why a pointer value 0 can't be treated
  like any other indirection value. A missing pointer is an error - it must
  be present - there is no way to control optionality because
  minOccurs/maxOccurs apply to the contents. (Alternatively, if you want
  the concepts of nil, default, occurs to apply to the indirect value, then
  dfdl:indirectType could point at an element instead of a simple type -
  but that seems over-engineered).

  Examples:
  The following is the definition for the address of a null-terminated
  string in which the string address may be NULL as indicated by a nillable
  value of true:
  <xs:element name="myString" type="xs:string" dfdl:lengthKind="delimited"
  dfdl:encoding="UTF-8" dfdl:terminator="%NUL;" dfdl:indirectKind="pointer"
  dfdl:indirectLength="8" dfdl:indirectUnits="bytes" nillable="true" />
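  A Python sketch of parsing the myString example, assuming the 8-byte
  pointer is an unsigned big-endian absolute position in the data. For
  simplicity it treats a null pointer on a non-nillable element as an error
  and ignores the default-value case discussed above; the function name is
  hypothetical.

```python
def parse_string_pointer(data: bytes, pos: int, nillable: bool = True, ptr_len: int = 8) -> tuple:
    """Follow the pointer at `pos` to a NUL-terminated UTF-8 string.

    Returns (string, or None for nil, and the position just after the pointer)."""
    target = int.from_bytes(data[pos:pos + ptr_len], "big")
    next_pos = pos + ptr_len
    if target == 0:
        if nillable:
            return None, next_pos  # null pointer -> nil representation
        raise ValueError("null pointer for a non-nillable element")
    end = data.index(b"\x00", target)  # dfdl:terminator="%NUL;"
    return data[target:end].decode("utf-8"), next_pos


# Two pointers: the first points at position 16, the second is NULL.
data = (16).to_bytes(8, "big") + bytes(8) + "hello".encode("utf-8") + b"\x00"
print(parse_string_pointer(data, 0))  # ('hello', 8)
print(parse_string_pointer(data, 8))  # (None, 16)
```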

  The following is the definition for an array of three 4-byte addresses of
  a complex element defined by ns0:myStruct:
  <xs:element name="myArray" type="ns0:myStruct" dfdl:lengthKind="implicit"
  dfdl:indirectKind="pointer" dfdl:indirectLength="4"
  dfdl:indirectUnits="bytes" minOccurs="3" maxOccurs="3"
  dfdl:occursCountKind="fixed" />
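  A Python sketch of the myArray example, assuming 4-byte alignment on each
  pointer, unsigned big-endian pointers holding absolute positions, and a
  hypothetical ns0:myStruct made of two unsigned 16-bit big-endian fields. It
  shows each occurrence carrying its own pointer, with alignment applied to
  the pointers rather than to the structures they reference.

```python
import struct


def align(pos: int, alignment: int) -> int:
    """Round pos up to the next multiple of alignment."""
    return (pos + alignment - 1) // alignment * alignment


def parse_pointer_array(data: bytes, pos: int, occurs: int = 3,
                        ptr_len: int = 4, alignment: int = 4) -> tuple:
    items = []
    for _ in range(occurs):               # dfdl:occursCountKind="fixed", 3 occurrences
        pos = align(pos, alignment)       # alignment applies to the pointer itself
        target = int.from_bytes(data[pos:pos + ptr_len], "big")
        a, b = struct.unpack_from(">HH", data, target)  # hypothetical myStruct layout
        items.append({"a": a, "b": b})
        pos += ptr_len                    # resume after this occurrence's pointer
    return items, pos


pointers = b"".join(t.to_bytes(4, "big") for t in (12, 16, 20))
data = pointers + struct.pack(">6H", 1, 2, 3, 4, 5, 6)
items, end = parse_pointer_array(data, 0)
print(items, end)  # [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}] 12
```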

  The following is the definition for a 4-byte offset to a 100-byte
  hexBinary value from the start of the parent element definition:
  <xs:element name="myData" type="xs:hexBinary" dfdl:lengthKind="explicit"
  dfdl:length="100" dfdl:lengthUnits="bytes" dfdl:indirectKind="offset"
  dfdl:indirectLength="4" dfdl:indirectUnits="bytes"
  dfdl:offsetBase=".." />

  SMH: I don't see how unparsing works. What provides the value?

  The proposal would also allow for the following optional item but I don't
  currently see a need for this:
  dfdl:offsetKind with values "startToStart" or "endToStart" - indicates if
  the offset is from the start of the base element or the end of the base
  element.
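  A Python sketch of the myData offset example together with the optional
  offsetKind. It assumes an unsigned big-endian offset and, following the
  proposal's placement of indirection values in the data stream, that parsing
  resumes after the 4-byte offset value rather than after the target data.
  The function names are hypothetical.

```python
def resolve_offset(offset: int, base_start: int, base_end=None,
                   offset_kind: str = "startToStart") -> int:
    """Absolute position of the target, relative to the base element (dfdl:offsetBase)."""
    if offset_kind == "startToStart":
        return base_start + offset
    if offset_kind == "endToStart":
        if base_end is None:
            raise ValueError("endToStart needs the end position of the base element")
        return base_end + offset
    raise ValueError(f"unknown offsetKind: {offset_kind!r}")


def parse_my_data(data: bytes, parent_start: int, pos: int) -> tuple:
    """myData: 4-byte offset at pos, relative to the parent (offsetBase=".."),
    pointing at a 100-byte hexBinary value."""
    offset = int.from_bytes(data[pos:pos + 4], "big")
    start = resolve_offset(offset, parent_start)
    return data[start:start + 100].hex().upper(), pos + 4


data = (8).to_bytes(4, "big") + bytes(4) + bytes(range(100))
value, next_pos = parse_my_data(data, parent_start=0, pos=0)
print(value[:8], next_pos)  # 00010203 4
```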

  I tried to get this out before my vacation, so it may take me a little
  while to respond to issues. Thank you for your time.

                                                                                   
 Regards,                                                                          
                                                                                   
 Bradd Kadlecik                                                                    
 z/TPF Development                                                                 
                                                                                   


                                                                                    
                                                                                    
                                                                                    






  From: Steve Hanson/UK/IBM
  To: dfdl-wg at ogf.org
  Cc: "Mike Beckerle" <mbeckerle at tresys.com>, "Michele Zundo" <
  michele.zundo at esa.int>, Bradd Kadlecik/Poughkeepsie/IBM at IBMUS
  Date: 02/07/2019 12:32 PM
  Subject: Latest OGF DFDL WG Call Minutes


  Please find minutes from the latest call at
  https://redmine.ogf.org/projects/dfdl-wg/news

  Regards

  Steve Hanson





  Unless stated otherwise above:
  IBM United Kingdom Limited - Registered in England and Wales with number
  741598.
  Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
  3AU



  --
    dfdl-wg mailing list
    dfdl-wg at ogf.org
    https://www.ogf.org/mailman/listinfo/dfdl-wg
  [attachment "DFDL_Indirection-mikeb-comments.docx" deleted by Bradd
  Kadlecik/Poughkeepsie/IBM]

-------------- next part --------------
A non-text attachment was scrubbed...
Name: DFDL_Indirection_v2.docx
Type: application/octet-stream
Size: 25909 bytes
Desc: not available
URL: <http://www.ogf.org/pipermail/dfdl-wg/attachments/20191212/2abb9493/attachment-0001.obj>

