[dfdl-wg] MikeB's random notes from F2F

Wed Dec 8 10:46:18 CST 2004

This is not a meeting summary. I'll work on that on the plane today and send
it subsequently, but people asked about having these notes. This is truly
random note taking.

...mikeb

---------------------------------------------------------------

Starting Points:

* xml xsd - what I already have, but not what I want
        - what I want

* Cobol/C-struct + system and compiler details

* data dictionary (ad-hoc spreadsheet or text description)

* example data files only

Some degree of structural similarity required. Reasonably compatible.

You could express any transformation at all, but the intent is to make
easy those where structural similarity is present.

Q: is this just a taste and style issue?

------------------------------------------------

- desirable to have symmetric read/write capability given a DFDL descriptor
- all built-in types and reps should implement both directions
- choices and use of runtimeValue expressions can create non-invertible
parsers
- mechanisms can allow the output formatting properties to be introduced
explicitly, symmetric with the input parsing properties
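
A minimal sketch of the symmetry point, in Python rather than DFDL. It
assumes a 4-byte integer whose only property is byteOrder; the function
names and the values "bigEndian"/"littleEndian" are made up for
illustration. The second pair shows how a reader that discards information
leaves the writer unable to reproduce the input.

```python
import struct

# Hypothetical property values; only the name "byteOrder" appears in these notes.
_FORMATS = {"bigEndian": ">i", "littleEndian": "<i"}

def read_int32(data, byte_order):
    """Reader: 4 bytes -> int, driven by the byteOrder property."""
    return struct.unpack(_FORMATS[byte_order], data[:4])[0]

def write_int32(value, byte_order):
    """Writer: int -> 4 bytes, driven by the same property."""
    return struct.pack(_FORMATS[byte_order], value)

def read_text_int(text):
    """Reader that tolerates padding; the padding is thrown away."""
    return int(text.strip())

def write_text_int(value):
    """Writer cannot know what padding the input had."""
    return str(value)

raw = bytes([0x00, 0x00, 0x01, 0x02])
value = read_int32(raw, "bigEndian")
assert write_int32(value, "bigEndian") == raw            # invertible

padded = "  42"
assert write_text_int(read_text_int(padded)) != padded   # not invertible as is
```

Making the second pair invertible would need an explicit output property
(e.g., a padding width) supplied alongside the input properties.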

-----------------------------------------------

Ways to refer to one element from the dfdl annotations of another:
1) use value of other element in runtimeValue expression.
2) use value of other element as source for dfdl read conversion
3) use value of other element as a parameter for dfdl read conversion
(note 2 and 3 are the same if the source is just another parameter)

-----------------------------------------------

Hypothesis: no XSD syntax is needed inside DFDL rep annotations. Can instead
reference a type name, and name elements within it.

---------------------------------------------

Choice groups imply the need to have an additional element to provide a name
for the choice. This is required only when the alternatives of the choice
contain a single value.

--------------------------------------------

Key topic: binding and passing mechanisms for property values aka parameters
to read/write conversions

Parameterization and binding examples

1) Mime type - image format
   logical model looks like bmp
   black box read conversion

2) complex number with 2 possible component orders, realFirst or
imaginaryFirst
   white box
   complexComponentOrder is the parameter
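
A rough Python sketch of the white-box case in example 2. The layout (two
big-endian 64-bit floats) and the function names are assumptions for
illustration; only complexComponentOrder, realFirst and imaginaryFirst come
from the notes.

```python
import struct

def read_complex(data, complex_component_order="realFirst"):
    """Reader parameterized by complexComponentOrder (assumed layout: 2 doubles)."""
    first, second = struct.unpack(">dd", data[:16])
    if complex_component_order == "realFirst":
        return complex(first, second)
    if complex_component_order == "imaginaryFirst":
        return complex(second, first)
    raise ValueError("unknown complexComponentOrder: %r" % complex_component_order)

def write_complex(value, complex_component_order="realFirst"):
    """Writer driven by the same parameter, so the pair round-trips."""
    if complex_component_order == "realFirst":
        return struct.pack(">dd", value.real, value.imag)
    if complex_component_order == "imaginaryFirst":
        return struct.pack(">dd", value.imag, value.real)
    raise ValueError("unknown complexComponentOrder: %r" % complex_component_order)

data = struct.pack(">dd", 1.0, 2.0)
as_real_first = read_complex(data, "realFirst")
as_imag_first = read_complex(data, "imaginaryFirst")
```

The same bytes yield different logical values depending on the bound
parameter, which is why the binding mechanism matters.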

-------------------------------------------------------

Agreed: "transforms" will be called readers and writers, collectively
converters and conversions

--------------------------------------------------------

Proposal for parking lot: discontiguous representations. E.g., file full of
variable length strings where the length fields are all first, then all the
contents.
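
To make the discontiguous case concrete, a small Python sketch. The exact
layout is an assumption: one count byte, then that many one-byte length
fields, then the ASCII contents back to back.

```python
def read_discontiguous_strings(data):
    """Read all length fields first, then slice the contents that follow."""
    count = data[0]                      # assumed: first byte is the string count
    lengths = list(data[1:1 + count])    # assumed: one byte per length field
    offset = 1 + count                   # contents start after the length table
    strings = []
    for length in lengths:
        strings.append(data[offset:offset + length].decode("ascii"))
        offset += length
    return strings

example = bytes([3, 2, 5, 1]) + b"hiworld!"
strings = read_discontiguous_strings(example)
```

Each string's length and content are far apart in the file, so an element
by element description with a stored length next to each value does not fit
directly.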

--------------------------------------------------------

XSLT - has variables and things we can use as constructs. E.g., they use
this idiom 

<xsl:variable name="variable" select="...."/> and equivalently <xsl:variable
name="variable">....</xsl:variable>

---------------------------------------------------------

Issue: when annotations are added on to an element, can we validate that
only relevant properties are asserted for that element?
Is it desirable to ensure that only relevant properties are asserted, or
should irrelevant properties simply be ignored?

Position - ruling out irrelevant attributes improves validity checking and
catches errors earlier.
E.g., I keep changing the byteOrder setting, but nothing is changing in the
data I'm reading (turns out it's because byte order is irrelevant, but if
nothing were checking that, nothing would help you find that out).

Position - tolerating irrelevant attributes improves flexibility (e.g., if you
change the overall representation, you don't have to edit all the other
properties that no longer apply). A single file of DFDL can capture
characteristics of more than one representation (at least one text and one
binary flavor, though this doesn't generalize).
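
The two positions can be sketched as a strict and a tolerant mode of one
check. This is Python for illustration only; the relevance table is
hypothetical (the property names are ones used elsewhere in these notes,
but which ones apply to which representation is assumed here).

```python
# Hypothetical table of which properties are relevant to which representation.
RELEVANT = {
    "text": {"charset", "terminator", "separator", "numBase"},
    "binary": {"byteOrder", "numberOfBits"},
}

def check_properties(rep_type, properties, strict):
    """Return the irrelevant property names; in strict mode, reject them."""
    irrelevant = sorted(set(properties) - RELEVANT[rep_type])
    if strict and irrelevant:
        raise ValueError(
            "properties not relevant to %s representation: %s"
            % (rep_type, ", ".join(irrelevant))
        )
    return irrelevant

props = {"charset": "utf-8", "byteOrder": "bigEndian"}

# Tolerant position: byteOrder is reported but otherwise ignored.
ignored = check_properties("text", props, strict=False)

# Strict position: the same annotation is an error, caught before any data is read.
try:
    check_properties("text", props, strict=True)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

A middle ground would be to run the tolerant mode but surface the returned
list as warnings.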

-----------------------------------------------------------

Issue: parameterization of transforms

seems like the OMG DT model and the transform descriptions (Alan's proposal)
are very close conceptually, but exactly how they relate isn't entirely clear.

-----------------------------------------------------------

Preprocessing

an attribute called source (and presumably another called target) 

----------------------------------------------------------

5 kinds of operations

reader
writer
filter
change filter

function

known signatures
we can chain them together

conceptually think of this as pull model, or perhaps the DFDL expressions
don't take any position on whether the implementation is pull or push. 

should be a way to create pull-model code in a programming language and use
it as an augmentation of the DFDL system.
could be ways to also adapt push model code, or other schemes like stateful
threads.

Where can these go in DFDL?

- readers and writers go on elements
- filters go on a special construct for creating sources or targets from
other sources or targets

I/O asymmetries - using filters you are discarding information, so it affects
the ability to exactly reproduce output.

Box and arrow diagrams using these function types can be used to provide a
semantics for DFDL.
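
One way to picture the pull model and the chaining of known signatures is
with Python generators, where each filter pulls from its source on demand.
This is only a sketch: the function names mirror bytesToChars and
replaceRegexp from the notes, but their signatures here are assumed.

```python
import codecs
import re

def bytes_to_chars(source, charset="utf-8"):
    """Filter: pull byte chunks from source, yield decoded text chunks."""
    decoder = codecs.getincrementaldecoder(charset)()
    for chunk in source:
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)   # flush any buffered partial character
    if tail:
        yield tail

def replace_regexp(source, pattern, replacement):
    """Filter: buffers the whole input so matches spanning chunks are found."""
    text = "".join(source)
    yield re.sub(pattern, replacement, text, flags=re.DOTALL)

def read_int(source):
    """Reader: consumes a character source and produces a value."""
    return int("".join(source).strip())

# Box-and-arrow chain: byteStream -> bytesToChars -> replaceRegexp -> reader
byte_stream = iter([b"4", b"2 /* comm", b"ent */\n"])
chars = bytes_to_chars(byte_stream)
cleaned = replace_regexp(chars, r"/\*.*?\*/", "")
value = read_int(cleaned)
```

Nothing is read from byte_stream until read_int pulls on the chain. The
comment text removed by the filter is gone, which is the I/O asymmetry
noted above.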

-----------------------------------------------------

<element name="charstream" type="dfdl:sourceStream">
  <annotation><appinfo source="...">
   <dfdl:sourceStreamTD>
    <charset>utf-8</charset>
    <source>byteStream</source>
    <filter>bytesToChars</filter>
   </dfdl:sourceStreamTD>
  </appinfo></annotation>
</element>

<element name="s" type="dfdl:sourceStream">
  <annotation><appinfo source="...">
   <dfdl:sourceStreamTD>
    <filter>replaceRegexp("...regexp for C-comments...", "")</filter>
    <source>charstream</source>
   </dfdl:sourceStreamTD>
  </appinfo></annotation>
</element>

<element name="t" type="dfdl:targetStream">
  <annotation><appinfo source="...">
   <dfdl:targetStreamTD>
    <charset>utf-8</charset>
    <target>outbyteStream</target>
    <filter>charsToBytes</filter>
   </dfdl:targetStreamTD>
  </appinfo></annotation>
</element>

<element name="toplevel">
  <annotation><appinfo source="...">
   <dfdl:instanceTD>
    <source>s</source>
    <target>t</target>
    <repType>text</repType>
   </dfdl:instanceTD>
  </appinfo></annotation>
  <sequence>
      <element name="len" type="int">
         <annotation><appinfo source="...">
           <intTD>
            <terminator>\p{newline}</terminator>
           </intTD>
         </appinfo></annotation>
      </element>
      <element name="val" type="int" minOccurs="0" maxOccurs="unbounded">
         <annotation><appinfo source="...">
            <intTD>
              <arrayTD>
                <storedLength>../len</storedLength>
                <terminator>\p{newline}</terminator>
                <separator>\p{space}</separator>
              </arrayTD>
              <numBase>10</numBase>
              <reader name="myIntReader">
                <numberOfBits>13</numberOfBits>
              </reader>
            </intTD>
         </appinfo></annotation>
      </element>
  </sequence>
</element>
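
Read loosely, the toplevel element above describes text like "3\n10 20 30\n":
a len terminated by a newline, then len values separated by spaces and
terminated by a newline. A Python sketch of that reading follows. It assumes
\p{newline} means "\n" and \p{space} means a single space, and it ignores
the myIntReader/numberOfBits part, whose meaning for text isn't settled in
these notes.

```python
def read_toplevel(text, num_base=10):
    """Reader for the assumed text form: '<len>\\n<val> <val> ...\\n'."""
    len_text, rest = text.split("\n", 1)        # len, terminated by newline
    length = int(len_text)
    vals_text = rest.split("\n", 1)[0]          # array, terminated by newline
    vals = [int(tok, num_base) for tok in vals_text.split(" ") if tok]
    if len(vals) != length:                     # storedLength is ../len
        raise ValueError("expected %d values, found %d" % (length, len(vals)))
    return {"len": length, "val": vals}

def write_toplevel(doc):
    """Writer for the same form (base 10 only in this sketch)."""
    return "%d\n%s\n" % (doc["len"], " ".join(str(v) for v in doc["val"]))

doc = read_toplevel("3\n10 20 30\n")
round_tripped = write_toplevel(doc)
```

Here the stored length in len is checked against the number of values
actually found; a real implementation might instead use it to decide how
many values to read.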

--------------------------------------------------------

Still open issues:

1) scoping of property definitions. Useful or source of bad interactions?

2) how to organize the model of the properties for the types - Suman and Mike
in rough agreement.

3) 

----------------------------------------------
