[DFDL-WG] Using a discriminator on a simple type

Thu Jul 25 10:49:48 EDT 2013

Well, a type is not inherently an abstract reusable thing. That's just one
way to use them. A type can also be intended for reuse in a very
specialized situation only, as can a global element declaration (for reuse
as an element reference), or any reusable abstraction. This is true in
every programming language I know, and it is true of DFDL.

I have created classes (just today in fact) that exist only to share certain
complexity that only shows up in the type of one argument passed to a
constructor of one other class, and the whole thing is not used and not
intended for reuse anywhere else. I have also created classes that throw
specialized exception types, meaning they are intended for use only in
run-time contexts where a catch for those exceptions surrounds them.
These are analogous to DFDL types that contain discriminators: they are
intended for use only when a point of uncertainty surrounds them.

So, in DFDL....

The example is easy once you know the situation.

The situation might be called "deep tag". I.e., you distinguish the
alternatives in a choice based *not* on the first thing in them, but the
third thing in them. This third thing is always a simple type thing
(typically a string or int), it is always a discriminator, and it has some
shared characteristics that you want to write once, not repeat over and
over.

The simple type you define has one purpose in life: to be the type of
specialized 'tag-like' things that discriminate, and to express all their
common characteristics in one place.

<simpleType name="discTag">
   <annotation><appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:simpleType lengthKind="pattern"/>
      <!-- succeeds only if the parsed value satisfies the facets below -->
      <dfdl:discriminator>{ dfdl:checkConstraints(.) }</dfdl:discriminator>
   </appinfo></annotation>
   <restriction base="xs:string">
         <minLength value="7"/>
         <maxLength value="12"/>
   </restriction>
</simpleType>

You reuse it like this:

<element name="type1Tag" dfdl:lengthPattern="A2\d*\D*|78569\D\D"
type="tns:discTag"/>

There is a bunch of those for the various tags. I have used this idiom, where
the only thing the element needs to express is a lengthPattern (or a length in
other situations), more than once now. It is nice for them to be one-liner
element declarations like this so that they are densely packed in the schema
xsd file, making it easy to focus on the differences in those regexes. This
works because a non-matching lengthPattern produces a string of length 0,
which violates the minLength facet, so the checkConstraints call in the
discriminator fails. There are other ways to achieve this behavior (e.g., a
discriminator with testKind pattern), but nothing I tried achieved the economy
of expression of this idiom.
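
For comparison, here is a rough sketch of what the testKind pattern
alternative might look like for one tag. It is only an illustration, not
taken from the actual schema: it assumes the xs and dfdl prefixes are bound
as usual, uses a plain xs:string instead of discTag, and reuses the
type1Tag regex written with single backslashes.

<element name="type1Tag" type="xs:string"
         dfdl:lengthKind="pattern"
         dfdl:lengthPattern="A2\d*\D*|78569\D\D">
   <annotation><appinfo source="http://www.ogf.org/dfdl/">
      <!-- the regex is written a second time as the discriminator test -->
      <dfdl:discriminator testKind="pattern"
                          testPattern="A2\d*\D*|78569\D\D"/>
   </appinfo></annotation>
</element>

Every tag element then carries its own annotation block and repeats its
regex, and the shared minLength/maxLength facets are no longer checked, so
the declarations stop being one-liners.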

Then we bury/hide the tags using a double-group technique:

<group name="hType1Tag">
  <sequence>
    <element ref="tns:type1Tag"/>
  </sequence>
</group>

<group name="type1Tag">
  <sequence dfdl:hiddenGroupRef="tns:hType1Tag"/>
</group>

Then these groups appear in the complex types of the various, say, choice
alternatives, each of which would look like this:

<complexType name="recordType1">
  <sequence>
    <!-- ... stuff before the 'tag' ... -->
    <group ref="tns:type1Tag"/>
    <!-- ... stuff after the tag ... -->
  </sequence>
</complexType>

Bunch of those for the various record types. Note that these record types
have no DFDL annotations on them, nor will the infoset for them contain any
artifacts of DFDL due to the hidden group trickery here.

So, from the above you can see that the simple type was intended for reuse
only in this idiom, and in fact so was this complex type. Neither makes
sense unless it appears inside a point of uncertainty, i.e., this complex
type assumes one needs to discriminate it from other things wherever it
appears. This is true of anything that hides a discriminator. You can't
safely reuse this type in a context where you know for sure you don't need
to discriminate, because in that situation you would not necessarily have a
point of uncertainty directly surrounding it, and the discriminator would
end up resolving some other, enclosing point of uncertainty instead.
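
To make the required context concrete, here is a minimal sketch of a point
of uncertainty surrounding these record types. The names record, rec1, rec2,
and recordType2 are hypothetical and not from the schema discussed here;
recordType2 is assumed to follow the same pattern as recordType1, with its
own hidden tag.

<element name="record">
  <complexType>
    <choice>
      <!-- each branch hides its own discTag element; the discriminator
           inside it resolves this choice -->
      <element name="rec1" type="tns:recordType1"/>
      <element name="rec2" type="tns:recordType2"/>
    </choice>
  </complexType>
</element>

If the hidden tag in recordType1 fails its constraints, the parser
backtracks and tries rec2. If it succeeds, the choice is committed to rec1,
and later errors in that branch no longer cause backtracking.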

I used this technique in a schema (still under development) for a
particular MIL-standard data format. I rather like it as an idiom.

...mike

On Wed, Jul 24, 2013 at 2:21 PM, Steve Hanson <smh at uk.ibm.com> wrote:

> I know we discussed this on Tuesday's call, but I am really struggling
> with why I would ever want to do it. It seems to me it is an accident
> waiting to happen. A type is an abstract reusable thing, I have no idea
> where it will end up being used, so it is positively dangerous for a
> discriminator to appear. What is it discriminating? Answer: any surrounding
> point of uncertainty, no matter where that might be.
>
> Please can someone provide me with a real use case for this?
>
> Regards
>
> Steve Hanson
> Architect, IBM Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848

-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com