[dfdl-wg] CSV string worked example
Jim Myers
jimmyers at ncsa.uiuc.edu
Wed Mar 1 09:06:27 CST 2006
>
>EOS is up for grabs. I was thinking of it as a returned value (e.g.
>-1), but an exception might (or might not) be easier to make sense of.
That assumes -1 is a valid return value and is never needed as a real
data value, etc. I think we need a separate mechanism to indicate the
end of the stream.
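
To make the concern concrete, here is a rough Python sketch (not DFDL, and not a proposal for the actual mechanism). The names EOS, read_ints_in_band and read_ints_out_of_band are made up for illustration. It compares using -1 as an in-band end marker with using a distinct marker that can never collide with a data value.

```python
from typing import Iterable, Iterator, List, Union


class _EndOfStream:
    """Unique marker type; it cannot collide with any int in the data."""

    def __repr__(self) -> str:
        return "EOS"


EOS = _EndOfStream()  # hypothetical out-of-band end-of-stream marker


def read_ints_in_band(tokens: Iterable[str]) -> List[int]:
    """Treat -1 as the end marker, so a real -1 in the data truncates it."""
    out = []
    for token in tokens:
        value = int(token)
        if value == -1:
            break
        out.append(value)
    return out


def next_int(it: Iterator[str]) -> Union[int, _EndOfStream]:
    """Return the next int, or the EOS marker once the input is exhausted."""
    try:
        return int(next(it))
    except StopIteration:
        return EOS


def read_ints_out_of_band(tokens: Iterable[str]) -> List[int]:
    """Stop only on the EOS marker, so -1 survives as ordinary data."""
    it = iter(tokens)
    out = []
    while True:
        value = next_int(it)
        if value is EOS:
            break
        out.append(value)
    return out


data = ["3", "-1", "7"]
in_band = read_ints_in_band(data)          # [3]
out_of_band = read_ints_out_of_band(data)  # [3, -1, 7]
```

An exception would serve the same purpose as the marker object here. Either way the end signal is kept out of the value space of the type being returned.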
>
>Regarding the new model: I don't think this is a problem at the
>level of your example. We could simply use a single sequence and a
>more complex "split" conversion. I imagine that the "split"
>conversion we would want to settle on should accept a regular
>expression (or at least a list of separators). In your example you
>just have to allow the separator to be a new line OR a comma and you are done.
Yes, but I want to reuse the simple conversions and so keep the
ability to deal with all the variations you note below without
special constructs. If I treat this as a sequence of two steps, I can
deal with missing separators and terminators differently at each
step; if I put the two together, one conversion has to account for
all the possible combinations. I also want to handle a data cube the
same way without having to wait for someone to build a new converter.
(We talked this through in general terms before: if you have a way to
create new converters from existing ones, you could support both ways.)
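
As a rough illustration of the two approaches, here is a Python sketch rather than DFDL. The helper names (split_on, each, compose, flatten) are invented for this example and are not part of any proposal. The first pipeline chains two simple splits; the second is the single "complex split" that accepts a newline OR a comma.

```python
import re
from typing import Any, Callable, List


def split_on(sep: str) -> Callable[[str], List[str]]:
    """Simple converter: split a string on one literal separator."""
    return lambda text: text.split(sep)


def each(conv: Callable[[Any], Any]) -> Callable[[List[Any]], List[Any]]:
    """Lift a converter so it applies to every item of a sequence."""
    return lambda items: [conv(item) for item in items]


def compose(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Build a new converter by running existing ones left to right."""
    def run(value: Any) -> Any:
        for step in steps:
            value = step(value)
        return value
    return run


def flatten(rows: List[List[Any]]) -> List[Any]:
    """Collapse rows into one sequence (the 50-ints logical model)."""
    return [cell for row in rows for cell in row]


text = "1,2,3\n4,5,6"

# Two steps: rows first, then cells, each step a reusable simple converter.
rows_of_ints = compose(split_on("\n"), each(split_on(",")), each(each(int)))
flat_two_step = compose(rows_of_ints, flatten)


# One step: a single split whose separator is a newline OR a comma.
def flat_one_step(t: str) -> List[int]:
    return [int(x) for x in re.split(r"[,\n]", t)]


table = rows_of_ints(text)      # [[1, 2, 3], [4, 5, 6]]
flat_a = flat_two_step(text)    # [1, 2, 3, 4, 5, 6]
flat_b = flat_one_step(text)    # [1, 2, 3, 4, 5, 6]
```

In this sketch a data cube would just be one more split_on/each layer, whereas the one-step form loses the row structure as soon as it splits.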
>
>A note here: this is intended as a rough sketch, not a finished
>design. I expect the details will need to be worked out. In
>particular I think Mike/IBM have some fairly complex ideas for
>separator/terminator/initiator/escape that we will have to try to
>seat in this framework.
>
>Thanks,
>
>Martin
>
>
>----------
>From: Jim Myers [mailto:jimmyers at ncsa.uiuc.edu]
>Sent: Wednesday, March 01, 2006 3:49 AM
>To: Westhead, Martin (Martin); dfdl-wg at ggf.org
>Subject: Re: [dfdl-wg] CSV string worked example
>
>Martin - two types of comments - things I think are
>typos/inconsistencies and an alternate logic:
>
>Clarifications:
>Are the initial definitions on the top element defining an order to
>use subsequently, or are they just there for us to see what you've defined?
>Of the four there, you only explicitly (in a comment?) invoke one.
>Are the others implicit because of the order?
>You use dfdl:tokenizer as a conversion later. Is that supposed to
>be split as well?
>Is bytetochar used implicitly before the first split?
>Is chartostring used implicitly before stringtoint, which is
>implicitly used to get the int element?
>Is EOS a returned value (and therefore of the type being returned),
>or is it an exception?
>
>Logical: what happens if the rows are not in the logical model?
>Physically there are 10 rows with 5 elements, but the logical model
>is 50 ints in a single sequence. To support this, you'd need to have
>both tokenization steps in one sequence annotation with two separate
>split separators. Does the use of setLocal for the split separator work
>in this case? (Is this how byteorder is now used?)
>Thinking about missing values: is it clear how a missing row versus
>a missing element is now handled? I think so. The split conversion
>using comma can define a default input to use if the stream it
>receives is empty (from a \n\n pair), and the stringtoint conversion
>can do likewise to cover a ,, pair.
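
A minimal Python sketch of that missing-value handling, assuming each conversion carries its own default. The names split_with_default and string_to_int, and the default values used, are hypothetical and only meant to show where each default applies.

```python
from typing import Callable, List


def split_with_default(sep: str, default_input: str) -> Callable[[str], List[str]]:
    """Comma split that substitutes a default input for an empty row."""
    def run(text: str) -> List[str]:
        if text == "":          # empty stream, e.g. from a \n\n pair
            text = default_input
        return text.split(sep)
    return run


def string_to_int(default: int) -> Callable[[str], int]:
    """String-to-int conversion that substitutes a default for an empty cell."""
    def run(s: str) -> int:
        return default if s == "" else int(s)   # empty cell, e.g. from a ,, pair
    return run


text = "1,2,3\n\n4,,6"   # second row is missing; middle cell of third row is missing

row_split = split_with_default(",", "0,0,0")
to_int = string_to_int(0)

table = [[to_int(cell) for cell in row_split(row)] for row in text.split("\n")]
# table == [[1, 2, 3], [0, 0, 0], [4, 0, 6]]
```

The missing row and the missing element are caught at different steps, which is what keeps the two cases distinguishable.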
>
> Jim
>
>
>At 09:25 PM 2/28/2006, Westhead, Martin (Martin) wrote:
>
>Hi Folks,
>
>I have tried to work through the CSV example that Mike suggested a
>couple of weeks ago. It has turned up some interesting issues which
>I have tried to address. These are less about making the underlying
>semantics work and more about providing a seamless default set up
>that makes the easy things work just as you would like.
>
>I was pushed for time on this, so I apologize if this is unclear in
>places, but I wanted to put it out before tomorrow's meeting.
>
>Thanks,
>
>Martin
>
>James D. Myers
>Associate Director, Cyberenvironments and Technologies, NCSA
>1205 W. Clark St, MC-257
>Urbana, IL 61801
>217-244-1934
>jimmyers at ncsa.uiuc.edu