[dfdl-wg] CSV string worked example

Jim Myers jimmyers at ncsa.uiuc.edu
Wed Mar 1 09:06:27 CST 2006


>
>EOS is up for grabs. I was thinking of it as a returned value (e.g. 
>-1), but an exception might (or might not) be easier to make sense of.

That assumes -1 can be set aside as a marker and never occurs as a real 
value, etc. I think we need a separate mechanism to indicate the end.
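
To make the concern concrete, here is a minimal sketch in Python (not DFDL). 
The function names and the choice of -1 as the marker are assumptions made 
for illustration only. It contrasts an in-band marker with an out-of-band 
end signal:

```python
from typing import Iterator, List

EOS_SENTINEL = -1  # hypothetical in-band end marker, for illustration only


def read_with_sentinel(values: List[int]) -> List[int]:
    """Stop at the first -1, so a real -1 in the data is mistaken for the end."""
    out = []
    for v in values:
        if v == EOS_SENTINEL:
            break
        out.append(v)
    return out


def read_with_exception(stream: Iterator[int]) -> List[int]:
    """The end is signalled out of band (StopIteration), so -1 stays ordinary data."""
    out = []
    while True:
        try:
            out.append(next(stream))
        except StopIteration:
            break
    return out


data = [3, -1, 7]  # -1 is a legitimate value here
truncated = read_with_sentinel(data + [EOS_SENTINEL])
complete = read_with_exception(iter(data))
print(truncated)  # [3]
print(complete)   # [3, -1, 7]
```

With the marker approach the reader stops at the legitimate -1; with a 
separate end signal all three values come through.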

>
>Regarding the new model, I don't think this is a problem at the 
>level of your example. We could simply use a single sequence and a 
>more complex "split" conversion. I imagine that the "split" 
>conversion we would want to settle on should accept a regular 
>expression (or at least a list of separators). In your example you 
>just have to allow the separator to be a newline OR a comma and you are done.

Yes, but I want to reuse the simple conversions and therefore keep 
the ability to deal with all the variations you note below without 
special constructs. If I consider this a sequence of two steps, I can 
deal with missing separators and terminators differently; if I put 
the two together, I have to account for all the possible variations 
in one place. And I want to handle a data cube the same way without 
having to wait for someone to build a new converter... (we talked this 
through generally before: if you have a way to create new converters 
from existing ones, you could support both ways).
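
A rough sketch of the difference, in Python rather than DFDL. The names 
make_split, string_to_int, two_step and combined, and the default values, 
are hypothetical stand-ins chosen for illustration, not proposed constructs. 
The two-step version reuses one simple split with different settings per 
level; the combined version uses a single regular-expression split:

```python
import re
from typing import Callable, List


def make_split(separator: str, default: str) -> Callable[[str], List[str]]:
    """Build a simple split conversion; empty pieces are replaced by `default`."""
    def split(text: str) -> List[str]:
        return [piece if piece != "" else default
                for piece in text.split(separator)]
    return split


def string_to_int(token: str, default: int = 0) -> int:
    """Convert a token to int, using `default` for an empty token."""
    return int(token) if token.strip() else default


# Same simple conversion, configured separately for each level (assumed defaults).
split_rows = make_split("\n", default="0,0,0")  # a missing row becomes a row of zeros
split_cells = make_split(",", default="")       # a missing cell is left to string_to_int


def two_step(text: str) -> List[int]:
    """Split into rows, then cells; the result is one flat sequence of ints."""
    return [string_to_int(cell)
            for row in split_rows(text)
            for cell in split_cells(row)]


def combined(text: str) -> List[int]:
    """Single split on newline OR comma; rows and cells are indistinguishable."""
    return [string_to_int(tok) for tok in re.split(r"[,\n]", text)]


text = "1,2,3\n\n4,,6"   # one missing row (\n\n) and one missing cell (,,)
print(two_step(text))    # [1, 2, 3, 0, 0, 0, 4, 0, 6]
print(combined(text))    # [1, 2, 3, 0, 4, 0, 6]
```

In the two-step form the empty row is filled in as a whole row, while the 
combined split sees only one empty token and cannot tell a missing row from 
a missing cell without extra logic.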

>
>A note here: this is intended as a rough sketch, not a finished 
>design. I expect the details will need to be worked out. In 
>particular, I think Mike/IBM have some fairly complex ideas for 
>separator/terminator/initiator/escape that we will have to try to 
>fit into this framework.
>
>Thanks,
>
>Martin
>
>
>----------
>From: Jim Myers [mailto:jimmyers at ncsa.uiuc.edu]
>Sent: Wednesday, March 01, 2006 3:49 AM
>To: Westhead, Martin (Martin); dfdl-wg at ggf.org
>Subject: Re: [dfdl-wg] CSV string worked example
>
>Martin - two types of comments - things I think are 
>typos/inconsistencies and an alternate logic:
>
>Clarifications:
>Are the initial definitions on the top element defining an order to 
>use subsequently, or are they just there for us to see what you've defined?
>Of the four there, you only explicitly (in a comment?) invoke one. 
>Are the others implicit because of the order?
>You use dfdl:tokenizer as a conversion later. Is that supposed to 
>be split as well?
>Is bytetochar used implicitly before the first split?
>Is chartostring used implicitly before stringtoint, which is 
>implicitly used to get the int element?
>Is EOS a returned value (and therefore of the type being returned) 
>or is it an exception?
>
>Logical: what happens if the rows are not in the logical model? 
>Physically there are 10 rows with 5 elements, but the logical model 
>is 50 ints in a single sequence. To support this, you'd need to have 
>both tokenization steps in one sequence annotation with two separate 
>split separators. Does the use of setLocal for the split separator work 
>in this case? (Is this how byteorder is now used?)
>Thinking about missing values: is it clear how a missing row versus 
>a missing element is now handled? I think so. The split conversion 
>using comma can define a default input to use if the stream it 
>receives is empty (from a \n\n pair), and the stringtoint conversion 
>can do likewise to cover a ,, pair.
>
>   Jim
>
>
>At 09:25 PM 2/28/2006, Westhead, Martin (Martin) wrote:
>
>Hi Folks,
>
>I have tried to work through the CSV example that Mike suggested a 
>couple of weeks ago. It has turned up some interesting issues which 
>I have tried to address. These are less about making the underlying 
>semantics work and more about providing a seamless default set up 
>that makes the easy things work just as you would like.
>
>I was pushed for time on this, so I apologize if this is unclear in 
>places, but I wanted to put it out before tomorrow's meeting.
>
>Thanks,
>
>Martin
>

James D. Myers
Associate Director, Cyberenvironments and Technologies, NCSA
1205 W. Clark St, MC-257
Urbana, IL 61801
217-244-1934
jimmyers at ncsa.uiuc.edu