[DFDL-WG] how to trim inside of escape block?
Mike Beckerle
mbeckerle.dfdl at gmail.com
Wed Nov 22 12:40:13 EST 2017
Another related problem:
a, b, notList, c, d
a, b, "list1, list2, list3",c,d
Here the 3rd field is a comma-separated list. It is quoted only if there is
more than one list item.
I think to parse this I have to treat the quotation marks as
initiator/terminator, and set dfdl:separator="", but since the quotes are
optional for the single-list-item case, I'm going to need a choice.
I think the best I can do is
<ignore:ListOf1__XMLSchemaMakesMeHaveThisForUPA/>
<List>notList</List>
and
<ignore:ListOfN__XMLSchemaMakesMeHaveThisForUPA/>
<List>list1</List><List>list2</List><List>list3</List>
as the XML representations.
Are there any better/cleaner solutions?
I did think of the following approach (note: I've omitted xs:annotation and
xs:appinfo for brevity), but it isn't exactly "clean".
This is what I call "modeling syntax as data"....
<dfdl:defineVariable name="foundOpenQuote" type="xs:boolean"/>
<xs:group name="optionalOpenQuote">
  <xs:choice>
    <xs:sequence dfdl:initiator='"'>
      <dfdl:setVariable ref="foundOpenQuote" value="{ fn:true() }"/>
    </xs:sequence>
    <xs:sequence dfdl:initiator=""/>
  </xs:choice>
</xs:group>
<xs:group name="matchingCloseQuote">
  <xs:choice>
    <xs:sequence dfdl:terminator='"'>
      <dfdl:discriminator>{ $foundOpenQuote eq fn:true() }</dfdl:discriminator>
    </xs:sequence>
    <xs:sequence/>
  </xs:choice>
</xs:group>
// The main sequence for the data would then have this as the list element:
<xs:sequence>
  <dfdl:newVariableInstance ref="foundOpenQuote" defaultValue="false"/>
  <xs:sequence dfdl:hiddenGroupRef="optionalOpenQuote"/>
  <xs:sequence dfdl:separator=",">
    <xs:element name="List" type="xs:string" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:sequence dfdl:hiddenGroupRef="matchingCloseQuote"/>
</xs:sequence>
I'd try this out, except that we haven't got dfdl:newVariableInstance yet.
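For reference, here is roughly what the optionalOpenQuote group looks like
with the omitted xs:annotation and xs:appinfo wrappers put back. This is only
a sketch: it assumes the usual xs and dfdl namespace prefix bindings and the
standard DFDL appinfo source URI, and it has not been run against any DFDL
processor.

```xml
<xs:group name="optionalOpenQuote">
  <xs:choice>
    <!-- Branch 1: an opening quote is present; record that fact. -->
    <xs:sequence dfdl:initiator='"'>
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:setVariable ref="foundOpenQuote" value="{ fn:true() }"/>
        </xs:appinfo>
      </xs:annotation>
    </xs:sequence>
    <!-- Branch 2: no opening quote; the variable keeps its default. -->
    <xs:sequence dfdl:initiator=""/>
  </xs:choice>
</xs:group>
```

The matchingCloseQuote group and the dfdl:newVariableInstance annotation
would be wrapped the same way.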
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>
On Wed, Nov 22, 2017 at 4:11 AM, Steve Hanson <smh at uk.ibm.com> wrote:
> I don't think there is a way to achieve what you want. As you say,
> trimming pad chars takes precedence over applying escape scheme.
>
> I wondered if you could define the escapeBlockStart and End as "%WSP*;
> and %WSP*;" respectively but the white space entities are not allowed as
> escape character or in escape block start/end.
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, IBM DFDL
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> smh at uk.ibm.com
> tel:+44-1962-815848
> mob:+44-7717-378890
>
>
>
> From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date: 22/11/2017 01:28
> Subject: [DFDL-WG] how to trim inside of escape block?
> Sent by: "dfdl-wg" <dfdl-wg-bounces at ogf.org>
> ------------------------------
>
>
>
>
> I have a CSV file
>
> Some lines look like this
>
> a,b," started with spaces, appearing right after the escape block start ",c,d,e
>
> I reviewed the spec, and I see that pad characters appear outside of the
> quotation marks (escape block start/end).
>
> What I'm trying to do is remove the whitespace after the escape block
> start, and before the escape block end. This is just spurious whitespace,
> which appears because some of these CSV files were edited by people.
>
> In my data the quoting characters are not always present. They are only
> there if a comma appears in the data string.
>
> Is there a technique for getting rid of the leading/trailing whitespace
> inside the escape block start/end that I have forgotten?
>
> ...mikeb
>