[DFDL-WG] Possible issues for xs:string when dfdl:lengthUnits is 'bytes' and non-SBCS encodings
Steve Hanson
smh at uk.ibm.com
Thu Feb 14 04:55:51 EST 2013
1) For xs:string, if dfdl:lengthKind is 'implicit' then xs:maxLength is
used to extract N units from the data. If dfdl:lengthUnits is 'bytes' then
N bytes are extracted. If validation is switched on, xs:maxLength is also
used to validate that no more than N characters appear in the infoset.
This seems problematic where the dfdl:encoding is non-SBCS, because N
bytes and N characters are then no longer the same quantity.
2) For xs:string, if dfdl:lengthKind implies a variable length on output
and dfdl:textPadKind is not 'none' then xs:minLength is used to ensure
that at least N units are output. If dfdl:lengthUnits is 'bytes' then at
least N bytes are written to the data. If validation is switched on,
xs:minLength is also used to validate that at least N characters appear
in the infoset. Again this seems problematic where the dfdl:encoding is
non-SBCS, for the same reason.
Should we disallow the combinations that actually cause a problem?
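To make the mismatch in 1) and 2) concrete, here is a rough sketch in
plain Python. It is not a DFDL processor: take_bytes and pad_to_bytes are
hypothetical helpers that only mimic "extract N bytes" and "pad to at
least N bytes", and UTF-16BE is chosen simply as an example of a non-SBCS
encoding.

```python
# Illustration only: plain Python, not a DFDL implementation.
# take_bytes and pad_to_bytes are hypothetical helper names.

def take_bytes(data: bytes, n: int, encoding: str) -> str:
    """Mimic parsing with lengthUnits 'bytes': read n bytes, then decode."""
    return data[:n].decode(encoding)


def pad_to_bytes(value: str, n: int, encoding: str, pad: str = " ") -> bytes:
    """Mimic padding on output: right-pad until at least n bytes are written."""
    out = value.encode(encoding)
    pad_bytes = pad.encode(encoding)
    while len(out) < n:
        out += pad_bytes
    return out


N = 4  # stands in for the xs:maxLength / xs:minLength facet value

# Case 1: N bytes are extracted, but in UTF-16BE they hold only N/2 characters.
parsed = take_bytes("abcd".encode("utf-16-be"), N, "utf-16-be")
print(len(parsed))  # 2 characters, although the facet says up to 4

# Case 2: a 2-character string already fills N bytes, so no padding is added,
# yet the infoset value has fewer than N characters.
written = pad_to_bytes("ab", N, "utf-16-be")
print(len(written), len(written.decode("utf-16-be")))  # 4 bytes, 2 characters

# With an SBCS encoding the byte count and character count agree.
print(len(take_bytes(b"abcd", N, "ascii")))  # 4
print(pad_to_bytes("ab", N, "ascii"))        # b'ab  '
```

In the SBCS case the two uses of the facet coincide; in the UTF-16BE case
the same facet value means N bytes for length purposes and N characters
for validation.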
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848