[DFDL-WG] Possible issues for xs:string when dfdl:lengthUnits is 'bytes' and non-SBCS encodings

Steve Hanson smh at uk.ibm.com
Thu Feb 14 04:55:51 EST 2013


1) For xs:string, if dfdl:lengthKind is 'implicit' then xs:maxLength is 
used to extract N units from the data. If dfdl:lengthUnits is 'bytes' then 
N bytes are extracted. If validation is switched on, xs:maxLength is also 
used to validate that no more than N characters appear in the infoset. 
This seems problematic where the dfdl:encoding is non-SBCS, because N bytes 
and N characters are then no longer the same amount of data. 
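
To make the mismatch in 1) concrete, here is a rough Python sketch. It is 
not DFDL processor code: the helper names, the sample value and the choice 
of N = 4 are all made up for illustration. It only shows that a string can 
pass a character-based xs:maxLength check and still not fit in N bytes once 
the encoding is not single-byte.

```python
# Illustrative only: MAX_LENGTH and both helpers are hypothetical names,
# not part of any DFDL implementation.
MAX_LENGTH = 4  # stands in for xs:maxLength = N


def passes_max_length(value: str, n: int = MAX_LENGTH) -> bool:
    """Validation view: xs:maxLength counts characters in the infoset."""
    return len(value) <= n


def fits_in_n_bytes(value: str, encoding: str, n: int = MAX_LENGTH) -> bool:
    """Length view: with dfdl:lengthUnits 'bytes', only N bytes are available."""
    return len(value.encode(encoding)) <= n


# Four characters: 'n', e-acute, euro sign, 'a'.
value = "n\u00e9\u20aca"

for encoding in ("cp1252", "utf-8", "utf-16-be"):
    print(
        encoding,
        "valid per maxLength:", passes_max_length(value),
        "fits in N bytes:", fits_in_n_bytes(value, encoding),
    )
```

With the single-byte code page both checks agree. With UTF-8 or UTF-16 the 
value is valid against xs:maxLength but needs more than N bytes.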

2) For xs:string, if dfdl:lengthKind implies a variable length on output 
and dfdl:textPadKind is not 'none' then xs:minLength is used to ensure 
that at least N units are output. If dfdl:lengthUnits is 'bytes' then at 
least N bytes are written to the data. If validation is switched on, 
xs:minLength is also used to validate that at least N characters appear in 
the infoset. Again this seems problematic where the dfdl:encoding is 
non-SBCS, because N bytes can hold fewer than N characters. 
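
The same kind of sketch for 2), again in Python and again purely 
illustrative. The pad_to_min_bytes helper, the space pad character and 
N = 4 are assumptions, not anything taken from the spec. It shows a value 
that already occupies N bytes, so no padding is added on output, yet has 
fewer than N characters and so fails a character-based xs:minLength check.

```python
# Illustrative only: MIN_LENGTH and both helpers are hypothetical names,
# not part of any DFDL implementation.
MIN_LENGTH = 4  # stands in for xs:minLength = N


def pad_to_min_bytes(value: str, encoding: str,
                     n: int = MIN_LENGTH, pad: str = " ") -> bytes:
    """Output view: pad on the right until at least N bytes are written."""
    data = value.encode(encoding)
    pad_bytes = pad.encode(encoding)
    while len(data) < n:
        data += pad_bytes
    return data


def passes_min_length(value: str, n: int = MIN_LENGTH) -> bool:
    """Validation view: xs:minLength counts characters in the infoset."""
    return len(value) >= n


# Two characters (two e-acute), which is four bytes in UTF-8.
value = "\u00e9\u00e9"

written = pad_to_min_bytes(value, "utf-8")
print("bytes written:", len(written))
print("padding added:", written != value.encode("utf-8"))
print("valid per minLength:", passes_min_length(value))
```

So the byte-based minimum is met without any padding, while validation 
still sees a string that is too short.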

Should we disallow the combinations that actually cause a problem?

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848