Description
The dfdl:textNumberPattern property is an ICU number format pattern.
It has a positive part and an optional negative part separated by a ";".
ICU documents that the negative part is used only to define the negative sign indication. So for example "#;-#" means a hyphen is used as a minus sign. The pattern "#;(#)" means negative values are surrounded by parentheses. Other pattern characters can be present, but are ignored except for indicating where these sign indicator characters are relative to the digits of the number.
So for example "00000;- #" means negative 12 would be formatted as "- 00012", because the number of digits is taken from the positive pattern.
Only the "- " (hyphen and space) is taken from the negative pattern. This pattern means the same:
"00000;- ######00000000"
The only significance of the "######00000000" string in the negative pattern is to indicate that the hyphen and space appear before the digits. In fact, any number specifier like "##,###,##0.00###" in a negative pattern is ignored and really should just be written as a single "0" or "#" character. Other things like the ICU pad character specifier, if they appear in the negative pattern, are ignored as well, even though they could be useful.
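To illustrate which parts of the negative pattern actually matter, here is a minimal Python sketch that extracts only the prefix and suffix. It is not Daffodil or ICU code. The names NEGATIVE_PATTERN and negative_affixes are hypothetical, and the regex is a simplified approximation of ICU subpattern syntax: it assumes no quoted characters and a pad specifier of "*" followed by one character.

```python
import re

# Simplified approximation of one ICU subpattern (an assumption, not ICU's grammar):
# optional pad, prefix, optional pad, number core, optional pad, suffix, optional pad.
# Quoted characters ('...') are not handled.
NEGATIVE_PATTERN = re.compile(
    r"(?P<pad1>\*.)?"
    r"(?P<prefix>[^#0-9@*]*)"
    r"(?P<pad2>\*.)?"
    r"(?P<core>[#0-9@][#0-9@,.]*(?:E\+?0+)?)"
    r"(?P<pad3>\*.)?"
    r"(?P<suffix>[^*]*)"
    r"(?P<pad4>\*.)?"
)


def negative_affixes(pattern):
    """Return (prefix, suffix) of the negative part, or None if there is no negative part."""
    _positive, sep, negative = pattern.partition(";")  # naive split: ignores quoted ';'
    if not sep:
        return None
    m = NEGATIVE_PATTERN.fullmatch(negative)
    if m is None:
        raise ValueError("unrecognized negative pattern: %r" % negative)
    return m.group("prefix"), m.group("suffix")


print(negative_affixes("00000;- #"))               # ('- ', '')
print(negative_affixes("00000;- ######00000000"))  # ('- ', '')
print(negative_affixes("#;(#)"))                   # ('(', ')')
```

Both of the first two patterns yield the same prefix and suffix, which is all the negative part contributes.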
The fact that these are allowed, yet ignored, is unintuitive, misleading, and error-prone, because users are not going to realize that almost everything about the negative pattern gets ignored.
Daffodil should warn if the negative pattern contains anything other than a prefix, a single "#" or "0" character, and a suffix.
The warning message should say the negative pattern is only used to specify the prefix and suffix used to indicate negative values. Everything else is ignored. Ideally we should parse the negative pattern syntax and point out all the ignored parts.
This warning should be suppressible via the usual WarnID mechanism.
We should consider having a tunable or property which if set escalates this warning to a SchemaDefinitionError.
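A rough Python sketch of the proposed check follows, purely to make the intended behavior concrete. Everything here is hypothetical: the names ignored_parts and check_negative_pattern, the escalate flag (standing in for the tunable or property), and ValueError (standing in for a SchemaDefinitionError). The regex is a simplified approximation of ICU subpattern syntax that does not handle quoted characters.

```python
import re

# Simplified approximation of one ICU subpattern (an assumption, not ICU's grammar).
NEGATIVE_PATTERN = re.compile(
    r"(?P<pad1>\*.)?"
    r"(?P<prefix>[^#0-9@*]*)"
    r"(?P<pad2>\*.)?"
    r"(?P<core>[#0-9@][#0-9@,.]*(?:E\+?0+)?)"
    r"(?P<pad3>\*.)?"
    r"(?P<suffix>[^*]*)"
    r"(?P<pad4>\*.)?"
)


def ignored_parts(negative):
    """List the pieces of the negative pattern that have no effect."""
    m = NEGATIVE_PATTERN.fullmatch(negative)
    if m is None:
        raise ValueError("unrecognized negative pattern: %r" % negative)
    ignored = []
    for name in ("pad1", "pad2", "core", "pad3", "pad4"):
        text = m.group(name)
        if text is None:
            continue  # this optional part is absent
        if name == "core" and text in ("#", "0"):
            continue  # a single placeholder is the recommended form
        ignored.append(text)
    return ignored


def check_negative_pattern(negative, escalate=False):
    """Return a warning message, or None if nothing is ignored.

    With escalate=True the message is raised instead (stand-in for an SDE).
    """
    ignored = ignored_parts(negative)
    if not ignored:
        return None
    message = (
        "The negative pattern is only used to specify the prefix and suffix "
        "used to indicate negative values. Ignored: "
        + ", ".join(repr(part) for part in ignored)
    )
    if escalate:
        raise ValueError(message)
    return message


print(ignored_parts("- #"))               # []
print(ignored_parts("- *x#,###,##0.00"))  # ['*x', '#,###,##0.00']
```

Returning the list of ignored pieces lets the message point out each one, as suggested above, rather than only stating that something was ignored.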
Honestly I think the only meaningful negative patterns are probably:
- "-#"
- "(#)"
- "#-"
With minor variations which insert spaces such as:
- "- #" (a space after the sign)
- "( # )" (spaces between digits and parens)
- "# -" (a space before the trailing sign)
Here's an example of a complex dfdl:textNumberPattern which makes the point that the negative pattern is just a kind of trivial tail end.
dfdl:textNumberPattern="+ *x#, ###,##0.00;- #"
The only contribution of the negative part of that pattern is that "- " (hyphen and space) is used as the prefix for negative values. The rest all comes from the positive pattern.
The value negative 1234.5 would unparse as "- xxxx1,234.50"
If instead the user writes:
dfdl:textNumberPattern="+ *x#, ###,##0.00;- *x#,###,##0.00"
The warning should be issued and state that the negative part of this pattern is mostly ignored. Only the "- " is significant, and the negative part could be just "- #", so the whole pattern could be shortened to "+ *x#, ###,##0.00;- #".
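The suggested shortening could be computed along these lines. This is a Python sketch only, not Daffodil code: the function name shorten_pattern is hypothetical, and the regex is a simplified approximation of ICU subpattern syntax that assumes no quoted characters and a pad specifier of "*" plus one character.

```python
import re

# Simplified approximation of one ICU subpattern (an assumption, not ICU's grammar).
NEGATIVE_PATTERN = re.compile(
    r"(?P<pad1>\*.)?"
    r"(?P<prefix>[^#0-9@*]*)"
    r"(?P<pad2>\*.)?"
    r"(?P<core>[#0-9@][#0-9@,.]*(?:E\+?0+)?)"
    r"(?P<pad3>\*.)?"
    r"(?P<suffix>[^*]*)"
    r"(?P<pad4>\*.)?"
)


def shorten_pattern(pattern):
    """Rewrite the negative part as prefix + '#' + suffix, dropping ignored pieces."""
    positive, sep, negative = pattern.partition(";")  # naive split: ignores quoted ';'
    if not sep:
        return pattern  # no negative part, nothing to shorten
    m = NEGATIVE_PATTERN.fullmatch(negative)
    if m is None:
        raise ValueError("unrecognized negative pattern: %r" % negative)
    return positive + ";" + m.group("prefix") + "#" + m.group("suffix")


print(shorten_pattern("+ *x#, ###,##0.00;- *x#,###,##0.00"))
# + *x#, ###,##0.00;- #
```

The positive part is passed through untouched, so only the negative part changes.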
This requires a simple parse of the negative pattern to identify the significant parts, but this is quite easy. (Look up Scala regex pattern matching.)