Daffodil / DAFFODIL-2128

XML preamble encoding ignored when CLI unparsing with "xml" infoset type

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: CLI
    • Labels: None

    Description

      When using the CLI to unparse XML using the "xml" infoset type, we have the following code:

      case "xml" => {
        val rdr = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(anyRef.asInstanceOf[Array[Byte]])))
        new XMLTextInfosetInputter(rdr)
      }
      

      To create the XMLTextInfosetInputter, we construct an InputStreamReader without specifying an encoding, so the XML is decoded using the JVM default charset (the Java "file.encoding" system property). On machines where that property isn't UTF-8 (e.g. Windows), this can result in UTF-8 data in the XML not being decoded correctly, which leads to incorrect unparsed data.
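
      The effect can be reproduced outside of Daffodil. The following is a minimal sketch, not Daffodil code: the object and method names are hypothetical, and ISO-8859-1 is used only as a stand-in for a non-UTF-8 default charset (the actual default on a given Windows machine may differ). It decodes the same UTF-8 bytes through an InputStreamReader twice, once with the right charset and once with the wrong one.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical demo object; not part of Daffodil.
object DecodeDemo {
  // Decode the first line of `bytes` with an explicitly chosen charset,
  // mirroring the Reader chain used in the CLI code above.
  def decode(bytes: Array[Byte], cs: Charset): String = {
    val rdr = new BufferedReader(
      new InputStreamReader(new ByteArrayInputStream(bytes), cs))
    try rdr.readLine() finally rdr.close()
  }

  // "\u00e9" is e-acute, which UTF-8 encodes as the two bytes C3 A9.
  val bytes: Array[Byte] =
    "<value>\u00e9</value>".getBytes(StandardCharsets.UTF_8)

  // Correct charset: the original character is recovered.
  val good: String = decode(bytes, StandardCharsets.UTF_8)

  // Assumed stand-in for a non-UTF-8 platform default: each of the two
  // bytes is decoded as its own character (U+00C3 followed by U+00A9).
  val bad: String = decode(bytes, StandardCharsets.ISO_8859_1)
}
```

      When no charset is passed to InputStreamReader, the JVM default is used, so which of the two results you get depends on the machine.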

      I believe Woodstox can inspect the XML and determine the encoding from the preamble, so we should take advantage of that: change the XMLTextInfosetInputter to accept an InputStream in its constructor instead of a Reader, and deprecate the Reader constructor.
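
      As a rough illustration of the idea, the sketch below hands raw bytes to a StAX parser as an InputStream and lets the parser pick the encoding from the XML declaration. It uses only the standard javax.xml.stream API, so it runs with whichever StAX implementation is on the classpath (Woodstox being one such implementation); it does not show the XMLTextInfosetInputter constructor itself, and the object and method names are hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

// Hypothetical demo object; not part of Daffodil.
object PreambleDemo {
  // Parse `bytes` from an InputStream (no Reader, no charset argument)
  // and return the concatenated character data of the document.
  def textOf(bytes: Array[Byte]): String = {
    val factory = XMLInputFactory.newInstance()
    val xsr = factory.createXMLStreamReader(new ByteArrayInputStream(bytes))
    try {
      val sb = new StringBuilder
      while (xsr.hasNext) {
        // Text may arrive in several CHARACTERS events, so append each one.
        if (xsr.next() == XMLStreamConstants.CHARACTERS) sb.append(xsr.getText)
      }
      sb.toString
    } finally xsr.close()
  }

  // The same document serialized in two encodings, each declared in
  // its own XML declaration.
  val latin1: Array[Byte] =
    "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><value>\u00e9</value>"
      .getBytes(StandardCharsets.ISO_8859_1)
  val utf8: Array[Byte] =
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?><value>\u00e9</value>"
      .getBytes(StandardCharsets.UTF_8)
}
```

      Both byte arrays yield the same text because the parser reads the declared encoding before decoding the content, independent of the "file.encoding" property.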

          People

            Assignee: Steve Lawrence (slawrence)
            Reporter: Steve Lawrence (slawrence)