Xerces-C++ / XERCESC-2240

Junk characters (including null) allowed in XML declaration


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.2.3
    • Fix Version/s: None
    • Component/s: Non-Validating Parser
    • Labels: None
    • Environment: Linux

    Description

      In a library we've written that uses Xerces-C++ to validate XML files against a given XSD, we have found that XercesDOMParser::parse() records no errors when the XML declaration at the beginning of a document contains "junk" characters. These include control characters (such as ^K, vertical tab) and null bytes (^@). The null character in particular is never allowed anywhere in an XML document. For example, the following XML file (attached as basic_bad_bytes.xml) parses without error, but it should not:

      <?xml version="1.0" encoding^@^@^@^@^@="UTF-8" ?>
      <root_elem>
        <child_elem some_attr="abc" />
        <child_elem some_attr="def" />
      </root_elem>
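      The null bytes are invalid because of the XML 1.0 Char production, [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. That production admits no other character below #x20. The sketch below illustrates that rule under stated assumptions; it is not Xerces-C++ code. The function name firstForbiddenControlByte is hypothetical, and the input is assumed to be UTF-8. A plain byte scan is enough to flag the null bytes in basic_bad_bytes.xml:

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of Xerces-C++). Assumes UTF-8 (or ASCII) input.
// Returns the offset of the first byte that is a C0 control character excluded
// by the XML 1.0 Char production. Below #x20 only tab (#x9), LF (#xA) and
// CR (#xD) are legal, so NUL (^@) and VT (^K) are always errors.
// Returns std::nullopt if no such byte is present.
std::optional<std::size_t> firstForbiddenControlByte(std::string_view utf8) {
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(utf8[i]);
        if (b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D) {
            return i;  // offset of the offending byte
        }
    }
    return std::nullopt;
}
```

      Every byte of a multi-byte UTF-8 sequence is at least 0x80, so a byte below 0x20 can only be a genuine C0 control character. The scan therefore needs no decoding. UTF-16 input would have to be decoded first.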

      By contrast, parsing the following XML (attached as basic_bad_bytes2.xml), where the null bytes appear in the root element's start tag instead, correctly reports an error:

      <?xml version="1.0" encoding="UTF-8" ?>
      <root_elem^@^@^@^@^@>
        <child_elem some_attr="abc" />
        <child_elem some_attr="def" />
      </root_elem>
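      The declaration in basic_bad_bytes.xml is malformed for a second reason as well. XML 1.0 defines [80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'") with [25] Eq ::= S? '=' S?, so only whitespace may separate encoding from =. The sketch below applies those productions. It uses a hypothetical parseEncodingDecl helper written for illustration, not taken from Xerces-C++. It rejects the declaration of basic_bad_bytes.xml and accepts the one in basic_bad_bytes2.xml, whose fault lies in the start tag instead:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// XML 1.0 [3] S: space, tab, CR, LF.
constexpr bool isXmlSpace(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

constexpr bool isAsciiAlpha(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// Hypothetical helper (not part of Xerces-C++) following XML 1.0:
//   [80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'")
//   [25] Eq           ::= S? '=' S?
//   [81] EncName      ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
// Returns the encoding name, or std::nullopt if the pseudo-attribute is
// missing or malformed.
std::optional<std::string> parseEncodingDecl(std::string_view decl) {
    const std::string_view keyword = "encoding";
    const std::size_t start = decl.find(keyword);
    // 'encoding' must be present and preceded by whitespace (the leading S).
    if (start == std::string_view::npos || start == 0 || !isXmlSpace(decl[start - 1])) {
        return std::nullopt;
    }
    std::size_t i = start + keyword.size();
    // Eq: optional whitespace, '=', optional whitespace. Any other byte here,
    // such as the NULs in basic_bad_bytes.xml, makes the declaration malformed.
    while (i < decl.size() && isXmlSpace(decl[i])) ++i;
    if (i >= decl.size() || decl[i] != '=') return std::nullopt;
    ++i;
    while (i < decl.size() && isXmlSpace(decl[i])) ++i;
    // Opening quote: either " or ', and the closing quote must match it.
    if (i >= decl.size() || (decl[i] != '"' && decl[i] != '\'')) return std::nullopt;
    const char quote = decl[i++];
    const std::size_t nameStart = i;
    // EncName: a letter, then letters, digits, '.', '_' or '-'.
    if (i >= decl.size() || !isAsciiAlpha(decl[i])) return std::nullopt;
    ++i;
    while (i < decl.size() &&
           (isAsciiAlpha(decl[i]) || (decl[i] >= '0' && decl[i] <= '9') ||
            decl[i] == '.' || decl[i] == '_' || decl[i] == '-')) {
        ++i;
    }
    if (i >= decl.size() || decl[i] != quote) return std::nullopt;
    return std::string(decl.substr(nameStart, i - nameStart));
}
```

      For simplicity the sketch inspects only the first occurrence of encoding and treats a missing EncodingDecl the same as a malformed one. A real parser must distinguish the two, since EncodingDecl is optional.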

      This is similar to XERCESC-1701, where "junk" characters after the end of the root element were found to be accepted during parsing.

      Attachments

        1. basic_bad_bytes.xml (0.1 kB, Benjamin Fritz)
        2. basic_bad_bytes2.xml (0.1 kB, Benjamin Fritz)


          People

            Assignee: Unassigned
            Reporter: Benjamin Fritz
            Votes: 0
            Watchers: 2
