Details
Bug
Status: Open
Minor
Resolution: Unresolved
3.2.3
None
None
Linux
Description
In a library we wrote that uses Xerces-C++ to validate XML files against a given XSD, we found that XercesDOMParser::parse() does not record any error when the XML declaration at the beginning of a document contains "junk" characters, such as control characters (^K) or null bytes. The null character in particular is not a legal character in any XML document. For example, the following XML file (attached as basic_bad_bytes.xml) parses without error, although it should be rejected:
<?xml version="1.0" encoding^@^@^@^@^@="UTF-8" ?>
<root_elem>
<child_elem some_attr="abc" />
<child_elem some_attr="def" />
</root_elem>
By contrast, the following XML (attached as basic_bad_bytes2.xml), where the null bytes are in the root element's start tag instead, correctly reports an error:
<?xml version="1.0" encoding="UTF-8" ?>
<root_elem^@^@^@^@^@>
<child_elem some_attr="abc" />
<child_elem some_attr="def" />
</root_elem>
This is similar to XERCESC-1701, where "junk" characters after the root element at the end of the document were found to be accepted during parsing.
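Until the parser rejects such input itself, a caller could pre-scan the raw bytes before handing them to XercesDOMParser::parse(). The following is only a minimal sketch of that idea, not part of Xerces-C++: findIllegalXmlByte is a hypothetical helper name, and it assumes the document is in UTF-8 or another ASCII-compatible encoding. It relies on the XML 1.0 rule that tab, line feed and carriage return are the only legal characters below U+0020.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xerces-C++ API): returns the offset of the first
// byte that is never legal in an XML 1.0 document, or std::string::npos if
// no such byte is found.
// Assumption: the data is UTF-8 or another ASCII-compatible encoding. The
// check would misfire on UTF-16/UTF-32 input, where zero bytes are normal.
std::size_t findIllegalXmlByte(const std::string& data)
{
    for (std::size_t i = 0; i < data.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        // Below U+0020, XML 1.0 permits only tab (0x09), LF (0x0A), CR (0x0D).
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D) {
            return i;
        }
    }
    return std::string::npos;
}
```

A caller might read the file into a std::string in binary mode and treat any result other than std::string::npos as a fatal error before parsing. This would flag both attached samples, since each contains null bytes.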