Xerces-C++ / XERCESC-2158

XMLUTF8Transcoder: One multibyte UTF8 character is swallowed from the srcData when the resulting surrogate pair does not fit in toFill at the end


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.4, 3.2.0, 3.2.1, 3.2.2
    • Fix Version/s: 3.2.3
    • Component/s: Utilities
    • Labels: None
    • Environment: OS independent (Linux RedHat 7.5 / Windows 10); compiler independent
    • Flags: Patch

    Description

      Bug found in Xerces-C++ version 3.1.4. Code review indicates that newer versions are also affected.

       

      How to reproduce: run SAX2Print on the attached UTF8.xml file ("SAX2Print UTF8.xml").
      One Chinese character is missing from the name attribute of the second-to-last Instance element.

      Fix: The fix for this bug is included in the attached xerces.patch file.
      XMLUTF8Transcoder.cpp already contained a check for this case, but it assumed that the
      number of bytes read would be corrected at the end of the loop, and that assumption was wrong.
      The bytes-read count (bytesEaten) is computed from srcPtr, which has already been advanced past the multibyte sequence when the check is made.
      Therefore srcPtr must be moved back to the start of the sequence when the surrogate pair does not fit into the toFill buffer, so that the bytes are not reported as consumed and the character is decoded on the next call.
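      The mechanism described above can be illustrated with a minimal standalone C++ sketch. It is not the actual XMLUTF8Transcoder code (which is not reproduced in this report and differs in detail): the function transcodeUtf8, the maxChars parameter and the rewindOnOverflow switch are hypothetical, and only srcData, srcPtr, toFill and bytesEaten mirror names used above. The sketch assumes well-formed UTF-8 input and shows how computing bytesEaten from an already-advanced srcPtr drops a four-byte character when only one output slot is left.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified model of a UTF-8 -> UTF-16 transcoding loop.
// Assumes well-formed UTF-8; a real transcoder must also validate bytes.
// rewindOnOverflow == false models the bug, true models the fix.
std::size_t transcodeUtf8(const std::uint8_t* srcData, std::size_t srcCount,
                          char16_t* toFill, std::size_t maxChars,
                          std::size_t& bytesEaten, bool rewindOnOverflow)
{
    assert(toFill != nullptr || maxChars == 0);
    const std::uint8_t* srcPtr = srcData;
    const std::uint8_t* const srcEnd = srcData + srcCount;
    char16_t* outPtr = toFill;
    char16_t* const outEnd = toFill + maxChars;

    while (srcPtr < srcEnd && outPtr < outEnd) {
        const std::uint8_t lead = *srcPtr;
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else                            { len = 4; cp = lead & 0x07; }

        // Sequence split across input chunks: stop; the caller resubmits it.
        if (static_cast<std::size_t>(srcEnd - srcPtr) < len)
            break;

        const std::uint8_t* const seqStart = srcPtr;
        for (std::size_t i = 1; i < len; ++i)
            cp = (cp << 6) | (srcPtr[i] & 0x3F);
        srcPtr += len;  // srcPtr is advanced BEFORE the output-space check

        if (cp <= 0xFFFF) {
            *outPtr++ = static_cast<char16_t>(cp);
            continue;
        }

        // Supplementary character: needs two UTF-16 code units.
        if (outEnd - outPtr < 2) {
            // Fix: move srcPtr back so bytesEaten excludes this sequence.
            // Bug: srcPtr stays advanced, so the sequence counts as consumed
            // although nothing was written for it, and it is lost.
            if (rewindOnOverflow)
                srcPtr = seqStart;
            break;
        }
        cp -= 0x10000;
        *outPtr++ = static_cast<char16_t>(0xD800 + (cp >> 10));
        *outPtr++ = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }

    bytesEaten = static_cast<std::size_t>(srcPtr - srcData);
    return static_cast<std::size_t>(outPtr - toFill);
}
```

      For example, with the input "A" followed by the bytes F0 A0 AE B7 (U+20BB7, a supplementary-plane CJK character chosen only for illustration) and room for two UTF-16 code units, both modes write one code unit. The buggy mode reports bytesEaten = 5, so the caller never passes the four-byte sequence again; the fixed mode reports bytesEaten = 1, and the character is decoded on the next call.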

       

      Contributor information:

      Author Name of the code being contributed: Johannes Willnecker

      Employer: Siemens AG

      I have the right to grant the copyright licenses for the contribution.

      My employer has rights to the code that I have written. My employer gave me permission to contribute this code on its behalf.

      I am not aware of any third-party license or other restrictions.

      Attachments

        1. xerces.patch
          0.9 kB
          Johannes Willnecker
        2. UTF8.xml
          23 kB
          Johannes Willnecker


          People

            Scott Cantor (scantor)
            Johannes Willnecker (willnecker)
            Votes: 0
            Watchers: 2
