Details
Bug
Status: Open
Major
Resolution: Unresolved
2.3
None
Description
Creating the changelog.xml file double-encodes UTF-8 text if the git comment information is already in UTF-8 format. For example, if the outputEncoding property is set to ISO-8859-1, the output is (shown as an od dump):
0004060 7375 7420 696f 696d 616d 6e61 6d20 c379
        u s t o i m i m a a n m y ├
0004100 73b6 6c20 7369 a4c3 6b79 6573 7373 a4c3
        Â s l i s ├ ñ y k s e s s ├ ñ
And when set to UTF-8 the output is:
0004060 6d69 6d69 6161 206e 796d 83c3 b6c2 2073
        i m i m a a n m y ├ â ┬ Â s
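With outputEncoding set to UTF-8, Scandinavian umlauts come out garbled. The byte sequence C3 B6 is the correct UTF-8 encoding of the letter "ö", and that is what the ISO-8859-1 run produces. The UTF-8 run instead produces C3 83 C2 B6 (the dump shows 16-bit words with the bytes swapped), which is the same letter encoded twice.
The ISO-8859-1 output would be good enough for the site documentation, but because the changelog.xml header then declares ISO-8859-1 encoding, the rest of the process fails to handle the umlauts.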
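The byte patterns in the two dumps can be reproduced outside the plugin. The following is a minimal sketch, not the plugin's code: the class and method names are made up for illustration, and it assumes the fault is UTF-8 bytes being read as ISO-8859-1 and then encoded as UTF-8 again.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {

    // Render bytes as space-separated upper-case hex, e.g. "C3 B6".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Simulate the suspected fault: UTF-8 bytes are decoded as ISO-8859-1
    // and the resulting text is encoded as UTF-8 a second time.
    static byte[] doubleEncode(String text) {
        byte[] once = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(once, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String umlaut = "\u00F6"; // the letter "ö"
        System.out.println(hex(umlaut.getBytes(StandardCharsets.UTF_8))); // C3 B6
        System.out.println(hex(doubleEncode(umlaut)));                     // C3 83 C2 B6
    }
}
```

The second line of output matches the bytes seen in the UTF-8 dump, which is consistent with a double encoding somewhere in the pipeline but does not say where it happens.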
I modified the writeChangelogXml() method of class ChangeLogReport by commenting out the writer change made for issue MCHANGELOG-86:
    PrintWriter pw = new PrintWriter(new BufferedOutputStream(new FileOutputStream(outputXML)));
    pw.write(changelogXml.toString());
    pw.flush();
    pw.close();
    // MCHANGELOG-86
    // Writer writer = WriterFactory.newWriter( new BufferedOutputStream( new FileOutputStream( outputXML ) ),
    //                                          getOutputEncoding() );
    // writer.write(changelogXml.toString());
    // writer.flush();
    // writer.close();
There may be a double encoding in the Writer, since a couple of lines above the change set is already created with the encoding information:
String changeset = changelogSet.toXML(getOutputEncoding());
However, this is just a wild guess, since I did not check the implementation of changelogSet.toXML() or writer.write(). The cause could also lie in the version control access, since MCHANGELOG-86 was an SVN issue and here we are dealing with Git.
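One way to narrow the guess down would be to inspect the string before it reaches the writer. The sketch below is only a diagnostic idea: MojibakeCheck and looksMisdecoded are hypothetical names, not part of the plugin. It reports whether a Java string already looks like UTF-8 bytes that were decoded as ISO-8859-1. If the changelog string passes this check before writing, the damage happened earlier (for example when reading the Git output); if it does not, the writer side is the more likely suspect.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MojibakeCheck {

    // Hypothetical helper: returns true when the text looks like UTF-8 bytes
    // that were decoded as ISO-8859-1 (for example "Ã¶" instead of "ö").
    static boolean looksMisdecoded(String text) {
        boolean hasNonAscii = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0xFF) {
                return false; // not representable in ISO-8859-1, so not this kind of damage
            }
            if (c > 0x7F) {
                hasNonAscii = true;
            }
        }
        if (!hasNonAscii) {
            return false; // pure ASCII is identical in both encodings
        }
        // Turn the characters back into the bytes they were decoded from.
        byte[] raw = text.getBytes(StandardCharsets.ISO_8859_1);
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(raw));
            return true; // the ISO-8859-1 bytes form valid UTF-8
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

This is a heuristic: a legitimate ISO-8859-1 string can occasionally happen to form valid UTF-8, so a positive result is a hint rather than proof.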