[MASSEMBLY-371] Converting line endings corrupts ISO-8859-1 files when platform encoding is UTF-8 - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2-beta-2, 2.2
Fix Version/s: 2.4
Component/s: None
Labels: None
Environment: Linux with platform encoding set to UTF-8

Description

Converting line endings for a text file encoded in ISO-8859-1 replaces every non-ASCII character with the three characters ï¿½.
The file to be converted is read as text in the platform encoding (this seems to happen in method readFile in class FileFormatter). When the platform encoding is UTF-8, any non-ASCII character from ISO-8859-1 is decoded to the UTF-8 character "�" (U+FFFD, the placeholder for an unknown or broken character). Written back out as UTF-8, that character becomes the bytes EF BF BD, which display as ï¿½ when the file is viewed as ISO-8859-1.
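
The mechanism described above can be reproduced in a few lines of plain Java. This is a minimal sketch, not the plugin's code: the class name EncodingCorruptionDemo and the method rewriteAsText are hypothetical, and the method only imitates a converter that decodes and re-encodes a file with the platform encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCorruptionDemo {

    /**
     * Hypothetical stand-in for a text-based converter: decodes the file
     * content with the platform encoding and encodes it again unchanged.
     */
    static byte[] rewriteAsText(byte[] fileBytes, Charset platformEncoding) {
        // Malformed input is silently replaced with U+FFFD during decoding.
        String text = new String(fileBytes, platformEncoding);
        return text.getBytes(platformEncoding);
    }

    public static void main(String[] args) {
        // "é" is the single byte 0xE9 in ISO-8859-1.
        byte[] original = "caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);

        // 0xE9 followed by a line feed is not valid UTF-8, so it decodes to U+FFFD.
        byte[] rewritten = rewriteAsText(original, StandardCharsets.UTF_8);

        // Viewed as ISO-8859-1 again, the string is "cafï¿½" plus the line feed.
        System.out.println(new String(rewritten, StandardCharsets.ISO_8859_1));
    }
}
```

Passing StandardCharsets.ISO_8859_1 as the second argument instead returns the original bytes unchanged, which is consistent with the problem only showing up when the platform encoding differs from the file's encoding.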

I've attached a small sample project that shows this problem on Linux with platform encoding set to UTF-8.

I see two possible fixes for this: one is to read the file as bytes and do a search/replace for line endings, and the other is to be able to specify the encoding for a fileset or file.
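
Both suggested fixes can be sketched as follows. This is an illustration under stated assumptions, not the change that was actually made in the plugin: the class LineEndingConverter and both method names are hypothetical. The byte-level variant assumes an ASCII-compatible encoding such as ISO-8859-1 or UTF-8, where the bytes 0x0D and 0x0A never occur inside another character; it would not be safe for UTF-16.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

public class LineEndingConverter {

    /**
     * Fix 1 (hypothetical helper): work on raw bytes, so non-ASCII bytes
     * pass through untouched. CRLF, lone CR and lone LF each count as one
     * line ending.
     */
    static byte[] convertBytes(byte[] input, byte[] lineEnding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        for (int i = 0; i < input.length; i++) {
            byte b = input[i];
            if (b == '\r') {
                // Treat CRLF as a single line ending by skipping the LF.
                if (i + 1 < input.length && input[i + 1] == '\n') {
                    i++;
                }
                out.write(lineEnding, 0, lineEnding.length);
            } else if (b == '\n') {
                out.write(lineEnding, 0, lineEnding.length);
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    /**
     * Fix 2 (hypothetical helper): decode and encode with an explicitly
     * specified encoding instead of the platform default.
     */
    static byte[] convertText(byte[] input, Charset encoding, String lineEnding) {
        String text = new String(input, encoding);
        // Literal replacements, so the new line ending needs no regex escaping.
        String unified = text.replace("\r\n", "\n").replace('\r', '\n');
        return unified.replace("\n", lineEnding).getBytes(encoding);
    }
}
```

The byte-level variant needs no configuration but relies on the encoding assumption above; the explicit-encoding variant works for any charset as long as the caller supplies the right one.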

Attachments

assembly-encoding.zip
28/Nov/08 03:24
2 kB
Håvard Wigtil

People

Assignee: Dennis Lundberg

Reporter: Håvard Wigtil

Votes: 1

Watchers: 1

Dates

Created: 28/Nov/08 03:24

Updated: 08/Apr/15 08:02

Resolved: 28/Oct/12 06:54