Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Fixed
- 2.0.0, 1.27
- None
- None
- Windows 10, Liberica OpenJDK FULL x64 1.8.0_302
Description
Some Korean characters are extracted as squares. The encodings of the plain-text files are detected correctly. This may be related to the content handler (just a guess). I'll attach the files that trigger the problem.
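A possible way to narrow this down, offered only as a sketch: check whether the "squares" in the extracted text are U+FFFD replacement characters (which would point at a decoding step) or valid Hangul code points (which would point at the font or console used to display the output). The class `GlyphCheck` and its helper methods are hypothetical names for illustration and are not part of Tika; the sample string stands in for text taken from the extraction output.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper, not part of Tika.
class GlyphCheck {

    // Counts U+FFFD characters, which Java decoders emit for bytes they cannot map.
    static long countReplacementChars(String text) {
        return text.codePoints().filter(cp -> cp == 0xFFFD).count();
    }

    // Counts code points in the Hangul Syllables block (U+AC00 to U+D7A3).
    static long countHangulSyllables(String text) {
        return text.codePoints().filter(cp -> cp >= 0xAC00 && cp <= 0xD7A3).count();
    }

    public static void main(String[] args) {
        // Sample input (assumption): two Hangul syllables encoded as UTF-8.
        byte[] utf8 = "\uD55C\uAE00".getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset preserves the Hangul code points.
        String good = new String(utf8, StandardCharsets.UTF_8);
        // Decoding with a mismatched charset yields replacement characters instead.
        String bad = new String(utf8, StandardCharsets.US_ASCII);

        System.out.println("good: hangul=" + countHangulSyllables(good)
                + " replaced=" + countReplacementChars(good));
        System.out.println("bad:  hangul=" + countHangulSyllables(bad)
                + " replaced=" + countReplacementChars(bad));
    }
}
```

If the extracted string contains Hangul code points and no U+FFFD, the text itself is intact and the squares most likely come from how it is rendered; if it contains U+FFFD, the characters were lost before or during extraction.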
Attachments
Issue Links
- is related to TIKA-324 Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X) (Resolved)