Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Cannot Reproduce
Description
I was told to report a bug here. There are problems with extracting text from PDF files in Dutch. The bug was reported in issue TIKA-1095 (https://issues.apache.org/jira/browse/TIKA-1095). The problem can be reproduced with the latest Tika version.
The extracted text shows only gibberish (or, in other cases, question marks and incorrect characters).
It was suggested it could be a font problem. Could this be looked into?
Attachments
Issue Links
- relates to: TIKA-1095 Only gibberish extracted from this PDF (Closed)