Details
- Bug
- Status: Closed
- Minor
- Resolution: Fixed
- 1.8.6
- None
Description
A character mapping regression appears to have crept in between 1.8.5 and 1.8.6 for this file.
In 1.8.5, ExtractText extracted the correct text with both the sequential and NonSeq parsers. In 1.8.6, java -jar pdfbox-app-1.8.6.jar ExtractText yields text starting with:
7>PFLK>I 9>NH ;BNRF@B =%;% .BM>NPJBKP LC PEB 3KPBNFLN 9>@FCF@ -L>OP ;@FBK@B >KA 5B>NKFKD -BKPBN :BOB>N@E 9NLGB@P ;QJJ>NT .B@BJ?BN (&&* "&++&,-+Æ$( #&+-&%+$-& !).&)-*+Æ&,
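One rough way to flag output like the sample above automatically is to look at letter case: the garbled text contains no lowercase letters at all, whereas ordinary mixed-case prose is mostly lowercase. The sketch below is an illustration only and is not part of PDFBox. The function name `looks_garbled`, the 0.3 threshold, and the clean comparison sentence are assumptions made for the example, and the heuristic would misfire on documents that are legitimately set in all capitals.

```python
# Hypothetical helper, not a PDFBox API: flags extracted text whose letters
# are almost never lowercase, as in the garbled sample from this report.

def looks_garbled(text: str, min_lowercase: float = 0.3) -> bool:
    """Return True if fewer than min_lowercase of the letters are lowercase.

    The 0.3 default is an assumed threshold chosen for illustration.
    Text with no letters at all is not judged and returns False.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    lowercase_share = sum(ch.islower() for ch in letters) / len(letters)
    return lowercase_share < min_lowercase


# Prefix of the 1.8.6 output quoted in the description.
GARBLED_SAMPLE = "7>PFLK>I 9>NH ;BNRF@B =%;% .BM>NPJBKP LC PEB 3KPBNFLN"

# Made-up sentence standing in for correctly extracted prose.
CLEAN_SAMPLE = "A sentence of ordinary English prose."

if __name__ == "__main__":
    print(looks_garbled(GARBLED_SAMPLE))
    print(looks_garbled(CLEAN_SAMPLE))
```

A check like this could be run over the ExtractText output of each version for a set of test files, to narrow down which files changed behavior before comparing the text by hand.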
Issue Links
- is broken by
  - PDFBOX-2058 The text of pdfs using Type1C can't be extracted correct (Closed)
- is related to
  - PDFBOX-2377 Apparent regression in character mapping in a few files from govdocs1 (Closed)