Details
Bug
Status: Closed
Major
Resolution: Fixed
2.0.15
None
Description
Hello, when I try to extract text from the attached PDF, I get empty output and many warnings. The font is attached as well.
Acrobat Reader opens the file successfully; I can select and copy the text, and save it as text.
My code:
private static void parseOne(String path) throws IOException {
    String pdfFileInText;
    PDFTextStripper tStripper;
    File file = new File(path);
    tStripper = new PDFTextStripper();
    MemoryUsageSetting memUsageSetting = MemoryUsageSetting.setupMixed(0, 500000000)
            .setTempDir(new File("/home/user/pdfBoxTest/newFiles/"));
    PDDocument document = PDDocument.load(file, memUsageSetting);
    if (!document.isEncrypted()) {
        pdfFileInText = tStripper.getText(document);
        System.out.print(pdfFileInText);
    }
    document.close();
}
Error:
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+83 (83) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+116 (116) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+97 (97) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+114 (114) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+87 (87) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+115 (115) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font HPDFAB+DejaVuSansMono,Book
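
To summarize the warnings above, here is a small sketch that pulls the CID numbers out of the "No Unicode mapping" lines and prints them as characters. It is not part of PDFBox: the class MissingCidSummary and its methods are hypothetical names, and it uses only the JDK. Treating each CID as a Unicode code point is purely an assumption for illustration; the log does not confirm that this font uses such a mapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MissingCidSummary {
    // Matches the "No Unicode mapping for CID+83 (83)" part of a warning line.
    private static final Pattern CID =
            Pattern.compile("No Unicode mapping for CID\\+(\\d+) \\(\\d+\\)");

    /** Returns the CIDs named in the warnings, in order of appearance. */
    public static List<Integer> extractCids(String log) {
        List<Integer> cids = new ArrayList<>();
        Matcher m = CID.matcher(log);
        while (m.find()) {
            cids.add(Integer.parseInt(m.group(1)));
        }
        return cids;
    }

    /** ASSUMPTION: interprets each CID as a Unicode code point. */
    public static String asCodePoints(List<Integer> cids) {
        StringBuilder sb = new StringBuilder();
        for (int cid : cids) {
            sb.appendCodePoint(cid);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The six warning messages from the log above, timestamps omitted.
        String log = String.join("\n",
                "WARNING: No Unicode mapping for CID+83 (83) in font HPDFAA+XOThames",
                "WARNING: No Unicode mapping for CID+116 (116) in font HPDFAA+XOThames",
                "WARNING: No Unicode mapping for CID+97 (97) in font HPDFAA+XOThames",
                "WARNING: No Unicode mapping for CID+114 (114) in font HPDFAA+XOThames",
                "WARNING: No Unicode mapping for CID+87 (87) in font HPDFAA+XOThames",
                "WARNING: No Unicode mapping for CID+115 (115) in font HPDFAA+XOThames");
        List<Integer> cids = extractCids(log);
        System.out.println(cids);               // [83, 116, 97, 114, 87, 115]
        System.out.println(asCodePoints(cids)); // StarWs
    }
}
```

Under that assumption the six CIDs spell "StarWs", i.e. the numbers fall in the printable ASCII range. This only suggests that the character codes themselves look sane and that the problem lies in the ToUnicode CMap, as the first warning says.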