Details
Description
Tika's PDF parser throws a NullPointerException on the attached PDF file "TEST_THOR.PDF", even though the file displays as expected in Acrobat.
The stack trace is:
java.lang.NullPointerException
at org.apache.pdfbox.pdmodel.interactive.form.PDFieldFactory.findFieldType(PDFieldFactory.java:113)
at org.apache.pdfbox.pdmodel.interactive.form.PDFieldFactory.createField(PDFieldFactory.java:48)
at org.apache.pdfbox.pdmodel.interactive.form.PDField.fromDictionary(PDField.java:77)
at org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField.getChildren(PDNonTerminalField.java:136)
at org.apache.tika.parser.pdf.PDF2XHTML.processAcroField(PDF2XHTML.java:698)
at org.apache.tika.parser.pdf.PDF2XHTML.extractAcroForm(PDF2XHTML.java:680)
at org.apache.tika.parser.pdf.PDF2XHTML.endDocument(PDF2XHTML.java:243)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:267)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:160)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:144)
at gov.nih.niaid.fscanner.Extract.ExtractContents(Extract.java:62)
at gov.nih.niaid.temp.Main.main(Main.java:69)
Attachments
Issue Links
- depends upon
  - PDFBOX-3534 NPE if an AcroForm field's child cosdict is null (Closed)
  - TIKA-2209 Update PDFBox to 2.0.4 (Closed)
- is duplicated by
  - TIKA-2139 NullPointerException on a valid PDF after loading (Resolved)
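The PDFBOX-3534 title suggests the crash happens when a child ("kid") entry of an AcroForm field resolves to a null dictionary, which is then passed on to the field-type lookup seen at the top of the stack trace. The sketch below is a minimal, stdlib-only illustration of the kind of null guard that avoids this. It assumes kids are modeled as a list of string maps. The class and method names are hypothetical, and this is not PDFBox's actual code or API. Per the issue links, the real fix reached Tika through the PDFBox 2.0.4 upgrade (TIKA-2209).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model of walking an AcroForm field's kids; NOT the PDFBox API.
public class KidsGuardSketch {

    // Stand-in for reading a field dictionary's /FT (field type) entry.
    // Calling this with a null dictionary would throw a NullPointerException.
    static String fieldType(Map<String, String> dict) {
        return dict.get("FT");
    }

    // Collect the field types of all kids, skipping kids that resolved to null
    // instead of passing them on to fieldType() and crashing.
    static List<String> childFieldTypes(List<Map<String, String>> kids) {
        List<String> types = new ArrayList<>();
        for (Map<String, String> kid : kids) {
            if (kid == null) {
                continue; // broken or missing kid dictionary: ignore it
            }
            types.add(fieldType(kid));
        }
        return types;
    }

    public static void main(String[] args) {
        // Second kid simulates a child dictionary that resolved to null.
        List<Map<String, String>> kids = Arrays.asList(
                Collections.singletonMap("FT", "Tx"),
                null,
                Collections.singletonMap("FT", "Btn"));
        System.out.println(childFieldTypes(kids)); // prints [Tx, Btn]
    }
}
```

Skipping the null kid keeps the remaining form fields extractable rather than aborting the whole parse, which matches the "file displays fine elsewhere" expectation in the report.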