Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
2.0.6
-
None
Description
I got a ClassCastException throught Tika HTML extraction on PDFBox code:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@60b1fa63
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 16 more
Caused by: java.lang.ClassCastException: org.apache.pdfbox.cos.COSStream cannot be cast to org.apache.pdfbox.cos.COSString
at org.apache.pdfbox.cos.COSDictionary.getDate(COSDictionary.java:787)
at org.apache.pdfbox.pdmodel.PDDocumentInformation.getCreationDate(PDDocumentInformation.java:212)
at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:256)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:146)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)