Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
ManifoldCF 2.4
None
Environment: Windows 64-bit, java version "1.8.0_77", pdfbox-1.8.10.jar, tika-parsers-1.10.jar
Description
The Tika extractor gets stuck, repeatedly trying to parse the same document, after the following error:
FATAL 2016-04-29 10:55:45,505 (Worker thread '41') - Error tossed: null
java.lang.StackOverflowError
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.SecureContentHandler.startElement(SecureContentHandler.java:250)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
    at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254)
    at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:296)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:348)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:319)
-Xss is left at its default value, which I believe is 512k.
We can increase the thread stack size, but I think this error should not lead to the extractor retrying the same document indefinitely.
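To make the two options concrete, here is a minimal sketch. It is not ManifoldCF or Tika code. The class `StackGuardSketch`, the recursive method `descend` (a stand-in for the nested `extractImages` calls in the trace), and the outcome strings are all hypothetical. It runs the recursion on a thread with an explicit stack size, which the JVM treats only as a hint, and catches `StackOverflowError` so the failure is reported once instead of escaping to the caller.

```java
import java.util.concurrent.atomic.AtomicReference;

public class StackGuardSketch {
    // Hypothetical stand-in for deep recursion such as the extractImages frames in the trace.
    static int descend(int depth) {
        if (depth == 0) {
            return 0;
        }
        return 1 + descend(depth - 1);
    }

    // Runs the recursion on a dedicated thread created with the requested stack size
    // (only a hint to the JVM) and returns an outcome string instead of letting the
    // StackOverflowError propagate to the caller.
    static String runGuarded(final int depth, long stackBytes) {
        final AtomicReference<String> outcome = new AtomicReference<>("unknown");
        Thread worker = new Thread(null, () -> {
            try {
                descend(depth);
                outcome.set("ok");
            } catch (StackOverflowError e) {
                // Assumption for illustration: treat the document as permanently failed.
                outcome.set("skipped");
            }
        }, "guarded-parse", stackBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        // Shallow recursion fits in the requested stack.
        System.out.println(runGuarded(1_000, 512 * 1024));
        // Very deep recursion overflows and is reported rather than rethrown.
        System.out.println(runGuarded(50_000_000, 512 * 1024));
    }
}
```

Raising the stack size only moves the limit. Catching the error and marking the document as failed is what would stop the same document from being retried.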
Thanks a lot!
Issue Links
- depends upon: CONNECTORS-1308 Upgrade to Tika 1.12 (Resolved)