Description
We're using TIKA embedded in a web crawler, and today I encountered a PDF that causes an OutOfMemoryError while TIKA is processing it.
It's a small, single-page PDF file, so I wouldn't expect it to consume that much memory.
I reproduced the problem with the GUI from tika-app-1.13.jar, which fails with the same error on the same file. The file can be found at:
http://www.spesmea.nl/pdf/algemene_voorwaarden_bbztcn_2010_nl.pdf
If I can help by providing any additional information, please let me know.
Issue Links
- depends upon
  - PDFBOX-3442 OOM for single page pdf file (Closed)
  - TIKA-2051 Upgrade to PDFBox 2.0.3 when available (Closed)
- is cloned by
  - TIKA-2496 TIKA crashes / runs out of memory on simple PDF (Resolved)