Description
This PDF
http://digitalcorpora.org/corp/nps/files/govdocs1/876/876636.pdf
causes an exception because the end of an inline image is detected incorrectly. The content stream looks like this:
BI /W 452 /H 169 /BPC 8 /CS /RGB /D [0.0 1.0 0.0 1.0 0.0 1.0] /F [/A85 /Fl] ID ...................................................... ....................................................EI ...................................................... ... .... EI Q
Inline images are handled in PDFStreamParser. This is tricky. When we find an EI, we look at the data that follows it: if that data is binary, the EI is taken to be in the middle of the image rather than at its end. Here, however, the image data is not binary but ASCII85 text, so the check accepts an EI that is really part of the image data. We also can't require a line feed before the EI, because I remember a PDF at work, created by a well-known company, that doesn't use one.
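A minimal sketch of why a lookahead check like the one described fails here. It assumes a simplified rule ("accept an EI surrounded by whitespace if the next few bytes are printable ASCII"). The class, the method names, the lookahead length and the sample data are all hypothetical; this is not PDFBox's actual PDFStreamParser code. The second method shows one possible, also hypothetical, refinement for `/F /A85`: ASCII85Decode data ends with the `~>` EOD marker, so only an EI that follows `~>` is accepted.

```java
import java.nio.charset.StandardCharsets;

class InlineImageEndSketch {

    // PDF whitespace characters (NUL, TAB, LF, FF, CR, SPACE).
    static boolean isWhitespace(byte b) {
        return b == 0 || b == '\t' || b == '\n' || b == '\f' || b == '\r' || b == ' ';
    }

    // Hypothetical heuristic: the "EI" at pos is the real end of the image if it is
    // surrounded by whitespace and the next `lookahead` bytes contain no binary data.
    static boolean looksLikeEndOfImage(byte[] data, int pos, int lookahead) {
        if (pos > 0 && !isWhitespace(data[pos - 1])) return false;
        int after = pos + 2;
        if (after < data.length && !isWhitespace(data[after])) return false;
        int end = Math.min(data.length, after + lookahead);
        for (int i = after; i < end; i++) {
            int b = data[i] & 0xFF;
            // Anything other than whitespace or printable ASCII is treated as binary image data.
            if (!isWhitespace(data[i]) && (b < 0x21 || b > 0x7E)) return false;
        }
        return true;
    }

    // Returns the index of the first "EI" accepted by the heuristic, or -1.
    static int findEndOfImage(byte[] data, int lookahead) {
        for (int i = 0; i + 1 < data.length; i++) {
            if (data[i] == 'E' && data[i + 1] == 'I' && looksLikeEndOfImage(data, i, lookahead)) {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical refinement for ASCII85Decode: '~' can only appear in the "~>" EOD
    // marker, so the real EI is the one that follows "~>" (plus optional whitespace).
    static int findEndAfterAscii85Eod(byte[] data) {
        for (int i = 0; i + 1 < data.length; i++) {
            if (data[i] == '~' && data[i + 1] == '>') {
                int j = i + 2;
                while (j < data.length && isWhitespace(data[j])) j++;
                if (j + 1 < data.length && data[j] == 'E' && data[j + 1] == 'I') return j;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up ASCII85 data that happens to contain " EI " before the real end marker.
        byte[] data = "ID 9jqo^BlbD- EI BOu!rD]j7~> EI Q".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findEndOfImage(data, 10));     // prints 14 (the false EI)
        System.out.println(findEndAfterAscii85Eod(data)); // prints 29 (the real EI)
    }
}
```

Because every ASCII85 byte is printable, the lookahead check cannot tell the embedded " EI " apart from the real one. A filter-aware check avoids that, but only when the outermost filter (the first entry in `/F`) is ASCII85Decode.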
Issue Links
- duplicates
  - PDFBOX-2493 OOM with corrupt PDF file (Closed)
- is related to
  - PDFBOX-2385 inline image with EI at the end incorrectly parsed (Closed)
  - TIKA-1300 Switch default PDFBox parser to NonSequentialParser (Resolved)