Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.0.0
- Fix Version/s: None
Description
Attached is a page of a file that parsed fine with PDFBox 1.8.
In 2.0, using pdfbox/examples/util/PrintTextLocations.java,
much of the text is missing, for example all the text like
"MERCH BANKCARD NET SETLMT".
Also, width_of_space comes out as a bad value: 561591.3
Start of PrintTextLocations....
Oct 21, 2015 10:36:22 PM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Oct 21, 2015 10:36:22 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
WARNING: java.util.zip.DataFormatException: incorrect data check
Oct 21, 2015 10:36:22 PM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
WARNING: Cannot execute restore, the graphics stack is empty
String[161.94,422.1 fs=10.0 xscale=10.0 height=7.2857146 space=561591.3 width=6.6857147]B
String[168.62572,422.1 fs=10.0 xscale=10.0 height=7.2857146 space=561591.3 width=4.457138]e
String[173.08286,422.1 fs=10.0 xscale=10.0 height=7.2857146 space=561591.3 width=4.9714355]g
String[178.05429,422.1 fs=10.0 xscale=10.0 height=7.2857146 space=561591.3 width=2.742859]i
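To see how widespread the bad space width is, the output above can be scanned mechanically. The following is only a rough sketch, not part of PDFBox: the class name `SpaceWidthCheck` and the threshold of 10x the font size are hypothetical, and the regular expression assumes lines shaped exactly like the `String[...]` lines quoted above (plain decimal numbers, no exponent notation). Lines that do not match, such as the log warnings, are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpaceWidthCheck {
    // Matches lines shaped like the PrintTextLocations output quoted in this report.
    private static final Pattern LINE = Pattern.compile(
        "String\\[([-\\d.]+),([-\\d.]+) fs=([-\\d.]+) xscale=([-\\d.]+)"
        + " height=([-\\d.]+) space=([-\\d.]+) width=([-\\d.]+)\\](.*)");

    // Hypothetical threshold: a space wider than 10x the font size is flagged.
    static final double MAX_SPACE_TO_FONT_SIZE = 10.0;

    // Returns the text of every matching line whose space value looks implausible.
    static List<String> suspiciousGlyphs(List<String> lines) {
        List<String> flagged = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches()) {
                continue; // log messages and other non-glyph lines
            }
            double fontSize = Double.parseDouble(m.group(3));
            double space = Double.parseDouble(m.group(6));
            if (fontSize > 0 && space > fontSize * MAX_SPACE_TO_FONT_SIZE) {
                flagged.add(m.group(8));
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "WARNING: Cannot execute restore, the graphics stack is empty",
            "String[161.94,422.1 fs=10.0 xscale=10.0 height=7.2857146 space=561591.3 width=6.6857147]B",
            "String[168.62572,422.1 fs=10.0 xscale=10.0 height=7.2857146 space=561591.3 width=4.457138]e");
        System.out.println(suspiciousGlyphs(sample));
    }
}
```

With the four lines quoted above, every glyph would be flagged, since each reports space=561591.3 at fs=10.0.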
Attachments
Issue Links
- is duplicated by
  - PDFBOX-2508 Text extraction getting zero font height, bad widths, and ? for text in this PDF with Type 3 Fonts (Closed)
  - PDFBOX-2976 java.util.zip.DataFormatException: incorrect data check (Closed)