Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.8.2
None
Environment:
Windows 7
java version 1.7.0_17 (build 1.7.0_17-b02/64-Bit Server VM build 23.7-01)
pdfbox-app-1.8.2.jar
Description
Hello,
I have a problem with text extraction: the JVM runs out of heap memory while extracting text from a small PDF file (168 KB).
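My code:
// Imports for PDFBox 1.8.x
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// Backslashes must be escaped in a Java string literal, otherwise "\t" becomes a tab.
String pdfFile = "D:\\testfolder\\test1fd9a_test.pdf"; // size of file: 168 KB
PDDocument document = PDDocument.load(pdfFile, true);
try {
    PDFTextStripper stripper = new PDFTextStripper();
    stripper.setSortByPosition(true);
    // outputWriter is a java.io.Writer created elsewhere (not shown in this report)
    stripper.writeText(document, outputWriter);
} catch (IOException e) {
    e.printStackTrace(); // exception handling was left empty in the original snippet
} finally {
    document.close(); // release the resources held by the document
}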
Running it fails with the following error:
java.lang.OutOfMemoryError: Java heap space
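To see how close the JVM is to its heap limit when this error occurs, a small helper along the following lines could be run around the extraction call. This is only a sketch: the class name HeapReport and its methods are hypothetical, and it relies solely on the standard java.lang.Runtime API, not on anything from PDFBox.

```java
public class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use (bounded by -Xmx), in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use (allocated minus free), in MiB. */
    public static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    /** One-line summary suitable for logging before and after a heavy operation. */
    public static String summary(String label) {
        return label + ": used " + usedHeapMiB() + " MiB of max " + maxHeapMiB() + " MiB";
    }

    public static void main(String[] args) {
        System.out.println(summary("before extraction"));
        // ... the text extraction from the report would run here (assumption) ...
        System.out.println(summary("after extraction"));
    }
}
```

If the reported maximum turns out to be small, raising it with the -Xmx JVM option may work as a temporary workaround, although it does not address the underlying memory consumption.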
Attachments
Issue Links
- depends upon: PDFBOX-1653 Fix pdfbox eating up big chunks of memory for identical CID mappings (Closed)