Details
Improvement
Status: Closed
Major
Resolution: Fixed
2.0.22, 3.0.0 PDFBox
java8
Description
We are developing software that validates the integrity of PDF signatures using PDFBox.
We ran into a performance problem during PDF validation. We have identified the class that causes it and written an improved version.
We perform the validation in the same way as the method "validateSignaturesImproved" in https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/sign/ValidateSignature.java . This method uses PDSignature.getContents() and PDSignature.getSignedContent().
When validating large PDF files (more than 3MB), we noticed that the validation time was very high.
We traced this to org.apache.pdfbox.pdmodel.interactive.digitalsignature.COSFilterInputStream, which reads the document byte by byte and checks the byte ranges for every single byte.
We have rewritten COSFilterInputStream to work with blocks of bytes, and the validation time has dropped considerably.
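To illustrate the idea of block-wise range filtering, here is a minimal sketch. It is not the attached patch and not the actual PDFBox implementation: the class name BlockRangeInputStream and its constructor are hypothetical, and the ranges are assumed to be (offset, length) pairs sorted in ascending order without overlaps, like the values of a signature's byte range. Instead of testing every byte against the ranges, each read() call copies as many bytes as fit into the current range and skips the gaps between ranges in one step.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch: exposes only the bytes of the wrapped stream that lie
 * inside the given (offset, length) ranges, reading them in blocks.
 * Assumes the ranges are ascending and do not overlap.
 */
final class BlockRangeInputStream extends InputStream {
    private final InputStream in;
    private final long[] ranges;   // offset0, length0, offset1, length1, ...
    private int rangeIndex = 0;    // index of the range pair currently being served
    private long position = 0;     // absolute position in the wrapped stream

    BlockRangeInputStream(InputStream in, long[] ranges) {
        this.in = in;
        this.ranges = ranges.clone();
    }

    @Override
    public int read() throws IOException {
        // Single-byte reads are routed through the block read.
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        while (rangeIndex * 2 + 1 < ranges.length) {
            long start = ranges[rangeIndex * 2];
            long end = start + ranges[rangeIndex * 2 + 1];
            if (position >= end) {
                rangeIndex++;              // current range is exhausted
                continue;
            }
            if (position < start) {
                // Skip the gap before the range in one call instead of byte by byte.
                long skipped = in.skip(start - position);
                if (skipped <= 0) {
                    // skip() may legitimately return 0; fall back to reading one byte.
                    if (in.read() == -1) {
                        return -1;
                    }
                    skipped = 1;
                }
                position += skipped;
                continue;
            }
            // Inside a range: read at most up to the end of the range.
            int want = (int) Math.min(len, end - position);
            int n = in.read(b, off, want);
            if (n == -1) {
                return -1;
            }
            position += n;
            return n;
        }
        return -1;                         // all ranges consumed
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

With this approach the range check runs once per block rather than once per byte, which is where the byte-by-byte version spends its time on large files.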
We have tested this in PDFBox 2.0.22 and 3.0.0-SNAPSHOT. We have attached the test project (TestPDFBox.zip). Here is the code that reproduces the problem:
    try (PDDocument doc = Loader.loadPDF(new File(args[0]))) {
        PDSignature signature = doc.getLastSignatureDictionary();
        byte[] signedContent = signature.getSignedContent(new FileInputStream(args[0]));
        byte[] signatureBytes = signature.getContents();
    }
Without our modification, with a 3MB signed PDF, it takes 10 seconds to do this. With our modification, it takes 0.2 seconds.
We would like to have this improved code reviewed for inclusion in PDFBox.