[PDFBOX-1653] Fix pdfbox eating up big chunks of memory for identical CID mappings - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.8.1, 1.8.2, 2.0.0
Fix Version/s: 1.8.3, 2.0.0
Component/s: FontBox
Labels:
- PatchAvailable

Description

PDFBox currently handles the PDF beginbfrange command (which maps a range of CIDs to Unicode characters) very inefficiently.

If a PDF document contains a range from CID 0 to CID 65535 with a mapping offset of 0 (which translates to "CID values map 1:1 to Unicode characters"), PDFBox nevertheless creates an explicit mapping for each and every CID.

There apparently are PDFs with many of these 0-65535 mappings, and a single such PDF may cause an OutOfMemoryError.

This patch detects zero-offset ranges and skips creating explicit mappings for them.
The patch also includes some special handling for the space character, which might or might not be relevant.
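
To illustrate the idea (this is not the attached patch, and none of the names below are PDFBox or FontBox API), here is a minimal sketch of a lookup table that records a zero-offset range as a single pair of bounds instead of one map entry per CID. The class name `CompactCidToUnicodeMap` and its methods are hypothetical, and the special handling for the space character is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not a PDFBox/FontBox class. */
final class CompactCidToUnicodeMap {
    // Explicit entries, used only for ranges whose Unicode value differs from the CID.
    private final Map<Integer, String> explicit = new HashMap<>();
    // Zero-offset (identity) ranges, stored as inclusive {start, end} pairs.
    private final List<int[]> identityRanges = new ArrayList<>();

    /** Registers a bfrange-style mapping: CIDs start..end map to firstUnicode, firstUnicode + 1, ... */
    void addRange(int start, int end, int firstUnicode) {
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        if (firstUnicode == start) {
            // Zero offset: remember only the bounds, no per-CID entries.
            identityRanges.add(new int[] {start, end});
            return;
        }
        for (int cid = start; cid <= end; cid++) {
            int codePoint = firstUnicode + (cid - start);
            explicit.put(cid, new String(Character.toChars(codePoint)));
        }
    }

    /** Returns the mapped string, or null if the CID is not covered. Explicit entries win. */
    String lookup(int cid) {
        String mapped = explicit.get(cid);
        if (mapped != null) {
            return mapped;
        }
        for (int[] range : identityRanges) {
            if (cid >= range[0] && cid <= range[1]) {
                // Identity mapping: the CID value is the Unicode code point.
                return new String(Character.toChars(cid));
            }
        }
        return null;
    }

    /** Number of per-CID entries actually held in memory. */
    int explicitEntryCount() {
        return explicit.size();
    }
}
```

With this layout, registering a 0-65535 identity range costs one small array rather than 65536 map entries, so many such ranges in one document no longer multiply memory use. Letting explicit entries take precedence over identity ranges is a design choice of this sketch, not something the issue specifies.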

Attachments


PDFBOX-1653.patch
27/Jun/13 22:52
2 kB
Christian Kohlschütter

Issue Links

is depended upon by

PDFBOX-1692 java.lang.OutOfMemoryError: Java heap space

Closed

People

Assignee: Andreas Lehmkühler

Reporter: Christian Kohlschütter

Votes: 0

Watchers: 2

Dates

Created: 27/Jun/13 22:49

Updated: 30/Nov/13 17:02

Resolved: 31/Aug/13 12:35