Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.3.1, 1.4.0
-
None
Description
When extracting text in a PDF document in which word spacing is achieved by matrix translation, in versions 1.3.x and 1.4 the different words are being merged.
This situation doesn't happen in 1.2 branch. After investigating a bit, the error was introduced with a refactoring of the PDFStreamEngine class, and is related to textMatrixEnd computation. In 1.2 branch the characterSpacingWidth was added after computing the textMatrixEnd, but in 1.3 (and 1.4) this characterSpacingWidth is preadded to the textMatrixEnd, so the system is unable to detect a new word.