PDFBox / PDFBOX-2749

Annotations character bounding boxes size 3 times higher than expected

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.8.4
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None

    Description

      After text extraction, the character bounding boxes are about three times taller than expected. For example, the first few character bounding boxes are:
      [90.1,46,6.64,40.06],[96.7,46,5.09,40.06],[101.79,46,5.8,40.06].
      The values are x, y, width, height. The character widths are between 5 and 7 pixels, but the character heights are 40.06 pixels. The actual height of each line of text appears to be about 12 pixels. An example PDF document is attached.
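
      The figures quoted in the description can be checked with a little arithmetic. The following is a minimal Java sketch, not PDFBox code: it does not call the library and only works on the three boxes listed in the report. The class and method names (BoxHeightCheck, heightRatio, isContiguous) are hypothetical, the 12-pixel expected height is the reporter's estimate, and the 0.05 tolerance is an assumption chosen for illustration.

```java
public class BoxHeightCheck {
    // First three boxes quoted in the report, each as {x, y, width, height}.
    public static final double[][] REPORTED = {
        {90.1, 46, 6.64, 40.06},
        {96.7, 46, 5.09, 40.06},
        {101.79, 46, 5.8, 40.06},
    };

    /** Mean reported height divided by the expected line height. */
    public static double heightRatio(double[][] boxes, double expectedHeight) {
        double sum = 0;
        for (double[] box : boxes) {
            sum += box[3]; // index 3 is the height
        }
        return (sum / boxes.length) / expectedHeight;
    }

    /** True when each box starts where the previous one ends, within tol. */
    public static boolean isContiguous(double[][] boxes, double tol) {
        for (int i = 1; i < boxes.length; i++) {
            double previousEnd = boxes[i - 1][0] + boxes[i - 1][2]; // x + width
            if (Math.abs(previousEnd - boxes[i][0]) > tol) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 12.0 is the reporter's estimate of the visible line height.
        System.out.println("height ratio: " + heightRatio(REPORTED, 12.0));
        // 0.05 is an assumed tolerance for rounding in the quoted values.
        System.out.println("contiguous:   " + isContiguous(REPORTED, 0.05));
    }
}
```

      With the reported values the ratio comes out at roughly 3.3, which matches the "3 times" in the title. Each box also starts where the previous one ends (to within the assumed tolerance), which suggests the x positions and widths are plausible and only the heights are off.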

      Attachments

        1. RESULT.pdf
          155 kB
          Hayk Hayryan

          People

            Assignee: Unassigned
            Reporter: Hayk Hayryan (hhayryan)
            Votes: 0
            Watchers: 2
