PDFBox / PDFBOX-3799

Problem in TextPosition's hashCode

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.6
    • Fix Version/s: 2.0.7, 3.0.0 PDFBox
    • Component/s: Text extraction
    • Labels: None

    Description

      This is another side effect of TextPosition's hashCode implementation.

      I rely on the hashCode because I want to know the color of each letter. To do this, in processTextPosition I save the current graphics state in a map, using the current TextPosition as the key. Then, in writeString, I iterate over all the text positions and look up the color of each one through this map.

      Of course, it would be easier if this information could be stored in the TextPosition itself, but that is just a feature wish.

      I have found that between processTextPosition and writeString the same TextPosition sometimes ends up with a different unicode value. In processTextPosition it is just an "x" (char 120), but in writeString the unicode of the same TextPosition is the x followed by a combining macron '̄' (char 772). Everything else about the TextPosition remains the same: same coordinates, same System.identityHashCode. The only thing that changes is the unicode, and that makes hashCode return a different value.
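
The effect described above can be reproduced without PDFBox. The following is a minimal sketch, assuming a hypothetical stand-in class `Glyph` (not the real TextPosition) whose hashCode includes a mutable unicode field. The colors are plain strings for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for TextPosition: equals/hashCode include a mutable field.
final class Glyph {
    final float x;
    final float y;
    String unicode; // may be changed after the object was used as a map key

    Glyph(float x, float y, String unicode) {
        this.x = x;
        this.y = y;
        this.unicode = unicode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Glyph)) {
            return false;
        }
        Glyph other = (Glyph) o;
        return Float.compare(x, other.x) == 0
                && Float.compare(y, other.y) == 0
                && Objects.equals(unicode, other.unicode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y, unicode);
    }
}

final class MutableKeyDemo {
    // Returns whether the map still finds the very same object after its unicode changed.
    static boolean lookupSurvivesMutation() {
        Map<Glyph, String> colorByGlyph = new HashMap<>();
        Glyph glyph = new Glyph(10f, 20f, "x");
        colorByGlyph.put(glyph, "red");   // stored under the hash of "x"
        glyph.unicode = "x\u0304";        // diacritic merged in later (char 772)
        return colorByGlyph.containsKey(glyph);
    }

    public static void main(String[] args) {
        System.out.println(lookupSurvivesMutation());
    }
}
```

In this sketch the lookup fails even though the key is the same object, because the entry was filed under the hash computed before the mutation.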

      This causes problems: the lookup in the map fails for those text positions. As a workaround, I am now using System.identityHashCode instead of TextPosition's own hashCode implementation.
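
A related way to express the identity-based workaround is `java.util.IdentityHashMap`, which compares keys by reference and hashes them with `System.identityHashCode`. This is a swapped-in variant, not necessarily what the reporter implemented, and `GlyphKey` is a hypothetical stand-in for TextPosition.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for TextPosition with a content-based, mutable hashCode.
final class GlyphKey {
    String unicode;

    GlyphKey(String unicode) {
        this.unicode = unicode;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof GlyphKey && Objects.equals(unicode, ((GlyphKey) o).unicode);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(unicode);
    }
}

final class IdentityKeyDemo {
    // Returns the color found for the key after its unicode has been changed.
    static String colorAfterMutation() {
        // IdentityHashMap ignores equals/hashCode of the key and uses reference identity.
        Map<GlyphKey, String> colorByKey = new IdentityHashMap<>();
        GlyphKey key = new GlyphKey("x");
        colorByKey.put(key, "red");
        key.unicode = "x\u0304"; // same object, different content
        return colorByKey.get(key);
    }

    public static void main(String[] args) {
        System.out.println(colorAfterMutation());
    }
}
```

Unlike a map keyed by the raw `System.identityHashCode` value, this variant does not confuse two distinct objects that happen to share an identity hash code.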

            People

              Assignee: Tilman Hausherr
              Reporter: Miro Mannino
              Votes: 0
              Watchers: 3