PDFBox / PDFBOX-4322

Extract Text feature is not working for some part of PDF

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2, 2.0.11
    • Fix Version/s: 2.0.12, 3.0.0 PDFBox
    • Component/s: Text extraction
    • Labels: None

    Description

      The text extraction feature does not extract text from the attached PDF correctly.

      Text inside the rectangular boxes (e.g. the value of Lending Specialist and others) is not extracted.

      Attachments

        1. PDFBOX-4322-Q3FOMIEI6S2BMGSRZUNRBP2OZQ4BPSKY.pdf
          80 kB
          Tilman Hausherr
        2. PDFBOX-4322-Empty-ToUnicode-reduced.pdf
          21 kB
          Tilman Hausherr
        3. pdf__1.pdf.xml
          4 kB
          Tim Allison
        4. pdf__1.pdf
          524 kB
          Amit Maheshwari

            People

              Assignee: Tilman Hausherr
              Reporter: Amit Maheshwari
              Votes: 0
              Watchers: 4