PDFBox / PDFBOX-3513

PDType3Font can have an Encoding which is not a COSDictionary

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.3
    • Fix Version/s: None
    • Component/s: PDModel
    • Labels: None

    Description

      I tried the "ExtractTextByArea" example with a PDF from my bank and got the following stack trace:

      Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary
      	at org.apache.pdfbox.pdmodel.font.PDType3Font.readEncoding(PDType3Font.java:70)
      	at org.apache.pdfbox.pdmodel.font.PDType3Font.<init>(PDType3Font.java:57)
      	at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:79)
      	at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)
      	at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
      	at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:815)
      	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:472)
      	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:446)
      	at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
      	at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
      	at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
      	at org.apache.pdfbox.text.PDFTextStripperByArea.extractRegions(PDFTextStripperByArea.java:130)
      

      I can open the PDF and extract its text with evince, pdftk and lesspipe. I cannot upload the file because it is a bank statement whose content I will not share.

      While debugging the application, I found that the call at PDType3Font.java:69

      dict.getDictionaryObject(COSName.ENCODING)
      

      returns an object of type COSName with its name property set to "WinAnsiEncoding", not a COSDictionary.
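
      The failing cast suggests that the Encoding entry needs a type check before it is used. What follows is only a minimal sketch of such a check, not the actual PDFBox code or its fix: CosBase, CosName, CosDictionary and describeEncoding are hypothetical stand-ins written for illustration, so that the example runs without PDFBox on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

class EncodingTypeCheckSketch {
    // Hypothetical stand-ins for PDFBox's COSBase, COSName and COSDictionary.
    interface CosBase {}

    static final class CosName implements CosBase {
        final String name;
        CosName(String name) { this.name = name; }
    }

    static final class CosDictionary implements CosBase {
        final Map<String, CosBase> items = new HashMap<>();
    }

    // Inspect the runtime type of the Encoding entry instead of casting blindly.
    static String describeEncoding(CosBase encoding) {
        if (encoding instanceof CosName) {
            // A name such as WinAnsiEncoding refers to a predefined encoding.
            return "predefined:" + ((CosName) encoding).name;
        }
        if (encoding instanceof CosDictionary) {
            // A dictionary describes a custom encoding.
            return "dictionary";
        }
        // Entry is missing or has an unexpected type.
        return "none";
    }

    public static void main(String[] args) {
        // Model the font dictionary from this report: Encoding is a name.
        CosDictionary font = new CosDictionary();
        font.items.put("Encoding", new CosName("WinAnsiEncoding"));
        System.out.println(describeEncoding(font.items.get("Encoding")));
    }
}
```

      With the font dictionary modeled on this report, the name branch is taken and no ClassCastException can occur; a dictionary-valued Encoding would still take the second branch.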

      I hope this helps - despite the fact that I cannot share the example PDF.

            People

              Assignee: Andreas Lehmkühler (lehmi)
              Reporter: Emanuel Greisen (emanuelgreisen)