  1. PDFBox
  2. PDFBOX-617

Crash parsing pdf file (http://media.opentur.it/WEB/CHANNELS/COCKTAILVIAGGI/CMS/PDF/Irlanda%202009%2028-51pag.pdf) from Tika


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.8.0-incubator
    • Fix Version/s: None
    • Component/s: Parsing
    • Environment: Linux debian: Linux 2.6.18-6-686 #1 SMP i686 GNU/Linux
      java version "1.6.0_13"
      Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
      Java HotSpot(TM) Client VM (build 11.3-b02, mixed mode, sharing)

    Description

      When parsing the file http://media.opentur.it/WEB/CHANNELS/COCKTAILVIAGGI/CMS/PDF/Irlanda%202009%2028-51pag.pdf, the call to Tika's "parse" method fails with the following stack trace:

      java.io.IOException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@1578aab
      at com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.parse(GenericDocumentParserTikaImpl.java:143)
      at com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.main(GenericDocumentParserTikaImpl.java:306)
      Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@1578aab
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:126)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:114)
      at com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.parse(GenericDocumentParserTikaImpl.java:69)
      ... 1 more
      Caused by: org.apache.pdfbox.exceptions.WrappedIOException
      at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:841)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:808)
      at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:53)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
      ... 4 more
      Caused by: java.util.NoSuchElementException
      at java.util.AbstractList$Itr.next(AbstractList.java:350)
      at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
      at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
      at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
      ... 8 more
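
      The trace above nests several "Caused by" levels, and the innermost one (java.util.NoSuchElementException, thrown from PDFXrefStreamParser.parse) is the one that matters for diagnosis. A minimal sketch of how calling code could surface that innermost cause is shown below. The class name RootCause, the method rootCause, and the stand-in exceptions in main are hypothetical and only mimic the nesting; they are not part of Tika or PDFBox.

      ```java
      import java.io.IOException;
      import java.util.NoSuchElementException;

      final class RootCause {

          /** Follows getCause() links until the innermost throwable is reached. */
          static Throwable rootCause(Throwable t) {
              Throwable current = t;
              // Stop when there is no further cause; the self-check guards
              // against a throwable that reports itself as its own cause.
              while (current.getCause() != null && current.getCause() != current) {
                  current = current.getCause();
              }
              return current;
          }

          public static void main(String[] args) {
              // Stand-in exceptions that mirror the nesting in the trace;
              // these are NOT the real Tika/PDFBox exception types.
              Throwable innermost = new NoSuchElementException();
              Throwable middle = new IOException("wrapped parser failure", innermost);
              Throwable outer = new IOException("TIKA-198: Illegal IOException", middle);

              // Prints: java.util.NoSuchElementException
              System.out.println(rootCause(outer).getClass().getName());
          }
      }
      ```

      Logging the root cause next to the outer message makes it easier to match a failure like this one against existing reports, since the wrapper exceptions alone do not say where parsing went wrong.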

      Attachments

        1. Portogallo2010.pdf
          3.01 MB
          Stefano Falconetti
        2. StatiUniti2010_1.pdf
          2.95 MB
          Stefano Falconetti
        3. Irlanda26-52pag.pdf
          1.28 MB
          Stefano Falconetti
        4. Irlanda125pag.pdf
          1.41 MB
          Stefano Falconetti


          People

            Assignee: Unassigned
            Reporter: Stefano Falconetti (stfalcon)
            Votes: 1
            Watchers: 1
