Tika / TIKA-3779

Temp file leftover in PDFParser.parse()

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.1
    • Component/s: parser
    • Labels: None

    Description

      I wondered where the many "apache-tika-" files in the temp directory came from. It turns out that all (or most) of them are PDF files, so I looked at the PDF parser module. After checking the file sizes and getting a file name, I focused on the test PDFParserTest.testSortByPosition(), where the first two parse calls each leave a file behind and the third does not.

      The difference is that in the third call, PDFParser.parse() receives a TikaInputStream as its parameter, and TikaInputStream.get() returns that parameter unchanged. In the first two calls, TikaInputStream.get() creates a new object, which is never closed, so its resource cleanup never runs.
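
      The wrap-or-return behaviour described above can be modelled without Tika on the classpath. The following is only a minimal sketch: TrackedStream is a hypothetical stand-in for TikaInputStream, and createsNewWrapper() is a helper invented for illustration. It shows why a plain InputStream leaves the method holding a new object that nobody else will close.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

class WrapOrReturnDemo {

    /** Hypothetical stand-in for TikaInputStream; this is not the real Tika class. */
    static final class TrackedStream extends FilterInputStream {
        private TrackedStream(InputStream in) {
            super(in);
        }

        /** Returns the argument itself if it is already a TrackedStream, otherwise a new wrapper. */
        static TrackedStream get(InputStream in) {
            if (in instanceof TrackedStream) {
                return (TrackedStream) in;
            }
            return new TrackedStream(in);
        }
    }

    /** True when get() had to create a new wrapper, i.e. an object only this code knows about. */
    static boolean createsNewWrapper(InputStream stream) {
        return TrackedStream.get(stream) != stream;
    }

    public static void main(String[] args) {
        InputStream plain = new ByteArrayInputStream(new byte[] {1, 2, 3});
        InputStream wrapped = TrackedStream.get(new ByteArrayInputStream(new byte[] {1, 2, 3}));

        System.out.println(createsNewWrapper(plain));   // true: a new object that must be closed here
        System.out.println(createsNewWrapper(wrapped)); // false: the caller's own object is reused
    }
}
```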

      Adding

                  if (!(stream instanceof TikaInputStream)) {
                      tstream.close();
                  }
      

      fixes this, i.e. no leftover files after running PDFParserTest.
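
      To illustrate the effect of that conditional close, here is a self-contained sketch. SpoolingStream is again a hypothetical stand-in for TikaInputStream (it spools to a temp file on demand and deletes it on close), and parse() is a simplified placeholder, not the actual PDFParser code. Putting the check in a finally block is my assumption about where it would sit.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ConditionalCloseDemo {

    /** Hypothetical stand-in for TikaInputStream: spools to a temp file, deletes it on close. */
    static final class SpoolingStream extends FilterInputStream {
        private Path tempFile;

        private SpoolingStream(InputStream in) {
            super(in);
        }

        static SpoolingStream get(InputStream in) {
            return (in instanceof SpoolingStream) ? (SpoolingStream) in : new SpoolingStream(in);
        }

        /** Copies the remaining bytes to a temp file on first use. */
        Path getPath() throws IOException {
            if (tempFile == null) {
                tempFile = Files.createTempFile("demo-spool-", ".tmp");
                Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            }
            return tempFile;
        }

        @Override
        public void close() throws IOException {
            try {
                super.close();
            } finally {
                if (tempFile != null) {
                    Files.deleteIfExists(tempFile);
                    tempFile = null;
                }
            }
        }
    }

    /** Simplified placeholder for a parse method; returns the temp path so it can be inspected. */
    static Path parse(InputStream stream) throws IOException {
        SpoolingStream tstream = SpoolingStream.get(stream);
        try {
            // Real parsing work would read from this path.
            return tstream.getPath();
        } finally {
            // Close only a wrapper created here; a caller-supplied SpoolingStream stays open.
            if (!(stream instanceof SpoolingStream)) {
                tstream.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "pdf bytes".getBytes(StandardCharsets.UTF_8);

        Path leftover = parse(new ByteArrayInputStream(data));
        System.out.println(Files.exists(leftover)); // false: the wrapper was closed in parse()

        try (SpoolingStream own = SpoolingStream.get(new ByteArrayInputStream(data))) {
            Path kept = parse(own);
            System.out.println(Files.exists(kept)); // true: the caller still owns the stream
        }
    }
}
```

      The instanceof check matters because closing a stream that the caller passed in would take cleanup out of the caller's hands; only the wrapper created inside the method is this method's responsibility.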

      There's also a null check in that method, but the object is used later without one. So either the null check isn't needed, or there is a risk of a NullPointerException.

          People

            Assignee: Tim Allison (tallison)
            Reporter: Tilman Hausherr (tilman)
            Votes: 0
            Watchers: 3
