Description
I wondered where the many "apache-tika-" files in the temp directory came from. It turns out that all (or most) of them are PDF files, so I looked at the PDF parser module. After comparing file sizes and matching a file name, I focused on the test PDFParserTest.testSortByPosition(), where the first two parse calls each leave a file behind and the third one doesn't.
The difference is that in the third one, PDFParser.parse() gets a TikaInputStream as its parameter, and TikaInputStream.get() returns its argument unchanged when that argument is already a TikaInputStream. In the first two, it creates a new TikaInputStream object, which is never closed, so its resource cleanup is never done.
Adding
if (!(stream instanceof TikaInputStream)) { tstream.close(); }
fixes this, i.e. no leftover files after running PDFParserTest.
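To illustrate the pattern without depending on Tika, here is a minimal sketch. TempBackedStream is a hypothetical stand-in for TikaInputStream, and CloseOnlyIfCreated.parse() is a hypothetical stand-in for PDFParser.parse(); neither is Tika code. As a simplification, the stand-in creates its temp file eagerly, which may not match when Tika actually creates its files. The point is only that the method closes the wrapper when it created it, and leaves a caller-supplied wrapper alone.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseOnlyIfCreated {

    // Hypothetical stand-in for TikaInputStream: owns a temp file
    // and deletes it when close() is called.
    static class TempBackedStream extends InputStream {
        private final InputStream in;
        final Path tempFile;

        private TempBackedStream(InputStream in) throws IOException {
            this.in = in;
            // Assumption: created eagerly here to keep the sketch short.
            this.tempFile = Files.createTempFile("demo-", ".tmp");
        }

        // Returns the argument itself if it is already wrapped,
        // otherwise creates a new wrapper (and a new temp file).
        static TempBackedStream get(InputStream stream) throws IOException {
            if (stream instanceof TempBackedStream) {
                return (TempBackedStream) stream;
            }
            return new TempBackedStream(stream);
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public void close() throws IOException {
            Files.deleteIfExists(tempFile);
            in.close();
        }
    }

    // Hypothetical parse method. Returns the temp file path only so
    // that a caller can check whether it was cleaned up.
    static Path parse(InputStream stream) throws IOException {
        TempBackedStream tstream = TempBackedStream.get(stream);
        try {
            while (tstream.read() != -1) {
                // consume the stream; real parsing would happen here
            }
            return tstream.tempFile;
        } finally {
            // Close only what this method created. A wrapper passed in
            // by the caller stays open and remains the caller's job.
            if (!(stream instanceof TempBackedStream)) {
                tstream.close();
            }
        }
    }
}
```

Putting the close in a finally block also covers the case where parsing throws. try-with-resources would not fit directly here, since it would close a caller-supplied stream as well.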
Separately, there's a null check in that method, but the object is later used without one. So either the null check isn't needed, or there is a risk of a NullPointerException.