Tika / TIKA-3223

ForkParser cannot deserialize exceptions when using the ForkParser(Path, ParserFactoryFactory) constructor


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      Reproduced with this example project: https://github.com/nddipiazza/tika-fork-parser-example

      The problem occurs when ForkParser encounters a file that legitimately throws an exception. For example, 002164.ppt from digicorpa is encrypted, so parsing it should throw an EncryptedDocumentException.

      When you use this constructor, you get the expected result:

      try (ForkParser forkParser = new ForkParser(getClass().getClassLoader())) {
          // ... parse the document with forkParser ...
      }

      But when you use this constructor:

      try (ForkParser forkParser = new ForkParser(Paths.get(pathToTikaMainDist), new ParserFactoryFactory("org.apache.tika.fork.main.CollectingParserFactory", parserArgs))) {
          // ... parse the document with forkParser ...
      }

      you get a ClassNotFoundException: the client fails to deserialize the exception sent back from the forked process.

      org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
      	at org.apache.tika.fork.ForkParser.parse(ForkParser.java:275) ~[tika-core-1.24.1.jar:1.24.1]
      	at org.apache.tika.client.CollectingParser.parse(CollectingParser.java:36) ~[classes/:?]
      	at org.apache.tika.client.TikaAsyncMain.lambda$runThreads$0(TikaAsyncMain.java:317) ~[classes/:?]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
      	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
      	at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
      	at java.lang.Thread.run(Thread.java:834) [?:?]
      Caused by: java.io.IOException: Unable to deserialize an exception
      	at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:295) ~[tika-core-1.24.1.jar:1.24.1]
      	at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
      	at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
      	... 8 more
      Caused by: java.lang.ClassNotFoundException: org/apache/tika/exception/EncryptedDocumentException
      	at java.lang.Class.forName0(Native Method) ~[?:?]
      	at java.lang.Class.forName(Class.java:398) ~[?:?]
      	at org.apache.tika.fork.ForkObjectInputStream.resolveClass(ForkObjectInputStream.java:69) ~[tika-core-1.24.1.jar:1.24.1]
      	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1995) ~[?:?]
      	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862) ~[?:?]
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169) ~[?:?]
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679) ~[?:?]
      	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:493) ~[?:?]
      	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:451) ~[?:?]
      	at org.apache.tika.fork.ForkObjectInputStream.readObject(ForkObjectInputStream.java:110) ~[tika-core-1.24.1.jar:1.24.1]
      	at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:292) ~[tika-core-1.24.1.jar:1.24.1]
      	at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
      	at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
      	... 8 more
      

      However, the exception class is definitely on the classpath. The same thing happens for any Tika exception; the problem is not limited to EncryptedDocumentException.
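The trace shows the client rebuilding the forked process's exception in ForkObjectInputStream.resolveClass, which calls Class.forName. In general, whether such a lookup succeeds depends on the ClassLoader it is given, not simply on whether the class is on the JVM classpath. The stdlib-only sketch below illustrates that general point. It does not use Tika and is not a diagnosis of this bug. DemoException, LoaderAwareObjectInputStream and tryWith are hypothetical names, and passing an explicit loader to Class.forName is an assumed analogue of what ForkObjectInputStream does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderVisibilityDemo {

    // Hypothetical stand-in for a Tika exception type; it lives on the application classpath.
    static class DemoException extends Exception {
        DemoException(String msg) { super(msg); }
    }

    // Hypothetical loader-aware stream: every class lookup goes through the supplied loader only.
    static class LoaderAwareObjectInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderAwareObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            // Resolve the serialized class name with Class.forName and an explicit loader.
            return Class.forName(desc.getName(), false, loader);
        }
    }

    // Serializes an object to bytes, as a forked process would before sending it back.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Deserializes the bytes, resolving classes only through the given loader.
    static Object deserialize(byte[] data, ClassLoader loader) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new LoaderAwareObjectInputStream(new ByteArrayInputStream(data), loader)) {
            return in.readObject();
        }
    }

    // Returns a short description of what happened when deserializing with the given loader.
    static String tryWith(byte[] data, ClassLoader loader) {
        try {
            Object o = deserialize(data, loader);
            return "ok: " + o.getClass().getSimpleName();
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException: " + e.getMessage();
        } catch (IOException e) {
            return "IOException: " + e;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new DemoException("encrypted document"));

        // 1. The loader that defined DemoException can resolve it.
        System.out.println(tryWith(data, LoaderVisibilityDemo.class.getClassLoader()));

        // 2. A loader whose parent is the bootstrap loader cannot see application classes,
        //    even though DemoException is on the JVM classpath.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            System.out.println(tryWith(data, isolated));
        }
    }
}
```

Running main prints one line per loader: the first lookup succeeds, while the second ends in a ClassNotFoundException for the same serialized bytes, because the isolated loader cannot see application classes.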

People

    • Assignee: Unassigned
    • Reporter: Nicholas DiPiazza (ndipiazza_gmail)