Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
Using this project as an example: https://github.com/nddipiazza/tika-fork-parser-example
Problems occur when the parser encounters files that legitimately throw an exception. For example, 002164.ppt from digicorpa is encrypted, so parsing it should throw an EncryptedDocumentException.
When you use this constructor, you get the expected result:
try (ForkParser forkParser = new ForkParser(getClass().getClassLoader())) {
But when you use this constructor:
try (ForkParser forkParser = new ForkParser(Paths.get(pathToTikaMainDist), new ParserFactoryFactory("org.apache.tika.fork.main.CollectingParserFactory", parserArgs))) {
You get a ClassNotFoundException, because the client fails to deserialize the exception thrown in the forked process:
org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:275) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.client.CollectingParser.parse(CollectingParser.java:36) ~[classes/:?]
    at org.apache.tika.client.TikaAsyncMain.lambda$runThreads$0(TikaAsyncMain.java:317) ~[classes/:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.io.IOException: Unable to deserialize an exception
    at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:295) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
    ... 8 more
Caused by: java.lang.ClassNotFoundException: org/apache/tika/exception/EncryptedDocumentException
    at java.lang.Class.forName0(Native Method) ~[?:?]
    at java.lang.Class.forName(Class.java:398) ~[?:?]
    at org.apache.tika.fork.ForkObjectInputStream.resolveClass(ForkObjectInputStream.java:69) ~[tika-core-1.24.1.jar:1.24.1]
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1995) ~[?:?]
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862) ~[?:?]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169) ~[?:?]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:493) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:451) ~[?:?]
    at org.apache.tika.fork.ForkObjectInputStream.readObject(ForkObjectInputStream.java:110) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:292) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkClient.call(ForkClient.java:209) ~[tika-core-1.24.1.jar:1.24.1]
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:267) ~[tika-core-1.24.1.jar:1.24.1]
    ... 8 more
But the exception class is definitely on my classpath. The same thing happens for any Tika exception, not just EncryptedDocumentException.
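The trace shows ForkObjectInputStream.resolveClass passing the class name to Class.forName, and the ClassNotFoundException is raised directly in the native Class.forName0 with no class-loader frames above it. One reading consistent with that (an assumption, not a confirmed diagnosis) is that the loader used for resolution on the client side is null, i.e. the bootstrap loader, or is otherwise a loader that cannot see tika-core's classes. The hypothetical sketch below illustrates that mechanism outside Tika: it assumes resolveClass resolves every name against a single supplied loader (the real ForkObjectInputStream may differ), and DemoException, LoaderObjectInputStream and roundTrip are illustrative names, not Tika API. It shows that a class can be on the application classpath and still fail to deserialize when the resolving loader cannot see it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

public class LoaderAwareDeserialization {

    // Hypothetical stand-in for a Tika exception such as EncryptedDocumentException.
    // It lives on the application classpath, not in the JDK.
    public static class DemoException extends Exception {
        private static final long serialVersionUID = 1L;

        public DemoException(String message) {
            super(message);
        }
    }

    // Illustrative sketch (not Tika's code): resolves every class name against one
    // supplied loader, similar in spirit to a resolveClass that calls Class.forName.
    static final class LoaderObjectInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            // A null loader selects the bootstrap loader, which only sees JDK classes.
            return Class.forName(desc.getName(), false, loader);
        }
    }

    // Serializes the throwable (as a forked process would when reporting an error),
    // then reads it back, resolving classes with the given loader.
    public static Throwable roundTrip(Throwable t, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        try (ObjectInputStream in = new LoaderObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()), loader)) {
            return (Throwable) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader appLoader = LoaderAwareDeserialization.class.getClassLoader();

        // Resolving with the loader that defined DemoException succeeds.
        Throwable ok = roundTrip(new DemoException("encrypted"), appLoader);
        System.out.println("application loader: " + ok);

        // Resolving with the bootstrap loader fails even though the class is on the classpath.
        try {
            roundTrip(new DemoException("encrypted"), null);
        } catch (ClassNotFoundException e) {
            System.out.println("bootstrap loader: " + e);
        }
    }
}
```

The design point the sketch relies on is that overriding resolveClass with an explicit loader makes that loader the only one consulted. ObjectInputStream's default resolveClass instead picks a loader from the current thread's stack, so code that overrides it is only as good as the loader it is handed.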