Description
Tika-1.19.1 throws the following exception with some corrupt docx files (MS Word complains but fixes them) previously handled without problems by tika-1.18. Stacktrace bellow:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@79efa1ad at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188) at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:84) at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:358) at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:309) at org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94) at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77) at javax.swing.TransferHandler.importData(Unknown Source) at javax.swing.TransferHandler$DropHandler.drop(Unknown Source) at java.awt.dnd.DropTarget.drop(Unknown Source) at javax.swing.TransferHandler$SwingDropTarget.drop(Unknown Source) at sun.awt.dnd.SunDropTargetContextPeer.processDropMessage(Unknown Source) at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchDropEvent(Unknown Source) at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchEvent(Unknown Source) at sun.awt.dnd.SunDropTargetEvent.dispatch(Unknown Source) at java.awt.Component.dispatchEventImpl(Unknown Source) at java.awt.Container.dispatchEventImpl(Unknown Source) at java.awt.Component.dispatchEvent(Unknown Source) at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source) at java.awt.LightweightDispatcher.processDropTargetEvent(Unknown Source) at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source) at java.awt.Container.dispatchEventImpl(Unknown Source) at java.awt.Window.dispatchEventImpl(Unknown Source) at java.awt.Component.dispatchEvent(Unknown Source) at java.awt.EventQueue.dispatchEventImpl(Unknown Source) at java.awt.EventQueue.access$500(Unknown Source) at java.awt.EventQueue$3.run(Unknown Source) at java.awt.EventQueue$3.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source) at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source) at java.awt.EventQueue$4.run(Unknown Source) at java.awt.EventQueue$4.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source) at java.awt.EventQueue.dispatchEvent(Unknown Source) at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source) at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source) at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source) at java.awt.EventDispatchThread.pumpEvents(Unknown Source) at java.awt.EventDispatchThread.pumpEvents(Unknown Source) at java.awt.EventDispatchThread.run(Unknown Source) Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException: Could not open the specified zip entry source stream at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:214) at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:196) at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:170) at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:151) at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:123) at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:234) at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:81) at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:110) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ... 43 more Caused by: java.io.EOFException at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readFully(ZipArchiveInputStream.java:803) at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readFully(ZipArchiveInputStream.java:795) at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.skipRemainderOfArchive(ZipArchiveInputStream.java:1014) at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:257) at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.getNextEntry(ZipArchiveThresholdInputStream.java:139) at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:47) at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:212) ... 51 more
Attachments
Attachments
Issue Links
- relates to
-
TIKA-2726 Handle truncated ooxml more robustly
- Resolved