Description
Tika consistently throws a NullPointerException when detecting the type of a zip file. It can be reproduced with the following steps.
- Create a zip file containing an index.xml. The XML is minimal:
<?xml version='1.0' encoding='UTF-8' ?>
<index>
</index>
- Add the dependencies to pom.xml. The key dependency is tika-parser-apple-module:
<dependencies>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>2.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <type>pom</type>
    <version>2.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parser-apple-module</artifactId>
    <version>2.2.1</version>
  </dependency>
</dependencies>
- Call tika.detect on the zip file as an InputStream; it throws an NPE:
import java.io.File;
import java.io.FileInputStream;
import org.apache.tika.Tika;

String filePath = "123.zip";
Tika tika = new Tika();
String type = tika.detect(new FileInputStream(new File(filePath)));
Note that tika.detect(String name) behaves normally and returns "application/zip"; the NPE occurs only with tika.detect(InputStream stream).
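
For the first reproduction step, the archive can also be generated programmatically. The following is a minimal sketch that uses only java.util.zip; the class name ReproZip and its methods are hypothetical names chosen for this illustration and are not part of Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper that builds the zip described in the first step.
class ReproZip {
    // The index.xml content from the report: a root element with no namespace.
    static final String INDEX_XML =
            "<?xml version='1.0' encoding='UTF-8' ?>\n<index>\n</index>\n";

    // Returns the bytes of a zip archive whose only entry is index.xml.
    static byte[] build() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("index.xml"));
            zip.write(INDEX_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Reads the archive back and returns the name of its first entry, or null if it is empty.
    static String firstEntryName(byte[] zipBytes) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Writing the result of ReproZip.build() to 123.zip (for example with java.nio.file.Files.write) should give a file equivalent to the one used in the steps above, and passing a ByteArrayInputStream over the same bytes to tika.detect should exercise the same code path.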
It seems that when Tika inspects a zip file through IWorkPackageParser, it parses index.xml and compares the root element against each constant of the enum below: KEYNOTE, NUMBERS, PAGES, and ENCRYPTED. When none of KEYNOTE, NUMBERS, or PAGES matches, the for-loop reaches ENCRYPTED, whose namespace is null, and calling equals on that null namespace throws an NPE.
The source code is below:
KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation",
        MediaType.application("vnd.apple.keynote")),
NUMBERS("http://developer.apple.com/namespaces/ls", "document",
        MediaType.application("vnd.apple.numbers")),
PAGES("http://developer.apple.com/namespaces/sl", "document",
        MediaType.application("vnd.apple.pages")),
ENCRYPTED(null, null, MediaType.application("x-tika-iworks-protected"));

public static IWORKDocumentType detectType(InputStream stream) {
    QName qname = new XmlRootExtractor().extractRootElement(stream);
    if (qname != null) {
        String uri = qname.getNamespaceURI();
        String local = qname.getLocalPart();
        for (IWORKDocumentType type : values()) {
            if (type.getNamespace().equals(uri) && type.getPart().equals(local)) {
                return type;
            }
        }
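
The failure can be illustrated without Tika on the classpath. The sketch below uses a hypothetical enum, DocType, that copies only the namespace and part values quoted above. detectUnsafe mirrors the comparison in the quoted loop, and detectSafe shows one possible guard that skips constants with a null namespace. This is an assumption about how the loop could be hardened, not necessarily the fix adopted upstream.

```java
// Hypothetical stand-in for IWORKDocumentType, reduced to namespace and part.
enum DocType {
    KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation"),
    NUMBERS("http://developer.apple.com/namespaces/ls", "document"),
    PAGES("http://developer.apple.com/namespaces/sl", "document"),
    ENCRYPTED(null, null);

    private final String namespace;
    private final String part;

    DocType(String namespace, String part) {
        this.namespace = namespace;
        this.part = part;
    }

    // Same comparison as the quoted loop: throws a NullPointerException
    // once iteration reaches ENCRYPTED, because its namespace is null.
    static DocType detectUnsafe(String uri, String local) {
        for (DocType type : values()) {
            if (type.namespace.equals(uri) && type.part.equals(local)) {
                return type;
            }
        }
        return null;
    }

    // Guarded variant: constants without a namespace are never matched by root element.
    static DocType detectSafe(String uri, String local) {
        for (DocType type : values()) {
            if (type.namespace == null || type.part == null) {
                continue;
            }
            if (type.namespace.equals(uri) && type.part.equals(local)) {
                return type;
            }
        }
        return null;
    }
}
```

With the index.xml from the reproduction steps, the root element has an empty namespace URI and the local name "index", so no iWork constant matches. detectUnsafe("", "index") then fails on ENCRYPTED, while detectSafe("", "index") returns null and leaves the caller to fall back to a generic type.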