TIKA-3578

RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.27
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Windows 10 x64

    Description

      When I try to parse a valid .docx document, I get this error:

      org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@f214a53
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:297)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

      <business logic>

      Caused by: org.apache.poi.ooxml.POIXMLException: org.apache.xmlbeans.XmlException: Element hdr@http://schemas.openxmlformats.org/wordprocessingml/2006/main is not a valid ftr@http://schemas.openxmlformats.org/wordprocessingml/2006/main document or a valid substitution.
      at org.apache.poi.xwpf.usermodel.XWPFFooter.onDocumentRead(XWPFFooter.java:119)
      at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:233)
      at org.apache.poi.ooxml.POIXMLDocument.load(POIXMLDocument.java:184)
      at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:138)
      at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
      at org.apache.poi.ooxml.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:224)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:173)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:113)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
      ... 11 more
      Caused by: org.apache.xmlbeans.XmlException: Element hdr@http://schemas.openxmlformats.org/wordprocessingml/2006/main is not a valid ftr@http://schemas.openxmlformats.org/wordprocessingml/2006/main document or a valid substitution.
      at org.apache.xmlbeans.impl.store.Locale.autoTypeDocument(Locale.java:324)
      at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1275)
      at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1259)
      at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
      at org.openxmlformats.schemas.wordprocessingml.x2006.main.FtrDocument$Factory.parse(Unknown Source)
      at org.apache.poi.xwpf.usermodel.XWPFFooter.onDocumentRead(XWPFFooter.java:94)
      ... 19 more
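
The innermost cause in the trace says that a part POI loaded as a footer has `hdr` as its root element instead of `ftr`. The following is a minimal sketch, using only the JDK, of how one might check a .docx for that kind of mismatch before handing it to a parser. It assumes the header and footer relationships are listed in `word/_rels/document.xml.rels`; the class name `HeaderFooterCheck` and the method `findMismatches` are hypothetical and are not part of Tika or POI.

```java
import java.io.InputStream;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of Tika or POI.
class HeaderFooterCheck {

    // Parses one ZIP entry and returns its root element, or null if the entry is absent.
    static Element root(ZipFile zip, String name, DocumentBuilder db) throws Exception {
        ZipEntry entry = zip.getEntry(name);
        if (entry == null) {
            return null;
        }
        try (InputStream in = zip.getInputStream(entry)) {
            return db.parse(in).getDocumentElement();
        }
    }

    // Lists header/footer parts whose root element does not match their relationship type.
    public static List<String> findMismatches(Path docx) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations, since the input may be untrusted.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder db = factory.newDocumentBuilder();

        List<String> problems = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            // Assumption: the main document part is word/document.xml.
            Element rels = root(zip, "word/_rels/document.xml.rels", db);
            if (rels == null) {
                return problems;
            }
            NodeList list = rels.getElementsByTagNameNS("*", "Relationship");
            for (int i = 0; i < list.getLength(); i++) {
                Element rel = (Element) list.item(i);
                String type = rel.getAttribute("Type");
                String expected = type.endsWith("/footer") ? "ftr"
                        : type.endsWith("/header") ? "hdr" : null;
                if (expected == null) {
                    continue; // not a header or footer relationship
                }
                String target = rel.getAttribute("Target");
                // Relative targets are resolved against the word/ folder.
                String part = target.startsWith("/") ? target.substring(1) : "word/" + target;
                Element partRoot = root(zip, part, db);
                if (partRoot != null && !expected.equals(partRoot.getLocalName())) {
                    problems.add(part + ": expected " + expected
                            + ", found " + partRoot.getLocalName());
                }
            }
        }
        return problems;
    }
}
```

Calling `HeaderFooterCheck.findMismatches(Path.of("03.docx"))` returns one line per mismatched part, and an empty list if every header and footer part has the expected root element. The check follows the relationship type rather than the part's file name, on the assumption that the relationship is what determines whether a part is read as a header or a footer.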

      Attachments

        1. 03.docx
          48 kB
          redmanmale


            People

              Assignee: Unassigned
              Reporter: redmanmale