UIMA-2472

TikaAnnotator can't find XML parser when used in a PEAR file with Java 1.5 or later

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.1Addons
    • Fix Version/s: 2.4.0Addons
    • Component/s: addons
    • Labels: None
    • Environment: Java 1.5 and later

    Description

      When TikaAnnotator is packaged in a PEAR file, calling UIMAFramework.produceAnalysisEngine() fails as soon as Tika asks the system for an XML parser, with the exception:

      javax.xml.parsers.FactoryConfigurationError: Provider for javax.xml.parsers.DocumentBuilderFactory cannot be found

      This happens because an XML parser is now built into Java, but the UIMA classloader (used with PEAR files) finds the XML parser classes in xml-apis.jar first, and these are older and incompatible with the current XML interfaces. xml-apis.jar is pulled in as a transitive Maven dependency of Tika 0.7. See this issue for more information:
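
      To confirm which copy of the XML parser classes a given classloader actually picks up, a small diagnostic along these lines may help. This is only a sketch: the class name XmlParserOrigin and the helper originOf are made up for illustration, and it reports on whatever classloader loads it, so to reflect the PEAR classloader the same calls would have to run from code inside the PEAR (for example, from the annotator itself).

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

public class XmlParserOrigin {

    /** Describes where a class was loaded from; classes built into the JDK may have no code source. */
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "JDK (no code source)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Where the JAXP API class itself comes from (the JDK, or a jar such as xml-apis).
        System.out.println("DocumentBuilderFactory API loaded from: "
                + originOf(DocumentBuilderFactory.class));

        // This is the lookup that fails with FactoryConfigurationError in the report above.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("Implementation: " + factory.getClass().getName()
                + " loaded from: " + originOf(factory.getClass()));
    }
}
```

      If the first line of output names a jar rather than the JDK, that jar is shadowing the built-in XML classes.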

      https://issues.apache.org/jira/browse/TIKA-412

      This was fixed in Tika 0.8.

      A workaround for UIMA users who want to use TikaAnnotator in PEAR files with Java 1.6 is to exclude xml-apis from the TikaAnnotator dependency, so that it is not packaged into their PEAR file:

      <dependency>
        <groupId>org.apache.uima</groupId>
        <artifactId>TikaAnnotator</artifactId>
        <exclusions>
          <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
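
      One way to check that the exclusion took effect is to look inside the built PEAR (a zip-format archive) for any bundled xml-apis jar. The following is a minimal sketch, not part of UIMA: the class PearXmlApisCheck and its methods are hypothetical names, and matching on the file name prefix "xml-apis" is an assumption about how the jar is named inside the archive.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class PearXmlApisCheck {

    /** Returns the names of archive entries that look like a bundled xml-apis jar. */
    static List<String> findXmlApisJars(Path archive) throws IOException {
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            return zip.stream()
                    .map(ZipEntry::getName)
                    .filter(PearXmlApisCheck::looksLikeXmlApisJar)
                    .collect(Collectors.toList());
        }
    }

    /** Assumption: the jar keeps its Maven artifact name, e.g. xml-apis-<version>.jar. */
    static boolean looksLikeXmlApisJar(String entryName) {
        String fileName = entryName.substring(entryName.lastIndexOf('/') + 1);
        return fileName.startsWith("xml-apis") && fileName.endsWith(".jar");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PearXmlApisCheck <path-to-pear-file>");
            return;
        }
        List<String> hits = findXmlApisJars(Paths.get(args[0]));
        if (hits.isEmpty()) {
            System.out.println("No xml-apis jar found in " + args[0]);
        } else {
            System.out.println("xml-apis still bundled: " + hits);
        }
    }
}
```

      An empty result suggests the exclusion worked; any hit means xml-apis.jar is still being packaged, possibly through another dependency.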

      However, a better fix would be to update the version of Tika used in TikaAnnotator.

            People

              Assignee: Unassigned
              Reporter: Adam Holmberg (holmberg)
              Votes: 0
              Watchers: 2