Jackrabbit Oak / OAK-2468

Index a binary only if some Tika parser supports the binary's mimeType


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.15, 1.1.8
    • Component/s: lucene
    • Labels: None

    Description

      Currently, all binaries are passed to Tika for text extraction. However, Tika can only parse binaries for which it has a supporting parser. The extraction logic should therefore parse a binary only if its mimeType is supported by Tika.

      With this change, jcr:mimeType would become a mandatory property.

      Jackrabbit 2 (JR2) had a similar check [1].

      [1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/NodeIndexer.java#L932
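
The check proposed above can be sketched roughly as follows. This is a minimal illustration and not Oak's actual code: `MimeTypeGate`, `shouldExtract`, and the `supportedTypes` set are hypothetical names. The set is a plain-string stand-in for the media types that Tika's parsers report as supported (in a real setup it could be derived from `Parser.getSupportedTypes(ParseContext)`).

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Oak or Tika.
final class MimeTypeGate {

    // Assumed stand-in for the media types the available Tika parsers support.
    private final Set<String> supportedTypes;

    MimeTypeGate(Set<String> supportedTypes) {
        this.supportedTypes = supportedTypes;
    }

    /**
     * Returns true only if the binary declares a jcr:mimeType and that
     * type is in the supported set. A missing or blank type means the
     * binary is skipped, which is why jcr:mimeType becomes mandatory.
     */
    boolean shouldExtract(String jcrMimeType) {
        if (jcrMimeType == null || jcrMimeType.trim().isEmpty()) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" and normalise case.
        String base = jcrMimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return supportedTypes.contains(base);
    }
}
```

With a gate built from, say, `application/pdf` and `text/plain`, a binary typed `application/x-unknown` or one with no jcr:mimeType at all would never reach Tika, while `Text/Plain; charset=UTF-8` would still be extracted.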


            People

              Assignee: Chetan Mehrotra (chetanm)
              Reporter: Chetan Mehrotra (chetanm)
              Votes: 0
              Watchers: 1
