Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Won't Fix
- None
- None
- None
Description
With Tika 1.7's RecursiveParserWrapper, it is possible to keep the metadata of each individual attachment/embedded document. Tika's default handling was to keep only the container document's metadata and to concatenate the contents of all embedded files. SOLR-7189 added this legacy behavior to DIH.
It might be handy, for example, to be able to send an MSG file through DIH and treat the container email as well as each attachment as separate (child?) documents, or to send a zip of JPEG files and correctly index the geo location of each image file.
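To make the difference between the two behaviors concrete, here is a minimal Java sketch under stated assumptions. It does not call Tika or DIH: `ParsedDoc`, `flatten`, `perDocument`, `sampleEmail`, and the sample data are all hypothetical, and the metadata keys are only illustrative. `flatten` models the legacy behavior (container metadata plus concatenated text), while `perDocument` models what RecursiveParserWrapper makes possible: one metadata/content pair per file, so each image keeps its own geo location.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, NOT the Tika or DIH API: contrasts legacy handling
// (container metadata + concatenated text) with per-document handling.
public class EmbeddedDocsSketch {

    // One parsed file: its own metadata, its extracted text, and any embedded files.
    static final class ParsedDoc {
        final Map<String, String> metadata;
        final String content;
        final List<ParsedDoc> embedded;

        ParsedDoc(Map<String, String> metadata, String content, List<ParsedDoc> embedded) {
            this.metadata = metadata;
            this.content = content;
            this.embedded = embedded;
        }
    }

    // Legacy handling: keep only the container's metadata and concatenate all non-empty text.
    static ParsedDoc flatten(ParsedDoc container) {
        StringBuilder text = new StringBuilder();
        appendContent(container, text);
        return new ParsedDoc(container.metadata, text.toString(), Collections.emptyList());
    }

    private static void appendContent(ParsedDoc doc, StringBuilder out) {
        if (!doc.content.isEmpty()) {
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(doc.content);
        }
        for (ParsedDoc child : doc.embedded) {
            appendContent(child, out);
        }
    }

    // Per-document handling: depth-first list, container first, each file keeping its own metadata.
    static List<ParsedDoc> perDocument(ParsedDoc container) {
        List<ParsedDoc> out = new ArrayList<>();
        collect(container, out);
        return out;
    }

    private static void collect(ParsedDoc doc, List<ParsedDoc> out) {
        out.add(new ParsedDoc(doc.metadata, doc.content, Collections.emptyList()));
        for (ParsedDoc child : doc.embedded) {
            collect(child, out);
        }
    }

    // Builds an ordered metadata map from alternating key/value arguments.
    static Map<String, String> meta(String... keysAndValues) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            m.put(keysAndValues[i], keysAndValues[i + 1]);
        }
        return m;
    }

    // Hypothetical sample: an email carrying two JPEG attachments, each with its own geo tag.
    static ParsedDoc sampleEmail() {
        ParsedDoc jpg1 = new ParsedDoc(meta("resourceName", "a.jpg", "geo:lat", "48.85"), "", Collections.emptyList());
        ParsedDoc jpg2 = new ParsedDoc(meta("resourceName", "b.jpg", "geo:lat", "40.71"), "", Collections.emptyList());
        return new ParsedDoc(meta("resourceName", "message.msg", "subject", "Trip photos"),
                "See attached.", List.of(jpg1, jpg2));
    }

    public static void main(String[] args) {
        ParsedDoc email = sampleEmail();
        // Legacy: a single document; the attachments' geo tags are not kept.
        System.out.println(flatten(email).metadata);
        // Per-document: one entry per file, each with its own metadata.
        for (ParsedDoc d : perDocument(email)) {
            System.out.println(d.metadata);
        }
    }
}
```

Mapping each entry of the per-document list to a separate, possibly child, Solr document is the step this issue proposed for DIH; the sketch only shows why that list preserves information the flattened form drops.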
Issue Links
- is superseded by: SOLR-14783 Remove DIH from 9.0 (Closed)
- relates to: SOLR-7189 Allow DIH to extract content from embedded documents via Tika (Closed)