Details
-
Improvement
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
None
-
None
Description
When indexing large documents the text extraction often takes quite a while and uses lots of memory even if only the first maxFieldLength (by default 10000) tokens are used. I'd like to add a maxExtractLength parameter that can be used to set the maximum number of characters to extract from a binary. The default value of this parameter could be something like ten times the maxFieldLength setting.
Attachments
Issue Links
- is related to
-
OAK-2470 Support for maxExtractLength while parsing binaries with Tika
- Closed