[JCR-2506] Stop text extraction when the maxFieldLength limit is reached - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.1
Component/s: indexing, jackrabbit-core
Labels: None

Description

When indexing large documents, text extraction often takes quite a while and uses a lot of memory, even though only the first maxFieldLength (10000 by default) tokens are used. I'd like to add a maxExtractLength parameter that sets the maximum number of characters to extract from a binary. The default value of this parameter could be something like ten times the maxFieldLength setting.
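
As a rough illustration of the idea, extraction can be cut off by wrapping the character stream in a reader that reports end-of-stream once a character budget is used up. This is only a sketch, not the change committed for this issue: the names LengthLimitedReader, ExtractLimitSketch, extract and defaultMaxExtractLength are hypothetical, and the factor of ten simply mirrors the default suggested above.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper: a Reader that signals end-of-stream after a fixed number of characters. */
class LengthLimitedReader extends FilterReader {
    private long remaining; // characters still allowed through

    LengthLimitedReader(Reader in, long maxExtractLength) {
        super(in);
        this.remaining = Math.max(0, maxExtractLength);
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // budget exhausted: pretend the document ended here
        }
        int c = super.read();
        if (c != -1) {
            remaining--;
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        // Never ask the underlying reader for more than the remaining budget.
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Skipped characters count against the budget as well.
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }
}

/** Hypothetical driver showing how the limit might be derived and applied. */
class ExtractLimitSketch {
    /** Assumed default: ten times maxFieldLength, as proposed in the description. */
    static long defaultMaxExtractLength(int maxFieldLength) {
        return 10L * maxFieldLength;
    }

    /** Reads at most maxExtractLength characters from the source and returns them. */
    static String extract(Reader source, long maxExtractLength) {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader in = new LengthLimitedReader(source, maxExtractLength)) {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Only the first 10 characters of the input are read.
        System.out.println(extract(new StringReader("abcdefghijklmnopqrstuvwxyz"), 10));
        System.out.println(defaultMaxExtractLength(10000));
    }
}
```

Wrapping the reader bounds only what the consumer reads. A parser that builds the full text in memory before handing it over would also need to be stopped at the source to get the memory savings described above.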

Issue Links

is related to: OAK-2470 Support for maxExtractLength while parsing binaries with Tika (Closed)

People

Assignee: Jukka Zitting

Reporter: Jukka Zitting

Dates

Created: 22/Feb/10 11:38

Updated: 31/Jan/15 08:59

Resolved: 23/Feb/10 14:33