Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
We can currently configure parsers by the following means:
1) programmatically by direct calls to the parsers or their config objects
2) sending in a config object through the ParseContext
3) modifying .properties files for specific parsers (e.g. PDFParser)
Rather than scattering the landscape with .properties files for each parser, it would be great if we could specify parser parameters in the main config file, something along the lines of this:
<parser class="org.apache.tika.parser.audio.AudioParser">
  <params>
    <int name="someparam1">2</int>
    <str name="someOtherParam2">something or other</str>
  </params>
  <mime>audio/basic</mime>
  <mime>audio/x-aiff</mime>
  <mime>audio/x-wav</mime>
</parser>
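To make the proposal concrete, here is a minimal sketch of how typed entries in such a <params> block might be read into a map. It uses only the JDK's DOM parser; the class ParserParamsSketch and its readParams method are hypothetical names for illustration and are not Tika's API or its eventual implementation. It handles only the two element types shown in the example (int and str).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of Tika): reads typed entries from a <params> block. */
public class ParserParamsSketch {

    /** Returns name -> typed value for each child element of the first <params> element. */
    public static Map<String, Object> readParams(String parserXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(parserXml.getBytes(StandardCharsets.UTF_8)));

        Map<String, Object> params = new LinkedHashMap<>();
        NodeList paramsNodes = doc.getElementsByTagName("params");
        if (paramsNodes.getLength() == 0) {
            return params; // no <params> block: the parser would keep its defaults
        }

        NodeList children = paramsNodes.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace-only text nodes between elements
            }
            Element el = (Element) node;
            String name = el.getAttribute("name");
            String text = el.getTextContent().trim();
            // The element name carries the type, as in the proposed config format.
            switch (el.getTagName()) {
                case "int":
                    params.put(name, Integer.valueOf(text));
                    break;
                case "str":
                    params.put(name, text);
                    break;
                default:
                    throw new IllegalArgumentException(
                            "Unsupported param type: " + el.getTagName());
            }
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<parser class=\"org.apache.tika.parser.audio.AudioParser\">"
                + "<params>"
                + "<int name=\"someparam1\">2</int>"
                + "<str name=\"someOtherParam2\">something or other</str>"
                + "</params>"
                + "<mime>audio/basic</mime>"
                + "</parser>";
        System.out.println(readParams(xml));
    }
}
```

Encoding the type in the element name keeps the config file readable and lets the loader reject unknown types early; how the resulting values would be handed to a parser instance is left open here.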
Issue Links
- is blocked by
  TIKA-1657 Allow easier XML serialization of TikaConfig (Resolved)
- is related to
  TIKA-1445 Figure out how to add Image metadata extraction to Tesseract parser (Resolved)
- relates to
  TIKA-1680 Add configuration layer to configure, Parsers default configurable properties. (Open)
  TIKA-3891 Add generic serialization of params to TikaConfigSeralizer (Resolved)