The internal Tika extractor uses the Tika library and treats all metadata as strings. I don't think the date format is configurable. Indeed, there's a mailing-list thread on this:
https://grokbase.com/t/tika/user/10982he7yd/how-can-i-configure-tika-to-extract-dates-in-single-format
Note that Tika tries to preserve the date format used in the original spreadsheet.
The solution proposed there, if you want a specific date format, is this:
- Write your own Excel parser for Tika, which ignores the date formatting set for cells and always uses ISO 8601.
That's not going to cut it here, because we have no information that would let us reliably autodetect the incoming format. The input is basically just a text file with no hints, which is a real problem for dates like "01-01-2010": which comes first, the day or the month?
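To make the ambiguity concrete, here is a small sketch (plain Python, nothing to do with Tika's or ManifoldCF's APIs) that tries both conventions on a bare string. The helper name `interpretations` and the two candidate patterns are illustrative assumptions. It uses a value like "01-02-2010", where, unlike "01-01-2010", the two readings actually differ.

```python
from datetime import date, datetime

# Hypothetical candidate conventions; a bare string carries no hint about which one applies.
CANDIDATE_FORMATS = {"day-first": "%d-%m-%Y", "month-first": "%m-%d-%Y"}


def interpretations(raw: str) -> dict[str, date]:
    """Return every reading of `raw` that parses under one of the candidate formats."""
    result = {}
    for name, fmt in CANDIDATE_FORMATS.items():
        try:
            result[name] = datetime.strptime(raw, fmt).date()
        except ValueError:
            pass  # the string is not valid under this convention
    return result


print(interpretations("01-02-2010"))  # two valid, different dates
print(interpretations("25-12-2010"))  # only day-first parses, since 25 is not a month
```

Both readings of "01-02-2010" succeed, so nothing in the string itself tells us which one is right; only values with a day above 12 happen to disambiguate.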
The external Tika extractor has even less configurability because you cannot run custom code there.
Now, suppose all you want to do is post-process dates to change the separator character. Well, we don't even know whether a given field returned from Tika is a date. If we replaced every "/" with "-", we'd corrupt other kinds of fields.
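A minimal sketch of why a blind separator swap is unsafe, assuming the extractor hands back a flat map of string values. The field names and values below are hypothetical examples, not real Tika metadata keys.

```python
# Hypothetical extractor output: every value is a plain string, with no type information.
metadata = {
    "created": "01/02/2010",
    "source_url": "http://example.com/reports/2010/q1",
    "ratio": "3/4",
}


def blind_separator_swap(fields: dict[str, str]) -> dict[str, str]:
    """Replace every '/' with '-' in every value, without knowing which fields are dates."""
    return {name: value.replace("/", "-") for name, value in fields.items()}


swapped = blind_separator_swap(metadata)
print(swapped["created"])     # the date gets the intended separator
print(swapped["source_url"])  # the URL is mangled
print(swapped["ratio"])       # the fraction is mangled
```

Only the date field comes out as intended; the URL and the fraction are silently damaged, which is exactly the risk described above.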
My conclusion: there's nothing we can do in ManifoldCF to fix this problem. A solution might be found in Tika itself, but only if somebody opens a ticket for it. Tika would need to go through the column definitions, work out which columns contain dates, and format those values accordingly. Feel free to open a Tika ticket.
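The kind of type-aware handling described here could look roughly like the following sketch. It is not Tika code: `ColumnDef` and `normalize_row` are hypothetical, and it assumes the source actually declares which columns are dates and in what format, which is precisely the information a plain text file lacks.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ColumnDef:
    """Hypothetical column definition; a real parser would take this from the file's own metadata."""
    name: str
    is_date: bool
    date_format: Optional[str] = None  # strptime pattern, only used when is_date is True


def normalize_row(row: dict[str, str], columns: list[ColumnDef]) -> dict[str, str]:
    """Rewrite only columns declared as dates to ISO 8601; pass every other value through unchanged."""
    out = {}
    for col in columns:
        value = row[col.name]
        if col.is_date and col.date_format:
            value = datetime.strptime(value, col.date_format).date().isoformat()
        out[col.name] = value
    return out


columns = [
    ColumnDef("created", is_date=True, date_format="%d/%m/%Y"),
    ColumnDef("ratio", is_date=False),
]
print(normalize_row({"created": "01/02/2010", "ratio": "3/4"}, columns))
# {'created': '2010-02-01', 'ratio': '3/4'}
```

Because the conversion is driven by the declared column type rather than by the shape of the string, the "/" in the non-date column is left alone.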
alexlumpov, could you please take a look at this issue? It's been pending for a long time, and we need a solution for it. Thank you!