[SOLR-8590] example/files improvements - ASF JIRA

There are several example/files improvements/fixes that are warranted:

Fix e-mail and URL field names (currently <email>_ss and <url>_ss, with literal angle brackets in the field names), and add display of these fields in the /browse results rendering
Improve quality of extracted phrases
Extract, facet, and display acronyms
Add sorting controls, possibly all or some of these: last modified date, created date, relevancy, and title
Add grouping by doc_type perhaps
Fix debug mode: it currently does not update the parsed query debug output (this is probably a bug in the data-driven /browse as well)
Harden update-script: it currently errors if documents do not have a "content" field (e.g. when indexing basic CSV), but should instead skip extraction of e-mail addresses and URLs when there is no "content". A document without "content" is not quite the example/files use case, but there is no reason for the update script to error on it.
Filter out bogus e-mail addresses. I'm seeing email_ss = "?@[^],\,/^@[$_a-z]" for some documents (using Solr docs/ directory as the dataset)
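
To make the last two items concrete, here is a minimal Python sketch of the intended behavior. It is not the example/files update script itself; it only illustrates the logic. The function name `extract_fields`, the regular expressions, the punctuation-stripping rules, and the `url_ss` field name are all assumptions for illustration (only `email_ss` and "content" come from this issue). The idea: return the document untouched when there is no "content", and accept an e-mail candidate only if it fully matches a deliberately strict pattern.

```python
import re

# Loose pattern to find anything that looks vaguely like an address (assumption).
CANDIDATE_RE = re.compile(r"\S+@\S+")
# Strict pattern a candidate must fully match to be kept (assumption, not RFC-complete).
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)
# Simple http/https URL pattern (assumption).
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_fields(doc):
    """Add email_ss / url_ss to doc when it has content; otherwise leave it alone."""
    content = doc.get("content")
    if not content:
        # No "content" (e.g. a basic CSV row): skip extraction instead of erroring.
        return doc
    if isinstance(content, list):
        # Multi-valued field: treat the values as one block of text.
        content = " ".join(str(value) for value in content)

    emails = set()
    for candidate in CANDIDATE_RE.findall(content):
        # Drop wrapping punctuation such as <...>, (...), or a trailing period.
        cleaned = candidate.strip("<>()[],;:.\"'")
        if EMAIL_RE.fullmatch(cleaned):
            emails.add(cleaned)

    urls = {match.rstrip(".,;:)") for match in URL_RE.findall(content)}

    if emails:
        doc["email_ss"] = sorted(emails)
    if urls:
        doc["url_ss"] = sorted(urls)
    return doc
```

With this sketch, a string such as the bogus value quoted above is found by the loose pattern but rejected by the strict one, so it never reaches `email_ss`. The strict pattern will also reject some unusual but valid addresses; for a demo facet that trade-off seems acceptable.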