Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Currently, LCF has a relatively high bar for evaluation and use, requiring developer expertise. Also, although LCF has a comprehensive UI, it is not currently packaged for use as a crawling engine for advanced applications.
A small set of individual feature requests is needed to address these issues. They are summarized briefly here to show how they fit together for two initial releases of LCF, but will be broken out into individual LCF Jira issues.
Goals:
1. LCF as a standalone, downloadable, usable-out-of-the-box product (much as Solr is today)
2. LCF as a toolkit for developers needing customized crawling and repository access
3. An API-based crawling engine that can be integrated with applications (as Aperture is today)
Larger goals:
1. Make it very easy for users to evaluate LCF.
2. Make it very easy for developers to customize LCF.
3. Make it very easy for applications to fully manage and control LCF in operation.
Two phases:
1) Standalone, packaged app that is super-easy to evaluate and deploy. Call it LCF 0.5.
2) API-based crawling engine for applications for which the UI might not be appropriate. Call it LCF 1.0.
Phase 1
-------
LCF 0.5 right out of the box would interface loosely with Solr 1.4 or later.
It would contain roughly the features that are currently in place or currently underway, plus a little more.
Specifically, LCF 0.5 would contain these additional capabilities:
1. Plug-in architecture for connectors (CONNECTORS-40 - DONE)
2. Packaged app ready to run with embedded Jetty app server (CONNECTORS-59)
3. Bundled with database - PostgreSQL or Derby - ready to run without additional manual setup (CONNECTORS-55)
4. Mini-API to initially configure default connections and "example" jobs for file system and web crawl (CONNECTORS-58)
5. Agent process started automatically (CONNECTORS-60)
6. Solr output connector option to commit at end of job, by default (CONNECTORS-57)
Installation and basic evaluation of LCF would be essentially as simple as they are for Solr today. The example
connections and jobs would permit the user to initiate example crawls of an example file system
directory and of example web content on the LCF web site with just a couple of clicks (as opposed to the
detailed manual setup required today to create repository and output connections and jobs).
It is worth considering whether the SharePoint connector could also be included as part of the default package.
Users could then add additional connectors and repositories and jobs as desired.
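To make the "example connections and jobs" idea more concrete, here is a minimal sketch of what the mini-API might pre-register on first start. Every class, field, and connection name below is hypothetical and is not taken from the LCF code base; the sketch only assumes one Solr output connection shared by a file system job and a web job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; not existing LCF classes.
final class Connection {
    final String name;
    final String kind;   // "repository" or "output"
    final String target; // a directory path, a start URL, or a Solr URL

    Connection(String name, String kind, String target) {
        this.name = name;
        this.kind = kind;
        this.target = target;
    }
}

// A job crawls one repository connection and sends documents to one output connection.
final class Job {
    final String name;
    final Connection source;
    final Connection output;

    Job(String name, Connection source, Connection output) {
        this.name = name;
        this.source = source;
        this.output = output;
    }
}

final class ExampleSetup {
    final List<Connection> connections = new ArrayList<>();
    final List<Job> jobs = new ArrayList<>();

    // Builds the default connections and the two example jobs described above.
    static ExampleSetup defaults(String exampleDir, String exampleSiteUrl, String solrUrl) {
        ExampleSetup setup = new ExampleSetup();
        Connection solr = new Connection("Solr", "output", solrUrl);
        Connection files = new Connection("File System", "repository", exampleDir);
        Connection web = new Connection("Web", "repository", exampleSiteUrl);
        setup.connections.add(solr);
        setup.connections.add(files);
        setup.connections.add(web);
        setup.jobs.add(new Job("Example file crawl", files, solr));
        setup.jobs.add(new Job("Example web crawl", web, solr));
        return setup;
    }
}
```

With something like this in place, the packaged app could call ExampleSetup.defaults(...) once at first start, so that the user only has to click "start" on a job that already exists.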
Timeframe for release? Level of effort?
Phase 2
-------
The essence of Phase 2 is that LCF would be split to allow direct, full API access to LCF as a
crawling "engine", in addition to the full LCF UI. Call this LCF 1.0.
Specifically, LCF 1.0 would contain these additional capabilities:
1. Full API for LCF as a crawling engine (CONNECTORS-56)
2. LCF can be bundled within an app (CONNECTORS-61)
3. LCF event and activity notification for full control by an application (CONNECTORS-41)
Overall, LCF 1.0 will offer roughly the same crawling capabilities as LCF 0.5, plus whatever bug
fixes and minor enhancements might also be added.
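As one possible shape for the event and activity notification item, an embedding application could register a listener with the engine and receive callbacks as a job progresses. The sketch below is only an illustration under that assumption: CrawlListener, CrawlEngine, and RecordingListener are made-up names, and runJob is a stand-in that walks a fixed list of URIs rather than performing a real crawl.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; not an existing LCF API.
interface CrawlListener {
    void onJobStarted(String jobName);
    void onDocumentIngested(String jobName, String documentUri);
    void onJobFinished(String jobName, int documentCount);
}

final class CrawlEngine {
    private final List<CrawlListener> listeners = new ArrayList<>();

    void addListener(CrawlListener listener) {
        listeners.add(listener);
    }

    // Stand-in for a real crawl: visits each URI in order and notifies listeners.
    void runJob(String jobName, List<String> documentUris) {
        for (CrawlListener l : listeners) {
            l.onJobStarted(jobName);
        }
        int count = 0;
        for (String uri : documentUris) {
            count++;
            for (CrawlListener l : listeners) {
                l.onDocumentIngested(jobName, uri);
            }
        }
        for (CrawlListener l : listeners) {
            l.onJobFinished(jobName, count);
        }
    }
}

// Example listener that an application might use to track activity.
final class RecordingListener implements CrawlListener {
    final List<String> events = new ArrayList<>();

    public void onJobStarted(String jobName) {
        events.add("started:" + jobName);
    }

    public void onDocumentIngested(String jobName, String documentUri) {
        events.add("ingested:" + documentUri);
    }

    public void onJobFinished(String jobName, int documentCount) {
        events.add("finished:" + jobName + ":" + documentCount);
    }
}
```

A callback interface keeps the application in control without polling. Whether notifications should instead be delivered out of process (for example over HTTP) is left open here.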
Timeframe for release? Level of effort?
-------------------------
Issues:
- Can we package PostgreSQL with LCF so LCF can set it up?
- Or do we need Derby for that purpose?
- Managing multiple processes (UI, database, agent, app processes)
- What exactly would the API look like? (URL, XML, JSON, YAML?)
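To help frame the last question, here is a rough sketch of one of the options listed: resource-style URLs with JSON bodies. Nothing in it is decided. The "/jobs/" path layout, the field names, and the ApiSketch class are all assumptions made up for discussion, and the JSON is assembled by hand only to keep the sketch free of library dependencies.

```java
// Hypothetical: neither the URL layout nor the JSON field names are defined by LCF.
final class ApiSketch {

    // Resource path for a named job under an assumed API root.
    // Assumes the job name is already URL-safe (no spaces or reserved characters).
    static String jobPath(String apiRoot, String jobName) {
        return apiRoot + "/jobs/" + jobName;
    }

    // JSON body describing a job by name and by the connections it uses.
    static String jobJson(String name, String repositoryConnection, String outputConnection) {
        return "{"
            + "\"name\":" + quote(name) + ","
            + "\"repository_connection\":" + quote(repositoryConnection) + ","
            + "\"output_connection\":" + quote(outputConnection)
            + "}";
    }

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Under these assumptions, an application would send the body from jobJson("example-web", "Web", "Solr") to the path from jobPath("/lcf/api", "example-web") to create or update a job. An XML or YAML variant would keep the same URLs and change only the body format.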