Details
Type: Test
Status: Resolved
Priority: Major
Resolution: Fixed
ManifoldCF 0.6
None
Description
A user reports that ManifoldCF web crawls run on MySQL fail to find the correct number of documents, compared to the same crawls run on PostgreSQL. The set of documents included also differs from run to run. We need a test that reproduces the user's environment: more than 12,000 documents, with hop-count filtering enabled. The reported settings are:
> - Max Hop on Links: 15
> - Max Hop on Redirects: 10
> - Include only hosts matching seeds: Checked
> - org.apache.manifoldcf.crawler.threads: 50
> - org.apache.manifoldcf.database.maxhandles: 100
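
For anyone setting up the reproduction: the last two items above are ManifoldCF system properties. Assuming they are set in the usual properties.xml file, the entries would look roughly like the sketch below. Only the two property names and values come from the report; the surrounding file layout is an assumption about a typical installation. The hop-count limits and the "Include only hosts matching seeds" option are presumably set on the job itself rather than in this file, so they are not shown.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <!-- Number of crawler worker threads, as reported by the user -->
  <property name="org.apache.manifoldcf.crawler.threads" value="50"/>
  <!-- Maximum number of database connection handles, as reported by the user -->
  <property name="org.apache.manifoldcf.database.maxhandles" value="100"/>
</configuration>
```

The same two values should be used for both the MySQL and the PostgreSQL runs, so that the database is the only variable being compared.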