Nutch / NUTCH-1177

Generator to select on retry interval

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5
    • Component/s: generator
    • Labels: None
    • Patch Info: Patch Available

    Description

      The generator can already select entries whose score is larger than a specified threshold. It should also be able to select entries whose retry interval is lower than a value set by a configuration option.

      Such a feature is particularly useful when dealing with a very large crawldb, where you still want a crawl to fetch rapidly changing URLs first.

      This issue should also add the missing generate.min.score configuration property to nutch-default.xml.
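
A rough sketch of the selection rule described above, not the code from the attached patch. The class `GeneratorSelectionSketch`, the `Entry` type, and the parameter names are hypothetical and do not come from Nutch. The sketch also assumes a negative interval threshold disables the interval check, and that both comparisons are strict:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; none of these names come from the Nutch code base. */
class GeneratorSelectionSketch {

    /** Hypothetical stand-in for a crawldb entry (not Nutch's CrawlDatum). */
    static final class Entry {
        final String url;
        final float score;
        final int retryIntervalSeconds;

        Entry(String url, float score, int retryIntervalSeconds) {
            this.url = url;
            this.score = score;
            this.retryIntervalSeconds = retryIntervalSeconds;
        }
    }

    /**
     * Returns true when the entry passes both filters.
     * Assumption: a negative intervalThreshold disables the interval check.
     */
    static boolean isSelected(Entry e, float minScore, int intervalThreshold) {
        // Existing behaviour per the description: keep scores larger than the threshold.
        if (!(e.score > minScore)) {
            return false;
        }
        // Requested behaviour: keep retry intervals lower than the threshold.
        if (intervalThreshold >= 0 && !(e.retryIntervalSeconds < intervalThreshold)) {
            return false;
        }
        return true;
    }

    /** Applies isSelected to every entry and collects the URLs that pass. */
    static List<String> select(List<Entry> entries, float minScore, int intervalThreshold) {
        List<String> urls = new ArrayList<>();
        for (Entry e : entries) {
            if (isSelected(e, minScore, intervalThreshold)) {
                urls.add(e.url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<Entry> crawlDb = Arrays.asList(
            new Entry("http://example.org/news", 1.0f, 3600),        // refetched hourly
            new Entry("http://example.org/archive", 1.0f, 2592000),  // refetched every 30 days
            new Entry("http://example.org/lowscore", 0.0f, 3600));   // score too low

        // Only entries with a retry interval under one day are selected.
        System.out.println(select(crawlDb, 0.5f, 86400));
        // Interval check disabled: only the score filter applies.
        System.out.println(select(crawlDb, 0.5f, -1));
    }
}
```

With a one-day threshold, only the hourly page passes both filters. With the interval check disabled, the 30-day page is selected as well, which matches the generator's behaviour before this change.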

      Attachments

        1. NUTCH-1177-1.5-1.patch
          3 kB
          Markus Jelsma

            People

              Assignee: Markus Jelsma (markus17)
              Reporter: Markus Jelsma (markus17)