  1. Nutch
  2. NUTCH-2407

Memory leak causing Nutch Server to run out of memory

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.1, 1.16
    • Fix Version/s: 1.21
    • Component/s: nutch server
    • Labels: None
    • Environment: Ubuntu 16.04 64-bit
      Oracle Java 8 64-bit
      Nutch 2.3.1 (standalone deployment)
      MongoDB 3.4

    Description

      My application performs continuous crawling through the Nutch REST services. It injects a seed URL and then repeats the GENERATE/FETCH/PARSE/UPDATEDB sequence a requested number of times. Each step in the sequence starts only after the previous step has completed successfully, and then the whole sequence is repeated. Here is a brief description of the job:

      • Number of GENERATE/FETCH/PARSE/UPDATEDB cycles per run: 50
      • 'topN' parameter value of GENERATE step in each cycle: 10
      • Seed URL: http://www.cnn.com
      • Regex URL filters for all jobs:
        • "-^.{1000,}$" - exclude very long URLs
        • "+." - include the rest
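The two filter rules above can be sketched in a few lines of Python. This is only an illustration of the intended behaviour, not Nutch's own filter code: it assumes the rules are tried in order, that the first rule whose pattern is found in the URL decides the outcome, and that a URL matching no rule is rejected. The names `RULES` and `accept` are made up for this sketch.

```python
import re

# Rules in the order given in the report: (sign, compiled pattern).
RULES = [
    ("-", re.compile(r"^.{1000,}$")),  # exclude URLs of 1000 or more characters
    ("+", re.compile(r".")),           # include everything else
]


def accept(url, rules=RULES):
    """Return True if the first rule that matches the URL is an include rule."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is rejected.
    return False
```

With these rules the seed URL `http://www.cnn.com` is accepted, while any URL of 1000 characters or more is dropped before the catch-all `+.` rule is reached.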

      To monitor the Nutch server I use Java VisualVM, which ships with the JDK. After each run (50 cycles of GENERATE/FETCH/PARSE/UPDATEDB) I trigger a garbage collection from VisualVM and check memory usage. I observe that the Nutch server leaks ~25 MB per run.
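To put the ~25 MB per run figure in perspective, the arithmetic can be written down as a small helper. The report does not state the server's heap size, so `headroom_mb` (free heap available after startup) is a hypothetical input, and the function name is made up for this sketch.

```python
def leak_projection(headroom_mb, leak_mb_per_run=25.0, cycles_per_run=50):
    """Return (leak per cycle in MB, whole runs before the headroom is used up).

    The defaults come from the report: ~25 MB leaked per run of 50 cycles.
    headroom_mb is an assumption supplied by the caller.
    """
    per_cycle = leak_mb_per_run / cycles_per_run
    runs = int(headroom_mb // leak_mb_per_run)
    return per_cycle, runs
```

At the reported rate the leak works out to about 0.5 MB per GENERATE/FETCH/PARSE/UPDATEDB cycle, so a server with, say, 1000 MB of spare heap would be exhausted after roughly 40 runs.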

      NOTES: I added custom HTTP DELETE services that clean the job history in NutchServerPoolExecutor and remove all custom configurations from RAMConfManager after each run. The observed ~25 MB leak therefore remains after the job history and configurations have been cleaned up.
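For anyone trying to reproduce this, the run described above (inject once, repeat the four-step cycle, then clean up) can be sketched roughly as follows. This is not the reporter's application. The endpoints `POST /job/create` and `GET /job/{id}`, port 8081, the `state` field and its values, and the `crawl_id` and `conf_id` defaults are assumptions about the Nutch REST server and should be checked against the deployed version. `cleanup` stands in for the custom DELETE services, which are not part of stock Nutch.

```python
import json
import time
import urllib.request

CYCLE = ("GENERATE", "FETCH", "PARSE", "UPDATEDB")


def rest_run_job(job_type, args, base="http://localhost:8081",
                 crawl_id="crawl01", conf_id="default", poll_seconds=5):
    """Submit one job and poll until it stops; True only if it FINISHED.

    Assumed endpoints: POST /job/create returns the job id as plain text,
    GET /job/{id} returns JSON with a "state" field.
    """
    body = json.dumps({"type": job_type, "confId": conf_id,
                       "crawlId": crawl_id, "args": args}).encode("utf-8")
    req = urllib.request.Request(base + "/job/create", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        job_id = resp.read().decode("utf-8").strip()
    while True:
        with urllib.request.urlopen(base + "/job/" + job_id) as resp:
            state = json.load(resp).get("state")
        if state not in ("IDLE", "RUNNING"):
            return state == "FINISHED"
        time.sleep(poll_seconds)


def run_crawl(run_job, inject_args, cycles=50, top_n=10, cleanup=None):
    """Inject once, then repeat the cycle; stop at the first failed step.

    Returns the number of cycles that completed. cleanup (hypothetical)
    is called once at the end, whether or not the run succeeded.
    """
    completed = 0
    try:
        if not run_job("INJECT", inject_args):
            return completed
        for _ in range(cycles):
            for step in CYCLE:
                # Only GENERATE takes the topN argument in this sketch.
                args = {"topN": top_n} if step == "GENERATE" else {}
                if not run_job(step, args):
                    return completed
            completed += 1
        return completed
    finally:
        if cleanup is not None:
            cleanup()
```

Passing the job runner in as a parameter keeps the loop testable without a live server; against a real instance it would be called as `run_crawl(rest_run_job, inject_args)`, with `inject_args` set to whatever the injector of the deployed Nutch version expects.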

      Attachments

        1. started.txt
          164 kB
          Vyacheslav Pascarel
        2. second.txt
          179 kB
          Vyacheslav Pascarel
        3. first.txt
          178 kB
          Vyacheslav Pascarel

            People

              Assignee: Unassigned
              Reporter: Vyacheslav Pascarel
              Votes: 1
              Watchers: 3
