Nutch / NUTCH-739

SolrDeleteDuplications too slow when using hadoop

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1
    • Component/s: indexer
    • Labels: None
    • Environment: Hadoop cluster with 3 nodes
      Map Task Capacity: 6
      Reduce Task Capacity: 6
      Indexer: one instance of the Solr server (on one of the slave nodes)

    Description

      In my environment I always get many warnings like this during the dedup step:

      Task attempt_200905270022_0212_r_000003_0 failed to report status for 600 seconds. Killing!
      

      Solr logs:

      INFO: [] webapp=/solr path=/update params={wt=javabin&waitFlush=true&optimize=true&waitSearcher=true&maxSegments=1&version=2.2} status=0 QTime=173741
      May 27, 2009 10:29:27 AM org.apache.solr.update.processor.LogUpdateProcessor finish
      INFO: {optimize=} 0 173599
      May 27, 2009 10:29:27 AM org.apache.solr.core.SolrCore execute
      INFO: [] webapp=/solr path=/update params={wt=javabin&waitFlush=true&optimize=true&waitSearcher=true&maxSegments=1&version=2.2} status=0 QTime=173599
      May 27, 2009 10:29:27 AM org.apache.solr.search.SolrIndexSearcher close
      INFO: Closing Searcher@2ad9ac58 main
      May 27, 2009 10:29:27 AM org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean getMBeanInfo
      WARNING: Could not getStatistics on info bean org.apache.solr.search.SolrIndexSearcher
      org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
      ....
      

      So I think the problem is the call on line 301 of SolrDeleteDuplications (solr.optimize()). Because the job runs several tasks, each of them tries to optimize the Solr index before closing.
      The simplest way to avoid this bug is to remove this line and send an "<optimize/>" message directly to the Solr server after the dedup step.
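
The workaround described above could look roughly like the following. This is only a sketch, not the patch that was committed for this issue: it assumes Solr's XML update handler is reachable at `<base URL>/update`, and the class name `SolrOptimizeOnce`, the helper `buildOptimizeRequest`, and the example base URL are hypothetical. It uses only the JDK HTTP client (Java 11+), so the optimize is issued once from a single process rather than from every task.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: run once after the dedup job finishes,
// instead of calling solr.optimize() from every task.
public class SolrOptimizeOnce {

    // Builds a POST of the XML update message "<optimize/>" to <base>/update.
    // The "/update" path is an assumption about how the Solr server is deployed.
    static HttpRequest buildOptimizeRequest(String solrBaseUrl) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return HttpRequest.newBuilder(URI.create(base + "/update"))
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString("<optimize/>"))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No URL given: do nothing rather than guess a server address.
            System.out.println("usage: SolrOptimizeOnce <solr base url>");
            return;
        }
        // No request timeout is set, so the call waits until Solr finishes;
        // the log above shows a single optimize taking roughly 173 seconds.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildOptimizeRequest(args[0]), HttpResponse.BodyHandlers.ofString());
        System.out.println("Solr responded with HTTP " + response.statusCode());
    }
}
```

For example, it could be run as `java SolrOptimizeOnce.java http://localhost:8983/solr` once the dedup step has completed (the URL here is a placeholder).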

          People

            Assignee: Andrzej Bialecki (ab)
            Reporter: Dmitry Lihachev (dmitry.lihachev)