Nutch / NUTCH-2551

NullPointerException in generator

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.15
    • Component/s: generator
    • Labels: None

    Description

      A NullPointerException is thrown during the crawl generate stage when I deploy to a Hadoop cluster (but for some reason, it works fine locally).

      This looks like it is caused by the URLPartitioner class still having the old configure() method, which is never called, so the normalizers field remains null. The class should instead implement the Configurable interface, as described in the Partitioner documentation of the newer mapreduce API.
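
      To illustrate the suspected mechanism, here is a minimal sketch that runs without Hadoop on the classpath. Everything in it is a stand-in and an assumption: the Configurable and Partitioner types are simplified local copies (a plain Map replaces the Hadoop Configuration), the lower-casing normalizer is hypothetical, and newInstance() only loosely models reflective instantiation that hands the configuration to Configurable objects. It is not the actual URLPartitioner code or the actual fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Stand-in for org.apache.hadoop.conf.Configurable (a plain Map replaces Configuration).
interface Configurable {
  void setConf(Map<String, String> conf);
  Map<String, String> getConf();
}

// Stand-in for the new-API Partitioner base class.
abstract class Partitioner<K> {
  abstract int getPartition(K key, int numPartitions);
}

// Old style: initialization lives in configure(), which nothing in this sketch calls.
class OldStylePartitioner extends Partitioner<String> {
  private UnaryOperator<String> normalizers; // never assigned, stays null

  public void configure(Map<String, String> job) {
    normalizers = String::toLowerCase; // hypothetical normalizer
  }

  @Override
  int getPartition(String url, int numPartitions) {
    // Throws NullPointerException because configure() was never invoked.
    String normalized = normalizers.apply(url);
    return (normalized.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

// New style: initialization lives in setConf(), which newInstance() below calls.
class ConfigurablePartitioner extends Partitioner<String> implements Configurable {
  private Map<String, String> conf;
  private UnaryOperator<String> normalizers;

  @Override
  public void setConf(Map<String, String> conf) {
    this.conf = conf;
    this.normalizers = String::toLowerCase; // hypothetical normalizer
  }

  @Override
  public Map<String, String> getConf() {
    return conf;
  }

  @Override
  int getPartition(String url, int numPartitions) {
    String normalized = normalizers.apply(url);
    return (normalized.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

class PartitionerConfigDemo {
  // Creates the object reflectively; only Configurable objects receive the configuration.
  static <T> T newInstance(Class<T> cls, Map<String, String> conf) {
    try {
      T obj = cls.getDeclaredConstructor().newInstance();
      if (obj instanceof Configurable) {
        ((Configurable) obj).setConf(conf);
      }
      return obj;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();

    Partitioner<String> fixed = newInstance(ConfigurablePartitioner.class, conf);
    System.out.println("configurable partitioner: " + fixed.getPartition("http://Example.com/", 4));

    Partitioner<String> broken = newInstance(OldStylePartitioner.class, conf);
    try {
      broken.getPartition("http://Example.com/", 4);
    } catch (NullPointerException e) {
      System.out.println("old-style partitioner: NullPointerException");
    }
  }
}
```

      In this sketch the only difference between the two classes is where the field gets initialized: a hook that the instantiating code knows about (setConf) versus one it ignores (configure).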

      Stack trace:

      java.lang.NullPointerException
       at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:76)
       at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:40)
       at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:716)
       at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
       at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
       at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:553)
       at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:546)
       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
      

       

      It might also be because a static URLPartitioner instance is used in the Generator.Selector class, but it is only initialized in the setup() method of the Generator.Selector.SelectorMapper class. That whole arrangement looks pretty weird.
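
      The static-instance concern can be sketched in isolation as follows. This is an illustration under assumptions, not the Nutch source: StubPartitioner, SelectorSketch and its nested SelectorMapper are hypothetical stand-ins that only mirror the shape described above (a static field that one nested class assigns in setup() and another code path reads). Whether this pattern actually contributes to the reported failure is not established here.

```java
// Hypothetical stand-in for a partitioner that needs no configuration.
class StubPartitioner {
  int getPartition(String url, int numPartitions) {
    return (url.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

class SelectorSketch {
  // Shared static instance; nothing assigns it at class-load time.
  static StubPartitioner partitioner;

  // Only this nested class initializes the static field, and only when setup() runs.
  static class SelectorMapper {
    void setup() {
      partitioner = new StubPartitioner();
    }
  }

  // Reads the static field; fails if no SelectorMapper.setup() ran in this JVM first.
  static int getPartition(String url, int numPartitions) {
    return partitioner.getPartition(url, numPartitions);
  }

  public static void main(String[] args) {
    try {
      getPartition("http://example.com/", 4);
    } catch (NullPointerException e) {
      System.out.println("before setup(): NullPointerException");
    }

    new SelectorMapper().setup();
    System.out.println("after setup(): partition " + getPartition("http://example.com/", 4));
  }
}
```

      The sketch shows why correctness here depends on call order within a single JVM: any code path that reaches getPartition() without a prior SelectorMapper.setup() in the same process sees a null field.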

            People

              Assignee: Sebastian Nagel (snagel)
              Reporter: Hans Brende (hansbrende)