Description
A NullPointerException is thrown during the crawl generate stage when the job is deployed to a Hadoop cluster, although the same crawl works fine when run locally.
This appears to happen because the URLPartitioner class still carries the old configure() method, which is never called under the new API, so its normalizers field remains null; instead, the class should implement the Configurable interface, as the newer mapreduce API's Partitioner contract expects (a sketch of that pattern follows the stack trace below).
Stack trace:
java.lang.NullPointerException
	at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:76)
	at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:40)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:716)
	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
	at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:553)
	at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:546)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
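For illustration, here is a minimal sketch (not the actual Nutch patch) of a new-API Partitioner that also implements Configurable. Hadoop instantiates the job's partitioner through ReflectionUtils.newInstance(), which calls setConf() on Configurable objects, so anything initialized there is ready before the first getPartition() call. The class name HostPartitioner and the "partition.url.mode" property below are illustrative.
{code:java}
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;

public class HostPartitioner extends Partitioner<Text, Writable>
    implements Configurable {

  private Configuration conf;
  private String mode;  // stands in for the state the old configure() used to set up

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    // Runs right after instantiation, replacing the old-API configure() hook.
    this.mode = conf.get("partition.url.mode", "byHost");
  }

  @Override
  public Configuration getConf() {
    return conf;
  }

  @Override
  public int getPartition(Text key, Writable value, int numReduceTasks) {
    // 'mode' cannot be null here because setConf() is guaranteed to run first.
    int hash = (mode + "|" + key.toString()).hashCode();
    return (hash & Integer.MAX_VALUE) % numReduceTasks;
  }
}
{code}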
It might also be related to the fact that a static URLPartitioner instance is used in the Generator.Selector class but is only initialized in the setup() method of the Generator.Selector.SelectorMapper class, so any code path that reaches the static instance without going through that mapper sees an unconfigured partitioner. That arrangement looks fragile in any case.
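For reference, a minimal sketch of the more defensive arrangement: the mapper holds its own non-static partitioner and configures it in its own setup(), so it never depends on another mapper class having run first. SelectorStyleMapper is a hypothetical name and HostPartitioner refers to the sketch above.
{code:java}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SelectorStyleMapper extends Mapper<Text, Text, Text, Text> {

  // Instance field, not static: each mapper instance configures its own copy.
  private HostPartitioner partitioner;

  @Override
  protected void setup(Context context) {
    partitioner = new HostPartitioner();
    partitioner.setConf(context.getConfiguration());
  }

  @Override
  protected void map(Text key, Text value, Context context)
      throws IOException, InterruptedException {
    // The partitioner is guaranteed to be configured before map() is called.
    int part = partitioner.getPartition(key, value, context.getNumReduceTasks());
    context.write(new Text(part + ":" + key.toString()), value);
  }
}
{code}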
Issue Links
- is caused by NUTCH-2375 Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce (Closed)