Description
A NullPointerException is thrown during the crawl generate stage when the job is deployed to a Hadoop cluster, although the same crawl works fine when run locally.
This appears to happen because the URLPartitioner class still carries the old configure() method, which is never called under the new API, so its normalizers field remains null; instead, the class should implement the Configurable interface, as the newer mapreduce API's Partitioner contract expects (a sketch of that pattern follows the stack trace below).
Stack trace:
java.lang.NullPointerException
	at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:76)
	at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:40)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:716)
	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
	at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:553)
	at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:546)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
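For illustration, here is a minimal sketch (not the actual Nutch patch) of a new-API Partitioner that also implements Configurable. Hadoop instantiates the job's partitioner through ReflectionUtils.newInstance(), which calls setConf() on Configurable objects, so anything initialized there is ready before the first getPartition() call. The class name HostPartitioner and the "partition.url.mode" property below are illustrative.
{code:java}
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;

public class HostPartitioner extends Partitioner<Text, Writable>
    implements Configurable {

  private Configuration conf;
  private String mode;  // stands in for the state the old configure() used to set up

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    // Runs right after instantiation, replacing the old-API configure() hook.
    this.mode = conf.get("partition.url.mode", "byHost");
  }

  @Override
  public Configuration getConf() {
    return conf;
  }

  @Override
  public int getPartition(Text key, Writable value, int numReduceTasks) {
    // 'mode' cannot be null here because setConf() is guaranteed to run first.
    int hash = (mode + "|" + key.toString()).hashCode();
    return (hash & Integer.MAX_VALUE) % numReduceTasks;
  }
}
{code}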
It might also be related to the fact that a static URLPartitioner instance is used in the Generator.Selector class but is only initialized in the setup() method of the Generator.Selector.SelectorMapper class, so any code path that reaches the static instance without going through that mapper sees an unconfigured partitioner. That arrangement looks fragile in any case.
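For reference, a minimal sketch of the more defensive arrangement: the mapper holds its own non-static partitioner and configures it in its own setup(), so it never depends on another mapper class having run first. SelectorStyleMapper is a hypothetical name and HostPartitioner refers to the sketch above.
{code:java}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SelectorStyleMapper extends Mapper<Text, Text, Text, Text> {

  // Instance field, not static: each mapper instance configures its own copy.
  private HostPartitioner partitioner;

  @Override
  protected void setup(Context context) {
    partitioner = new HostPartitioner();
    partitioner.setConf(context.getConfiguration());
  }

  @Override
  protected void map(Text key, Text value, Context context)
      throws IOException, InterruptedException {
    // The partitioner is guaranteed to be configured before map() is called.
    int part = partitioner.getPartition(key, value, context.getNumReduceTasks());
    context.write(new Text(part + ":" + key.toString()), value);
  }
}
{code}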
Issue Links
- is caused by NUTCH-2375 Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce (Closed)