Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Implemented
Description
We should upgrade Nutch trunk to the new Hadoop API, as has already been done for the Nutchgora branch. If I'm not mistaken, we can already upgrade to the latest 0.20.5 version, which still carries the legacy API. That would let us port the jobs to the new API without immediately upgrading to 0.21 or higher, and without needing a separate branch to work on.
To the committers who created/ported jobs in NutchGora, please write down your advice and experience.
http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api
Issue Links
- is superseded by NUTCH-2375 Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce (Closed)
Sub-Tasks
1. Migrate DomainStatistics to MapReduce API | Closed | Markus Jelsma
2. Migrate WebGraph to MapReduce API | Closed | lufeng
3. Migrate FreeGenerator to MapReduce API | Closed | Unassigned
4. Migrate CrawlDBScanner to MapReduce API | Closed | Unassigned
5. Migrate CrawlDbReader to MapReduce API | Closed | Markus Jelsma