Nutch / NUTCH-1774

Crawling from the REST API throws a NullPointerException


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.3
    • Component/s: REST_api
    • Labels: None
    • Patch Info: Patch Available

    Description

      Crawling does not work when it is started from the REST API.

      Steps to reproduce.
      -----------------------
      1. Start the Nutch server (port 9000).
      2. Submit a PUT request to create and start a crawl job, e.g.:
      URL: http://localhost:9000/nutch/jobs
      HTTP method: PUT
      Content:
      {
        "crawl": "123",
        "type": "crawl",
        "conf": "default",
        "args": {
          "class": "org.apache.nutch.crawl.Crawler",
          "seed": "http://www.somesite.com",
          "seedDir": "runtime/local/url/url.txt",
          "depth": 2
        }
      }
      3. The following exception is thrown in the Generator phase:
      2014-05-13 11:37:57,863 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(435)) - job_local1326997137_0002
      java.lang.NullPointerException
      at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
      at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
      at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
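      The request from step 2 can also be sent programmatically. Below is a minimal sketch using the JDK's java.net.http client (Java 11+), assuming the server from step 1 is listening on localhost:9000. The class name SubmitCrawlJob and the "Content-Type: application/json" header are assumptions; the report only gives the URL, the method and the body.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper that replays step 2 of the report.
public class SubmitCrawlJob {

    // Endpoint copied from the report.
    static final String ENDPOINT = "http://localhost:9000/nutch/jobs";

    // Request body copied from the report (compacted onto one line).
    static final String PAYLOAD =
            "{"
            + "\"crawl\":\"123\","
            + "\"type\":\"crawl\","
            + "\"conf\":\"default\","
            + "\"args\":{"
            + "\"class\":\"org.apache.nutch.crawl.Crawler\","
            + "\"seed\":\"http://www.somesite.com\","
            + "\"seedDir\":\"runtime/local/url/url.txt\","
            + "\"depth\":2"
            + "}"
            + "}";

    // Builds the PUT request; the JSON content type is an assumption.
    static HttpRequest buildRequest() {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(PAYLOAD))
                .build();
    }

    // Sends the request and prints the status code and response body.
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

      Building the request in its own method keeps it checkable without a running server; only main() actually contacts the Nutch REST endpoint.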
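      The top frame of the trace is the Avro Utf8 constructor, which converts its String argument to bytes. A null argument there raises exactly this kind of NullPointerException. The following assumes that GeneratorReducer.setup builds that Utf8 from a configuration value which the REST-launched crawl never sets. The trace is consistent with this but does not prove it. Under that assumption, the failure mode and a fail-fast guard can be sketched in plain Java. The class BatchIdGuard, the key "generate.batch.id", and the Map standing in for the Hadoop Configuration are all hypothetical, and this is not the attached patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the suspected failure mode; not Nutch or Avro code.
public class BatchIdGuard {

    // HYPOTHETICAL key name, standing in for the property the reducer reads.
    static final String BATCH_ID_KEY = "generate.batch.id";

    // Mimics the failure: encoding a null String dereferences it and throws NPE.
    static byte[] encodeUnchecked(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Fail-fast alternative: reject a missing value with a message naming the key.
    static byte[] encodeRequired(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null) {
            throw new IllegalStateException("Missing required configuration value: " + key);
        }
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Batch ID never set, as assumed for the crawl started via the REST API.
        Map<String, String> conf = new HashMap<>();
        try {
            encodeUnchecked(conf.get(BATCH_ID_KEY));
        } catch (NullPointerException e) {
            System.out.println("unchecked: NullPointerException");
        }
        try {
            encodeRequired(conf, BATCH_ID_KEY);
        } catch (IllegalStateException e) {
            System.out.println("checked: " + e.getMessage());
        }
    }
}
```

      The guard does not make the crawl succeed. It only replaces an opaque NPE deep inside Avro with an error that names the missing setting. The actual fix would be to make sure the value is set before the Generator's reducer runs.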

      Attachments

        NUTCH-1774.patch (1 kB), attached by sreemanth pulagam


          People

            Assignee: Unassigned
            Reporter: sreemanth pulagam
            Votes: 0
            Watchers: 3
