  1. Nutch
  2. NUTCH-826

Mailing list is broken.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1
    • Component/s: None
    • Labels: None

    Description

      All of the following addresses are failing:

      nutch-user@nutch.apache.org
      nutch-user-subscribe@nutch.apache.org
      nutch-user-subscribe@lucene.apache.org

      For the last one, the mailer daemon said
      "This mailing list has moved to user at nutch.apache.org."

      Below is the message I tried to send:

      Hi people,

      I've been banging my head against this problem for two days now.
      Simply, I want to add a field with the value of a given meta tag.

      I've been trying the parse-xml plugin, but it doesn't seem to
      work with version 1.0. I've tried the code at
      http://sujitpal.blogspot.com/2009/07/nutch-getting-my-feet-wet.html
      and it hasn't worked. I don't even know why. I don't even know if my
      plugin is being used... or even looked for! Nutch seems to have an
      infuriating "fail silently" policy for plugins. I put a
      System.exit(1) in my filters just to see if my code is even being
      reached. It is not, in spite of my config telling Nutch to use it.

      Here's my config:
      nutch-site.xml
      ...
      <property>
        <name>plugin.includes</name>
        <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|metadata</value>
      </property>
      ...
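
      One quick way to rule out the plugin.includes value itself is to test
      the expression outside Nutch. The sketch below assumes that Nutch
      matches each plugin id as a whole against this regular expression;
      that is an assumption about the plugin loader, not something this
      report confirms. The class name PluginIncludesCheck and the helper
      isIncluded are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {
    // Value copied verbatim from the nutch-site.xml fragment in this report.
    static final String PLUGIN_INCLUDES =
        "protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)"
        + "|query-(basic|site|url)|response-(json|xml)|summary-basic"
        + "|scoring-opic|urlnormalizer-(pass|regex|basic)|metadata";

    // ASSUMPTION: a plugin is enabled only if its whole id matches the expression.
    static boolean isIncluded(String includes, String pluginId) {
        return Pattern.compile(includes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Plugin ids taken from the nutch-site.xml and parse-plugins.xml fragments.
        String[] ids = {"metadata", "parse-html", "index-basic", "parse-rss", "feed"};
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(PLUGIN_INCLUDES, id));
        }
    }
}
```

      Under that assumption, "metadata" does match the expression, so the
      include list would not be what hides the plugin. The same check
      reports that "parse-rss" and "feed", which parse-plugins.xml refers
      to, are not covered by this plugin.includes value.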

      parse-plugins.xml
      ...
      <mimeType name="application/xhtml+xml">
        <plugin id="parse-html" />
        <plugin id="metadata" />
      </mimeType>

      <mimeType name="text/html">
        <plugin id="parse-html" />
        <plugin id="metadata" />
      </mimeType>

      <mimeType name="text/sgml">
        <plugin id="parse-html" />
        <plugin id="metadata" />
      </mimeType>

      <mimeType name="text/xml">
        <plugin id="parse-html" />
        <plugin id="parse-rss" />
        <plugin id="metadata" />
        <plugin id="feed" />
      </mimeType>
      ...
      <alias name="metadata"
      extension-id="com.example.website.nutch.parsing.MetaTagExtractorParseFilter"
      />
      ...

      I've also copied the plugin.xml and jar from my build/metadata to the
      plugins root dir.

      Nonetheless, Nutch runs and puts data in Solr for me. As far as I
      can tell, Nutch is completely unaware of my plugin despite my config
      options. Is there some other place I need to tell Nutch to use my
      plugin? Is there some
      other approach to do this without having to write a plugin? This does
      seem like a lot of work to simply get a meta tag into a field. Any
      help would be appreciated.

      Sincerely,

      John Sherwood


          People

            jnioche Julien Nioche
            ponny John Sherwood