  1. Mahout
  2. MAHOUT-1632

Please help me, I'm stuck on using the 20 newsgroups example on Windows

Details

    • Type: Question
    • Status: Closed
    • Priority: Trivial
    • Resolution: Not A Problem
    • Affects Version/s: 0.9
    • Fix Version/s: 0.10.0
    • Component/s: classic

    Description

      Hello there. I've been using Hadoop and Mahout on Windows. I started the Hadoop cluster first so that Mahout would run against it, then started Mahout to test the 20newsgroups example. It throws an exception saying the path is "not a valid DFS filename". The full console session is shown below:

      Microsoft Windows [Version 6.1.7601]
      Copyright (c) 2009 Microsoft Corporation. All rights reserved.

      C:\Users\Admin>cd\

      C:\>cd mahout

      C:\mahout>cd examples

      C:\mahout\examples>cd bin

      C:\mahout\examples\bin>classify-20newsgroups.sh
      Welcome to Git (version 1.9.4-preview20140815)

      Run 'git help git' to display the help index.
      Run 'git help <command>' to display help for specific commands.
      Please select a number to choose the corresponding task to run
      1. cnaivebayes
      2. naivebayes
      3. sgd
      4. clean -- cleans up the work area in /tmp/mahout-work-
      Enter your choice : 2
      ok. You chose 2 and we'll use naivebayes
      creating work directory at /tmp/mahout-work-
      + echo 'Preparing 20newsgroups data'
      Preparing 20newsgroups data
      + rm -rf /tmp/mahout-work-/20news-all
      + mkdir /tmp/mahout-work-/20news-all
      + cp -R /tmp/mahout-work-/20news-bydate/20news-bydate-test/alt.atheism /tmp/maho
      ut-work-/20news-bydate/20news-bydate-test/comp.graphics /tmp/mahout-work-/20news
      -bydate/20news-bydate-test/comp.os.ms-windows.misc /tmp/mahout-work-/20news-byda
      te/20news-bydate-test/comp.sys.ibm.pc.hardware /tmp/mahout-work-/20news-bydate/2
      0news-bydate-test/comp.sys.mac.hardware /tmp/mahout-work-/20news-bydate/20news-b
      ydate-test/comp.windows.x /tmp/mahout-work-/20news-bydate/20news-bydate-test/mis
      c.forsale /tmp/mahout-work-/20news-bydate/20news-bydate-test/rec.autos /tmp/maho
      ut-work-/20news-bydate/20news-bydate-test/rec.motorcycles /tmp/mahout-work-/20ne
      ws-bydate/20news-bydate-test/rec.sport.baseball /tmp/mahout-work-/20news-bydate/
      20news-bydate-test/rec.sport.hockey /tmp/mahout-work-/20news-bydate/20news-bydat
      e-test/sci.crypt /tmp/mahout-work-/20news-bydate/20news-bydate-test/sci.electron
      ics /tmp/mahout-work-/20news-bydate/20news-bydate-test/sci.med /tmp/mahout-work-
      /20news-bydate/20news-bydate-test/sci.space /tmp/mahout-work-/20news-bydate/20ne
      ws-bydate-test/soc.religion.christian /tmp/mahout-work-/20news-bydate/20news-byd
      ate-test/talk.politics.guns /tmp/mahout-work-/20news-bydate/20news-bydate-test/t
      alk.politics.mideast /tmp/mahout-work-/20news-bydate/20news-bydate-test/talk.pol
      itics.misc /tmp/mahout-work-/20news-bydate/20news-bydate-test/talk.religion.misc
      /tmp/mahout-work-/20news-bydate/20news-bydate-train/alt.atheism /tmp/mahout-wor
      k-/20news-bydate/20news-bydate-train/comp.graphics /tmp/mahout-work-/20news-byda
      te/20news-bydate-train/comp.os.ms-windows.misc /tmp/mahout-work-/20news-bydate/2
      0news-bydate-train/comp.sys.ibm.pc.hardware /tmp/mahout-work-/20news-bydate/20ne
      ws-bydate-train/comp.sys.mac.hardware /tmp/mahout-work-/20news-bydate/20news-byd
      ate-train/comp.windows.x /tmp/mahout-work-/20news-bydate/20news-bydate-train/mis
      c.forsale /tmp/mahout-work-/20news-bydate/20news-bydate-train/rec.autos /tmp/mah
      out-work-/20news-bydate/20news-bydate-train/rec.motorcycles /tmp/mahout-work-/20
      news-bydate/20news-bydate-train/rec.sport.baseball /tmp/mahout-work-/20news-byda
      te/20news-bydate-train/rec.sport.hockey /tmp/mahout-work-/20news-bydate/20news-b
      ydate-train/sci.crypt /tmp/mahout-work-/20news-bydate/20news-bydate-train/sci.el
      ectronics /tmp/mahout-work-/20news-bydate/20news-bydate-train/sci.med /tmp/mahou
      t-work-/20news-bydate/20news-bydate-train/sci.space /tmp/mahout-work-/20news-byd
      ate/20news-bydate-train/soc.religion.christian /tmp/mahout-work-/20news-bydate/2
      0news-bydate-train/talk.politics.guns /tmp/mahout-work-/20news-bydate/20news-byd
      ate-train/talk.politics.mideast /tmp/mahout-work-/20news-bydate/20news-bydate-tr
      ain/talk.politics.misc /tmp/mahout-work-/20news-bydate/20news-bydate-train/talk.
      religion.misc /tmp/mahout-work-/20news-all
      + '[' 'C:\hadp' '!=' '' ']'
      + '[' '' == '' ']'
      + echo 'Copying 20newsgroups data to HDFS'
      Copying 20newsgroups data to HDFS
      + set +e
      + 'C:\hadp/bin/hadoop' dfs -rmr /tmp/mahout-work-/20news-all
      /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
      DEPRECATED: Use of this script to execute hdfs command is deprecated.
      Instead use the hdfs command for it.

      /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
      rmr: DEPRECATED: Please use 'rm -r' instead.
      rmr: Pathname /C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all from h
      dfs://localhost:9000/C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all i
      s not a valid DFS filename.
      Usage: hadoop fs [generic options] -rmr
      + set -e
      + 'C:\hadp/bin/hadoop' dfs -put /tmp/mahout-work-/20news-all /tmp/mahout-work-/2
      0news-all
      /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
      DEPRECATED: Use of this script to execute hdfs command is deprecated.
      Instead use the hdfs command for it.

      /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
      put: Pathname /C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all from h
      dfs://localhost:9000/C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all i
      s not a valid DFS filename.
      Usage: hadoop fs [generic options] -put [-f] [-p] <localsrc> ... <dst>
      + echo 'Creating sequence files from 20newsgroups data'
      Creating sequence files from 20newsgroups data
      + ./bin/mahout seqdirectory -i /tmp/mahout-work-/20news-all -o /tmp/mahout-work-
      /20news-seq -ow
      /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
      Running on hadoop, using \hadp/bin/hadoop and HADOOP_CONF_DIR=
      MAHOUT-JOB: /c/mahout/examples/target/mahout-examples-0.9-job.jar
      /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
      14/12/09 21:48:57 INFO common.AbstractJob: Command line arguments: {--charset=[U
      TF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.
      mahout.text.PrefixAdditionFilter], --input=[C:/Users/Admin/AppData/Local/Temp/ma
      hout-work-/20news-all], --keyPrefix=[], --method=[mapreduce], --output=[C:/Users
      /Admin/AppData/Local/Temp/mahout-work-/20news-seq], --overwrite=null, --startPha
      se=[0], --tempDir=[temp]}
      Exception in thread "main" java.lang.IllegalArgumentException: Pathname /C:/User
      s/Admin/AppData/Local/Temp/mahout-work-/20news-seq from C:/Users/Admin/AppData/L
      ocal/Temp/mahout-work-/20news-seq is not a valid DFS filename.
      at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedF
      ileSystem.java:187)
      at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFi
      leSystem.java:101)
      at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFil
      eSystem.java:1068)
      at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFil
      eSystem.java:1064)
      at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkRes
      olver.java:81)
      at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(Distribute
      dFileSystem.java:1064)
      at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
      at org.apache.mahout.common.HadoopUtil.delete(HadoopUtil.java:192)
      at org.apache.mahout.common.HadoopUtil.delete(HadoopUtil.java:200)
      at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFr
      omDirectory.java:84)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
      at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesF
      romDirectory.java:65)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
      java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
      sorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(Progra
      mDriver.java:72)
      at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
      at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
      java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
      sorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

      C:\mahout\examples\bin>

      Please help me. I'm new to big data tools and I need this issue resolved as soon as possible.

      Thank you.
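
The "not a valid DFS filename" messages in the log above all involve a path that starts with `/C:`. The shell appears to have expanded `/tmp` to `C:/Users/Admin/AppData/Local/Temp` before the path reached Hadoop, and HDFS does not accept a colon inside a path component. The following is a minimal Python sketch of that kind of check, under simplified rules. The function name `is_valid_dfs_name` is hypothetical, and this is not Hadoop's actual implementation.

```python
def is_valid_dfs_name(path: str) -> bool:
    """Simplified, hypothetical stand-in for an HDFS path-name check.

    Assumed rules (a sketch, not Hadoop's real code):
      * the path must be absolute (start with "/")
      * no component may be "." or "..", or contain ":"
      * empty components ("//") are only tolerated at the start or end
    """
    if not path.startswith("/"):
        return False

    components = path.split("/")
    last = len(components) - 1
    for i, part in enumerate(components):
        if part in (".", "..") or ":" in part:
            return False
        if part == "" and i not in (0, last):
            return False
    return True


# The path the script intended to use on HDFS.
print(is_valid_dfs_name("/tmp/mahout-work-/20news-all"))
# The path that actually reached HDFS in the log (note the "C:" component).
print(is_valid_dfs_name(
    "/C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all"))
```

Under these assumed rules the first path passes and the second is rejected because of the `C:` component, which matches the behaviour seen for `-rmr`, `-put`, and `seqdirectory` in the log.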

          People

            Assignee: Suneel Marthi (smarthi)
            Reporter: Mishari SH (Mishari_SH)