Spark / SPARK-16428

Spark file system watcher not working on Windows

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.6.2
    • Fix Version/s: None
    • Environment: Ubuntu 15.10 64 bit, Windows 7 Enterprise 64 bit

    Description

      Two people tested Apache Spark on their computers...

      [Spark Download - http://i.stack.imgur.com/z1oqu.png]

      We downloaded the version of Spark pre-built for Hadoop 2.6, went to the folder /spark-1.6.2-bin-hadoop2.6/, created a "tmp" directory, went to that directory, and ran:

      $ bin/run-example org.apache.spark.examples.streaming.HdfsWordCount tmp

      I added arbitrary files content1 and content2dssdgdg to that "tmp" directory.

      -------------------------------------------
      Time: 1467921704000 ms
      -------------------------------------------
      (content1,1)
      (content2dssdgdg,1)

      -------------------------------------------
      Time: 1467921706000 ms

      Spark detected those files with the above terminal output on my Ubuntu 15.10 laptop, but not on my colleague's Windows 7 Enterprise laptop.

      This is preventing us from getting work done with Spark.

      Link: http://stackoverflow.com/questions/38254405/spark-file-system-watcher-not-working-on-windows
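For background on the mechanism being reported: Spark Streaming's textFileStream does not hook into an OS-level file-watcher API (inotify, ReadDirectoryChangesW); its FileInputDStream lists the monitored directory once per batch interval and selects files whose modification time falls after the previous scan. A minimal Python sketch of that polling approach (the helper name find_new_files is illustrative, not Spark's actual code):

```python
import os
import tempfile
import time

def find_new_files(directory, last_scan_time):
    """Return files modified at or after last_scan_time. This mimics
    Spark's poll-and-compare-mtimes strategy rather than an OS watcher."""
    new_files = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) >= last_scan_time:
            new_files.append(name)
    return new_files

# Demo: record a scan time, then drop in the two files from the report.
tmp = tempfile.mkdtemp()
last_scan = time.time() - 1  # small margin for coarse mtime resolution
for name in ("content1", "content2dssdgdg"):
    open(os.path.join(tmp, name), "w").close()
print(sorted(find_new_files(tmp, last_scan)))  # → ['content1', 'content2dssdgdg']
```

Because the comparison relies on consistent directory listings and modification times, a directory that is not on a Hadoop-compatible file system can cause new files to be silently missed, which matches the Windows behavior described above.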

      Attachments

        Activity

          dongjoon Dongjoon Hyun added a comment -

          Hi, could you test that with 1.6.1 or 2.0.0rc2, too?
          Also, I'm wondering whether the upcoming Windows 10 release (which supports Ubuntu and Bash) would be a candidate OS for you?

          jerryshao Saisai Shao added a comment -

          Spark detected those files with the above terminal output on my Ubuntu 15.10 laptop, but not on my colleague's Windows 7 Enterprise laptop.

          Can you please elaborate on this? I'm not sure why files would be deleted; from my understanding, Spark Streaming does not delete processed files. What exactly do you mean by "those files"?

          hghina0 Hiren Ghinaiya added a comment - - edited

          Hello, this looks more like a Hadoop setup issue on Windows. You need to provide a directory hosted on a Hadoop-compatible file system (HCFS); otherwise Spark will not auto-detect new files. Follow the steps at https://wiki.apache.org/hadoop/Hadoop2OnWindows when running Hadoop on Windows.

          Instead of compiling Hadoop, I used Hadoop binaries pre-compiled for 64-bit Windows 7, hosted at https://github.com/karthikj1/Hadoop-2.7.1-Windows-64-binaries. To use this Hadoop version, I needed the Spark build that is pre-built for a user-provided Hadoop. I set SPARK_DIST_CLASSPATH as described in https://spark.apache.org/docs/latest/hadoop-provided.html, and also put %HADOOP_HOME%\lib\native on the PATH. Once set up, I followed steps 3.1, 3.3, 3.4, and 3.5 at https://wiki.apache.org/hadoop/Hadoop2OnWindows to start a local HDFS. When running HdfsWordCount, I needed to pass hdfs:///tmp as the directory path argument. With that, Spark detects new files showing up in HDFS.
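          The setup above can be sketched as a Windows command-prompt session (the Hadoop install path is an illustrative assumption, not from the original comment):

          ```bat
          rem Point at the pre-compiled Hadoop binaries (illustrative path)
          set HADOOP_HOME=C:\hadoop-2.7.1
          rem Make winutils.exe and the native libraries visible
          set PATH=%HADOOP_HOME%\bin;%HADOOP_HOME%\lib\native;%PATH%
          rem With the "user-provided Hadoop" Spark build, hand Spark the Hadoop jars
          for /f "delims=" %i in ('%HADOOP_HOME%\bin\hadoop classpath') do set SPARK_DIST_CLASSPATH=%i
          rem After starting local HDFS per the Hadoop2OnWindows wiki (steps 3.1-3.5),
          rem watch an HDFS directory instead of a local one:
          bin\run-example org.apache.spark.examples.streaming.HdfsWordCount hdfs:///tmp
          ```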

          dongjoon Dongjoon Hyun added a comment -

          Sounds great, hghina0!

          gurwls223 Hyukjin Kwon added a comment - - edited

          Hi all, so, is this issue resolvable?


    People

      Assignee: Unassigned
      Reporter: johnreed2 John-Michael Reed
      Votes: 0
      Watchers: 4
