Spark / SPARK-16428

Spark file system watcher not working on Windows

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.6.2
    • Fix Version/s: None
    • Environment: Ubuntu 15.10 64 bit, Windows 7 Enterprise 64 bit

    Description

      Two people tested Apache Spark on their computers...

      [Spark Download - http://i.stack.imgur.com/z1oqu.png]

      We downloaded the version of Spark pre-built for Hadoop 2.6, went to the folder /spark-1.6.2-bin-hadoop2.6/, created a "tmp" directory, went to that directory, and ran:

      $ bin/run-example org.apache.spark.examples.streaming.HdfsWordCount tmp

      I added arbitrary files content1 and content2dssdgdg to that "tmp" directory.

      -------------------------------------------
      Time: 1467921704000 ms
      -------------------------------------------
      (content1,1)
      (content2dssdgdg,1)

      -------------------------------------------
      Time: 1467921706000 ms

      Spark detected those files with the above terminal output on my Ubuntu 15.10 laptop, but not on my colleague's Windows 7 Enterprise laptop.

      This is preventing us from getting work done with Spark.

      Link: http://stackoverflow.com/questions/38254405/spark-file-system-watcher-not-working-on-windows
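For background on the mechanism being reported: Spark Streaming's textFileStream does not hook into an OS-level file-watcher API (inotify, ReadDirectoryChangesW); its FileInputDStream lists the monitored directory once per batch interval and selects files whose modification time falls after the previous scan. A minimal Python sketch of that polling approach (the helper name find_new_files is illustrative, not Spark's actual code):

```python
import os
import tempfile
import time

def find_new_files(directory, last_scan_time):
    """Return files modified at or after last_scan_time. This mimics
    Spark's poll-and-compare-mtimes strategy rather than an OS watcher."""
    new_files = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) >= last_scan_time:
            new_files.append(name)
    return new_files

# Demo: record a scan time, then drop in the two files from the report.
tmp = tempfile.mkdtemp()
last_scan = time.time() - 1  # small margin for coarse mtime resolution
for name in ("content1", "content2dssdgdg"):
    open(os.path.join(tmp, name), "w").close()
print(sorted(find_new_files(tmp, last_scan)))  # → ['content1', 'content2dssdgdg']
```

Because the comparison relies on consistent directory listings and modification times, a directory that is not on a Hadoop-compatible file system can cause new files to be silently missed, which matches the Windows behavior described above.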

      Attachments

        Activity

          dongjoon Dongjoon Hyun added a comment -

          Hi, could you test that with 1.6.1 or 2.0.0rc2, too?
          Also, I'm wondering whether the upcoming Windows 10 release (which supports Ubuntu and Bash) would be a candidate OS for you?

          jerryshao Saisai Shao added a comment -

          Spark detected those files with the above terminal output on my Ubuntu 15.10 laptop, but not on my colleague's Windows 7 Enterprise laptop.

          Can you please elaborate on this? I'm not sure why files would be deleted; from my understanding, Spark Streaming does not delete processed files. What exactly do you mean by "those files"?

          hghina0 Hiren Ghinaiya added a comment - - edited

          Hello, this looks more like a Hadoop setup issue on Windows. You need to provide a directory hosted on a Hadoop-compatible file system (HCFS); otherwise Spark will not auto-detect new files. Follow the steps at https://wiki.apache.org/hadoop/Hadoop2OnWindows when running Hadoop on Windows.

          Instead of compiling Hadoop, I used Hadoop binaries pre-compiled for 64-bit Windows 7, hosted at https://github.com/karthikj1/Hadoop-2.7.1-Windows-64-binaries. To use this Hadoop version, I needed the Spark build that is pre-built for a user-provided Hadoop. I set SPARK_DIST_CLASSPATH as described in https://spark.apache.org/docs/latest/hadoop-provided.html, and also put %HADOOP_HOME%\lib\native on the PATH. Once set up, I followed steps 3.1, 3.3, 3.4, and 3.5 at https://wiki.apache.org/hadoop/Hadoop2OnWindows to start a local HDFS. When running HdfsWordCount, I needed to pass hdfs:///tmp as the directory path argument. With that, Spark detects new files showing up in HDFS.
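          The setup above can be sketched as a Windows command-prompt session (the Hadoop install path is an illustrative assumption, not from the original comment):

          ```bat
          rem Point at the pre-compiled Hadoop binaries (illustrative path)
          set HADOOP_HOME=C:\hadoop-2.7.1
          rem Make winutils.exe and the native libraries visible
          set PATH=%HADOOP_HOME%\bin;%HADOOP_HOME%\lib\native;%PATH%
          rem With the "user-provided Hadoop" Spark build, hand Spark the Hadoop jars
          for /f "delims=" %i in ('%HADOOP_HOME%\bin\hadoop classpath') do set SPARK_DIST_CLASSPATH=%i
          rem After starting local HDFS per the Hadoop2OnWindows wiki (steps 3.1-3.5),
          rem watch an HDFS directory instead of a local one:
          bin\run-example org.apache.spark.examples.streaming.HdfsWordCount hdfs:///tmp
          ```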

          dongjoon Dongjoon Hyun added a comment -

          Sounds great, hghina0!

          gurwls223 Hyukjin Kwon added a comment - - edited

          Hi all, so, is this issue resolvable?


    People

      Assignee: Unassigned
      Reporter: johnreed2 John-Michael Reed
      Votes: 0
      Watchers: 4
