Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 1.6.1
- Fix Version/s: None
- Component/s: None
- Environment: Debian GNU/Linux 8, java version "1.7.0_79"
Description
The Spark programming guide explains that Spark can create distributed datasets on Amazon S3.
But since the pre-built "Hadoop 2.6" packages, S3 access no longer works with s3n or s3a.
sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", "xxxxxxxxxxxxxxxxxxxxxxxxxxx")
val lines=sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
This happens with any version of Spark: spark-1.3.1, spark-1.6.1, and even spark-2.0.0 with Hadoop 2.7.2.
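For comparison, the following sketch works on a standalone install once the matching hadoop-aws package is on the classpath; it assumes the Hadoop 2.7.2 artifact and reuses the placeholder bucket and credentials from above (fs.s3a.access.key and fs.s3a.secret.key are the S3A credential property names):
// launched with: ./bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.2
// --packages resolves the artifact from Maven and also pulls in its aws-java-sdk dependency
sc.hadoopConfiguration.set("fs.s3a.access.key", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxxxxxxxxxxxxxxxxxxxxxxxxxx")
val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count()  // force an action so the S3A read actually happens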
I understand this is a Hadoop issue (SPARK-7442), but could you add some documentation explaining which jars need to be added and where, for a standalone installation?
Are hadoop-aws-x.x.x.jar and aws-java-sdk-x.x.x.jar enough?
Which environment variables need to be set, and which files need to be modified?
Is it "$CLASSPATH", or the "spark.driver.extraClassPath" and "spark.executor.extraClassPath" settings in "spark-defaults.conf" (see the sketch below)?
Note that this still works with spark-1.6.1 pre-built with Hadoop 2.4.
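A minimal sketch of what the spark-defaults.conf answer could look like, assuming the jars shipped with a Hadoop 2.7.2 installation (the /opt/hadoop-2.7.2 path and the aws-java-sdk-1.7.4 version below are placeholders to adapt):
# spark-defaults.conf: put hadoop-aws and the matching AWS SDK on both driver and executor classpaths
spark.driver.extraClassPath    /opt/hadoop-2.7.2/share/hadoop/tools/lib/hadoop-aws-2.7.2.jar:/opt/hadoop-2.7.2/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar
spark.executor.extraClassPath  /opt/hadoop-2.7.2/share/hadoop/tools/lib/hadoop-aws-2.7.2.jar:/opt/hadoop-2.7.2/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar
The same jars can instead be passed per job with --jars (or pulled with --packages) on spark-shell / spark-submit, without editing spark-defaults.conf.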
Thanks
Issue Links
- depends upon: SPARK-7481 Add spark-hadoop-cloud module to pull in object store support (Resolved)
- duplicates: SPARK-7481 Add spark-hadoop-cloud module to pull in object store support (Resolved)