Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 1.6.1
- Fix Version/s: None
- Component/s: None
- Environment: Debian GNU/Linux 8, java version "1.7.0_79"
Description
The Spark programming guide explains that Spark can create distributed datasets on Amazon S3.
But since the pre-built "Hadoop 2.6" packages, S3 access no longer works with s3n or s3a.
sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", "xxxxxxxxxxxxxxxxxxxxxxxxxxx")
val lines=sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
This happens with any version of Spark: spark-1.3.1, spark-1.6.1, and even spark-2.0.0 with Hadoop 2.7.2.
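For comparison, the following sketch works on a standalone install once the matching hadoop-aws package is on the classpath; it assumes the Hadoop 2.7.2 artifact and reuses the placeholder bucket and credentials from above (fs.s3a.access.key and fs.s3a.secret.key are the S3A credential property names):
// launched with: ./bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.2
// --packages resolves the artifact from Maven and also pulls in its aws-java-sdk dependency
sc.hadoopConfiguration.set("fs.s3a.access.key", "XXXZZZHHH")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxxxxxxxxxxxxxxxxxxxxxxxxxx")
val lines = sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")
lines.count()  // force an action so the S3A read actually happens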
I understand this is a Hadoop issue (SPARK-7442), but could you add some documentation explaining which jars need to be added and where, for a standalone installation?
Are hadoop-aws-x.x.x.jar and aws-java-sdk-x.x.x.jar enough?
Which environment variables need to be set, and which files need to be modified?
Is it "$CLASSPATH", or the "spark.driver.extraClassPath" and "spark.executor.extraClassPath" settings in "spark-defaults.conf" (see the sketch below)?
Note that this still works with spark-1.6.1 pre-built with Hadoop 2.4.
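A minimal sketch of what the spark-defaults.conf answer could look like, assuming the jars shipped with a Hadoop 2.7.2 installation (the /opt/hadoop-2.7.2 path and the aws-java-sdk-1.7.4 version below are placeholders to adapt):
# spark-defaults.conf: put hadoop-aws and the matching AWS SDK on both driver and executor classpaths
spark.driver.extraClassPath    /opt/hadoop-2.7.2/share/hadoop/tools/lib/hadoop-aws-2.7.2.jar:/opt/hadoop-2.7.2/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar
spark.executor.extraClassPath  /opt/hadoop-2.7.2/share/hadoop/tools/lib/hadoop-aws-2.7.2.jar:/opt/hadoop-2.7.2/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar
The same jars can instead be passed per job with --jars (or pulled with --packages) on spark-shell / spark-submit, without editing spark-defaults.conf.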
Thanks
Issue Links
- depends upon: SPARK-7481 Add spark-hadoop-cloud module to pull in object store support (Resolved)
- duplicates: SPARK-7481 Add spark-hadoop-cloud module to pull in object store support (Resolved)