Spark on S3

We are unable to run queries on S3 data using Spark and PySpark. They fail with the error below.

: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)

at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)

at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)

….

….

Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)

at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)

We tried adding the parameter below, but with no luck.

Parameter name: fs.s3a.impl

Parameter value: org.apache.hadoop.fs.s3a.S3AFileSystem

We added this parameter to hdfs-site.xml, core-site.xml, and hive-site.xml, and also added the AWS jar files to the classpath in mapred-site.xml.
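A ClassNotFoundException like the one above means the JVM could not load the class itself, so setting fs.s3a.impl alone cannot fix it; a jar that contains the class has to be on the classpath. One way to check is to scan a directory of jars for the class file. The following is only a minimal sketch: the helper name `jars_containing` and the directory in the usage comment are hypothetical and not taken from this thread.

```python
import zipfile
from pathlib import Path

# Path of the class inside a jar, derived from the class name in the stack trace.
S3A_CLASS_ENTRY = "org/apache/hadoop/fs/s3a/S3AFileSystem.class"


def jars_containing(class_entry, jar_dir):
    """Return the sorted names of jars directly under jar_dir that contain class_entry."""
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid archives.
            continue
    return hits


# Usage (the directory is a placeholder; substitute the lib directory of your install):
# print(jars_containing(S3A_CLASS_ENTRY, "/path/to/hadoop/lib"))
```

If the list comes back empty for every directory on the Spark classpath, the class is simply not available to the driver or executors, whatever the XML configuration says.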

Re: Spark on S3

Hi @Kirk Haslbeck,

I don't know which version you are using, but if you haven't seen it yet, take a look at the Jira below; it might help.

https://issues.apache.org/jira/browse/SPARK-7442

Re: Spark on S3

Yes, the S3A implementation is not complete yet. Try using S3N for now, or follow Alex's article referenced below.

Re: Spark on S3

Thanks all @Artem Ervits @Tom McCuch for the comments. I did get it resolved by passing all the S3 jars properly on the classpath. The articles included in your threads helped.
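The post above does not say exactly how the jars were passed, so the following is only a sketch of one common approach: listing them with spark-submit's `--jars` option, which puts them on the driver and executor classpaths, and setting the S3A implementation through a `spark.hadoop.`-prefixed property. The helper `build_spark_submit`, the jar paths, and the script name are hypothetical placeholders.

```python
import shlex


def build_spark_submit(app, jars, s3a_impl="org.apache.hadoop.fs.s3a.S3AFileSystem"):
    """Build a spark-submit argument list that ships extra jars and sets fs.s3a.impl."""
    cmd = ["spark-submit"]
    if jars:
        # --jars takes a single comma-separated list.
        cmd += ["--jars", ",".join(jars)]
    # Spark copies spark.hadoop.* properties into the Hadoop configuration.
    cmd += ["--conf", f"spark.hadoop.fs.s3a.impl={s3a_impl}"]
    cmd.append(app)
    return cmd


# Hypothetical jar locations and job script; adjust to your installation.
jars = ["/opt/jars/hadoop-aws.jar", "/opt/jars/aws-java-sdk.jar"]
print(shlex.join(build_spark_submit("my_job.py", jars)))
```

Building the command as a list keeps it safe to hand to `subprocess.run` later, while `shlex.join` gives a copy-pastable shell line for checking what will actually be executed.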

Re: Spark on S3

@Kirk Haslbeck - I was working on something similar: writing PySpark code that uses SparkSQL to analyze data in S3 through the S3A filesystem client. I documented my work, with instructions, here:

https://community.hortonworks.com/articles/36339/spark-s3a-filesystem-client-from-hdp-to-access-s3.h...