
Spark Scala : S3native.NativeS3Filesystem Not found


Hi All,

I am new to Scala coding and am trying to access an AWS S3 bucket, but it fails. Please find the error below.

Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found

I want to read multiple files (*.gz) from an S3 bucket and merge them all into a single CSV file, but I am unable to read the data and get the exception shown above.

Here is my code:


import org.apache.spark.sql.SparkSession

object ReadS3Files {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("ReadS3Files").getOrCreate()

    // Register the s3n filesystem implementation and credentials on the Hadoop configuration
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "AccessKey")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SecretKey")

    // Read all gzipped CSV files under the prefix
    val df = spark.read.format("csv").option("delimiter", ",").load("s3n://bucketname/201808/1034/JPR_DM2_ORG/*.gz")
    df.count()

    spark.stop()
  }
}
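For the merge itself, the plan is to collapse the DataFrame to a single partition and write it back out; a minimal sketch, with a placeholder output path:

df.coalesce(1) // collapse to one partition so a single CSV part file is produced
  .write
  .csv("s3n://bucketname/output/merged") // placeholder output path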

Please help me on this issue.

Many thanks for your help.

1 ACCEPTED SOLUTION

Super Guru

@Lakshmi Prathyusha,

You can download the hadoop-aws jar, put it in the /usr/hdp/{hdp-version}/hadoop folder, and pass it while running the spark-shell command:

./spark-shell --master yarn --jars /usr/hdp/{hdp-version}/hadoop/hadoop-aws.jar ...

You can also try passing the --packages option to download the package at runtime instead of downloading the jar beforehand. An example is shown below:

./spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3

Note: Make sure to download all the dependent packages as well.
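For example, hadoop-aws 2.7.3 depends on aws-java-sdk 1.7.4, so when managing the jars by hand both need to be on the classpath (the paths below are illustrative):

./spark-shell --master yarn --jars /usr/hdp/{hdp-version}/hadoop/hadoop-aws.jar,/usr/hdp/{hdp-version}/hadoop/aws-java-sdk-1.7.4.jar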


The full dependency list is available at https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.7.3


Please "Accept" the answer if this helps


4 REPLIES



Hi Aditya,

Thanks for your reply. I have downloaded the hadoop-aws.jar and aws-java-sdk-1.7.4.jar files. I am using IntelliJ, and from IntelliJ I am trying to access the S3 bucket to read the data, but with no luck. I have also configured the AWS access key and secret key in core-site.xml.


<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3a.awsAccessKeyId</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3a.awsSecretAccessKey</name>
    <value>......</value>
  </property>
</configuration>

Many thanks for your help.
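When the application is launched from IntelliJ rather than through spark-shell, the --jars flag never applies, so the extra jars have to come from the project's own build. A minimal build.sbt sketch, assuming Scala 2.11 and Spark 2.x on Hadoop 2.7.x (the version numbers are assumptions):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"  % "2.3.1",  // assumed Spark version
  "org.apache.hadoop" % "hadoop-aws" % "2.7.3"   // pulls in aws-java-sdk 1.7.4
)

Also note that core-site.xml is only picked up if it is on the application classpath (e.g. under src/main/resources), and for s3a the property names are fs.s3a.access.key and fs.s3a.secret.key; setting the keys on spark.sparkContext.hadoopConfiguration, as in the original code, works regardless.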

New Contributor

Hi Lakshmi,

I am having the same issue. Did you ever resolve this?

New Contributor

I'm facing the same issue. Did anyone resolve it? Please post here how it got fixed.