Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Spark Scala: s3native.NativeS3FileSystem not found

New Member

Hi All,

I am new to Scala coding and am trying to access an AWS S3 bucket, but it fails with the error below:

Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found

I want to read multiple files (*.gz) from the S3 bucket and merge them into a single CSV file, but I am unable to read the data and get the exception shown above.

Here is my code:


import org.apache.spark.sql.SparkSession

object ReadS3Files {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("ReadS3Files").getOrCreate()

    // Register the s3n filesystem implementation and supply credentials
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "AccessKey")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SecretKey")

    // Read all gzipped CSV files under the prefix
    val df = spark.read.format("csv").option("delimiter", ",").load("s3n://bucketname/201808/1034/JPR_DM2_ORG/*.gz")
    df.count()

    spark.stop()
  }
}
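
For reference, once the read succeeds, the merge step described above could look like the minimal sketch below (the output path is a hypothetical placeholder):

// Minimal sketch of the intended merge, assuming df loads successfully above.
// coalesce(1) collapses all partitions into one so Spark writes a single CSV
// part file; fine for small data, a bottleneck for large data.
df.coalesce(1)
  .write
  .option("header", "false")
  .csv("s3n://bucketname/201808/output")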

Please help me with this issue.

Many thanks for your help.

4 REPLIES

ACCEPTED SOLUTION

Super Guru

@Lakshmi Prathyusha,

You can download the hadoop-aws jar, put it in the /usr/hdp/{hdp-version}/hadoop folder, and pass it while running the spark-shell command:

./spark-shell --master yarn --jars /usr/hdp/{hdp-version}/hadoop/hadoop-aws.jar ...

You can also try passing the --packages parameter to download the package at runtime, without downloading the jar beforehand. Example shown below:

./spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3

Note: Make sure to download all the dependent packages as well.


https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.7.3


Please "Accept" the answer if this helps

New Member

Hi Aditya,

Thanks for your reply. I have downloaded the hadoop-aws.jar and aws-java-sdk-1.7.4.jar files. I am using IntelliJ, and from IntelliJ I am trying to access the S3 bucket to read the data, but with no luck. I have also configured the AWS access key and secret key in core-site.xml:


<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3a.awsAccessKeyId</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3a.awsSecretAccessKey</name>
    <value>......</value>
  </property>
</configuration>

Many thanks for your help.
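
One thing worth checking in the configuration above: the s3n property names are the ones Hadoop's s3n connector reads, but the s3a connector uses differently named keys. A minimal sketch of setting them programmatically, assuming the same SparkSession as in the original code (the values are placeholders):

// The s3a connector reads fs.s3a.access.key / fs.s3a.secret.key, not
// fs.s3a.awsAccessKeyId / fs.s3a.awsSecretAccessKey as in the XML above.
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "AccessKey")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "SecretKey")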

New Member

Hi Lakshmi,

I am having the same issue. Did you ever resolve this?

New Member

I'm facing the same issue. Did anyone resolve it? Please post here how it got fixed.