Created 10-03-2018 05:31 PM
Hi All,
I am new to Scala coding and am trying to access an AWS S3 bucket, but it is failing. Please find the error below.
I want to read multiple files (*.gz) from the S3 bucket and merge them all into a single CSV file, but I am unable to read the data and get the exception shown above.
Here is my code:
import org.apache.spark.sql.SparkSession

object ReadS3Files {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("ReadS3Files").getOrCreate()

    // Wire up the s3n connector (hadoop-aws and its dependencies must be on the classpath)
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "AccessKey")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SecretKey")

    // .gz files are decompressed transparently based on their extension
    val df = spark.read.format("csv").option("delimiter", ",").load("s3n://bucketname/201808/1034/JPR_DM2_ORG/*.gz")
    println(df.count())

    spark.stop()
  }
}
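For context, once the read works, my rough plan for merging everything into one CSV is the sketch below (coalesce(1) and the output path are my own assumptions, not tested; coalescing to one partition makes Spark write a single part file, at the cost of funneling all data through one task):

// Hypothetical merge step, reusing df from above; the output path is a placeholder.
df.coalesce(1)
  .write
  .option("header", "false")
  .csv("s3n://bucketname/201808/output/merged")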
Please help me on this issue.
Many thanks for your help.
Created 10-03-2018 05:51 PM
You can download the hadoop-aws jar, put it in the /usr/hdp/{hdp-version}/hadoop folder, and pass it when launching spark-shell:
./spark-shell --master yarn --jars /usr/hdp/{hdp-version}/hadoop/hadoop-aws.jar ...
You can also pass the --packages parameter to download the package at runtime instead of downloading the jar beforehand. Example shown below:
./spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3
Note: Make sure to download all the dependent packages as well.
https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.7.3
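For instance, hadoop-aws 2.7.3 depends on aws-java-sdk 1.7.4, so if you pass jars manually you would list both (the paths below are placeholders; adjust them to where the jars actually live):

./spark-shell --master yarn --jars /usr/hdp/{hdp-version}/hadoop/hadoop-aws-2.7.3.jar,/usr/hdp/{hdp-version}/hadoop/aws-java-sdk-1.7.4.jar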
Please "Accept" the answer if this helps
Created 10-04-2018 03:11 AM
Hi Aditya,
Thanks for your reply. I have downloaded the hadoop-aws.jar and aws-java-sdk-1.7.4.jar files. I am using IntelliJ, and from IntelliJ I am trying to access the S3 bucket to read the data, but no luck. I have also configured the AWS access key and secret key in core-site.xml:
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3a.awsAccessKeyId</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3a.awsSecretAccessKey</name>
    <value>......</value>
  </property>
</configuration>
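One thing I am unsure of: as far as I know, the s3a connector reads its credentials from fs.s3a.access.key and fs.s3a.secret.key rather than the awsAccessKeyId-style names, so the s3a entries should probably look like this:

<property>
  <name>fs.s3a.access.key</name>
  <value>......</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>......</value>
</property>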
Many thanks for your help.
Created 01-18-2019 12:47 AM
Hi Lakshmi,
I am having the same issue. Did you ever resolve this?
Created 07-22-2019 05:12 AM
I'm facing the same issue. Did anyone resolve it? Please post here how it got fixed.