Created 10-03-2018 05:31 PM
Hi All,
I am new to Scala coding and am trying to access an AWS S3 bucket, but it is failing. Please find the error below.
I want to read multiple files (*.gz) from the S3 bucket and merge them all into a single CSV file, but I am unable to read the data and get the exception shown above.
Here is my code:
import org.apache.spark.sql.SparkSession

object ReadS3Files {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("ReadS3Files").getOrCreate()

    // Wire up the s3n connector (hadoop-aws and its dependencies must be on the classpath)
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "AccessKey")
    spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SecretKey")

    // .gz files are decompressed transparently based on their extension
    val df = spark.read.format("csv").option("delimiter", ",").load("s3n://bucketname/201808/1034/JPR_DM2_ORG/*.gz")
    println(df.count())

    spark.stop()
  }
}
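For context, once the read works, my rough plan for merging everything into one CSV is the sketch below (coalesce(1) and the output path are my own assumptions, not tested; coalescing to one partition makes Spark write a single part file, at the cost of funneling all data through one task):

// Hypothetical merge step, reusing df from above; the output path is a placeholder.
df.coalesce(1)
  .write
  .option("header", "false")
  .csv("s3n://bucketname/201808/output/merged")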
Please help me on this issue.
Many thanks for your help.
Created 10-03-2018 05:51 PM
You can download the hadoop-aws jar, put it in the /usr/hdp/{hdp-version}/hadoop folder, and pass it when launching spark-shell:
./spark-shell --master yarn --jars /usr/hdp/{hdp-version}/hadoop/hadoop-aws.jar ...
You can also pass the --packages parameter to download the package at runtime instead of downloading the jar beforehand. Example shown below:
./spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3
Note: Make sure to download all the dependent packages as well.
https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.7.3
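For instance, hadoop-aws 2.7.3 depends on aws-java-sdk 1.7.4, so if you pass jars manually you would list both (the paths below are placeholders; adjust them to where the jars actually live):

./spark-shell --master yarn --jars /usr/hdp/{hdp-version}/hadoop/hadoop-aws-2.7.3.jar,/usr/hdp/{hdp-version}/hadoop/aws-java-sdk-1.7.4.jar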
Please "Accept" the answer if this helps
Created 10-04-2018 03:11 AM
Hi Aditya,
Thanks for your reply. I have downloaded the hadoop-aws.jar and aws-java-sdk-1.7.4.jar files. I am using IntelliJ, and from IntelliJ I am trying to access the S3 bucket to read the data, but no luck. I have also configured the AWS access key and secret key in core-site.xml:
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3a.awsAccessKeyId</name>
    <value>......</value>
  </property>
  <property>
    <name>fs.s3a.awsSecretAccessKey</name>
    <value>......</value>
  </property>
</configuration>
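One thing I am unsure of: as far as I know, the s3a connector reads its credentials from fs.s3a.access.key and fs.s3a.secret.key rather than the awsAccessKeyId-style names, so the s3a entries should probably look like this:

<property>
  <name>fs.s3a.access.key</name>
  <value>......</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>......</value>
</property>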
Many thanks for your help.
Created 01-18-2019 12:47 AM
Hi Lakshmi,
I am having the same issue. Did you ever resolve this?
Created 07-22-2019 05:12 AM
I'm facing the same issue. Did anyone resolve it? Please post here how it got fixed.