First, let's create a sample file in S3:

In the AWS Console, go to S3, create a bucket, and pick your region (bucket names must be lowercase; the Scala example below reads from a bucket named s3hdptest). Upload the file manually using the upload button (the example file name used later in the Scala code is S3HDPTEST.csv).

In the HDP 2.4.0 Sandbox:

Download the AWS SDK for Java from https://aws.amazon.com/sdk-for-java/ and upload it to the Hadoop directory. You should then see aws-java-sdk-1.10.65.jar in /usr/hdp/2.4.0.0-169/hadoop/:

[root@sandbox bin]# ll /usr/hdp/2.4.0.0-169/hadoop/
total 242692
-rw-r--r-- 1 root root  32380018 2016-03-31 22:02 aws-java-sdk-1.10.65.jar
drwxr-xr-x 2 root root      4096 2016-02-29 18:05 bin
drwxr-xr-x 2 root root     12288 2016-02-29 17:49 client
lrwxrwxrwx 1 root root        25 2016-03-31 21:08 conf -> /etc/hadoop/2.4.0.0-169/0
drwxr-xr-x 2 root root      4096 2016-02-29 17:46 etc
-rw-r--r-- 1 root root     17366 2016-02-10 06:44 hadoop-annotations-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        40 2016-02-29 17:46 hadoop-annotations.jar -> hadoop-annotations-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root     71534 2016-02-10 06:44 hadoop-auth-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        33 2016-02-29 17:46 hadoop-auth.jar -> hadoop-auth-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root    103049 2016-02-10 06:44 hadoop-aws-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        32 2016-02-29 17:46 hadoop-aws.jar -> hadoop-aws-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root    138488 2016-02-10 06:44 hadoop-azure-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        34 2016-02-29 17:46 hadoop-azure.jar -> hadoop-azure-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root   3469432 2016-02-10 06:44 hadoop-common-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root   1903274 2016-02-10 06:44 hadoop-common-2.7.1.2.4.0.0-169-tests.jar
lrwxrwxrwx 1 root root        35 2016-02-29 17:46 hadoop-common.jar -> hadoop-common-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        41 2016-02-29 17:46 hadoop-common-tests.jar -> hadoop-common-2.7.1.2.4.0.0-169-tests.jar
-rw-r--r-- 1 root root    159484 2016-02-10 06:44 hadoop-nfs-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        32 2016-02-29 17:46 hadoop-nfs.jar -> hadoop-nfs-2.7.1.2.4.0.0-169.jar
drwxr-xr-x 5 root root      4096 2016-03-31 20:27 lib
drwxr-xr-x 2 root root      4096 2016-02-29 17:46 libexec
drwxr-xr-x 3 root root      4096 2016-02-29 17:46 man
-rw-r--r-- 1 root root 210216729 2016-02-10 06:44 mapreduce.tar.gz
drwxr-xr-x 2 root root      4096 2016-02-29 17:46 sbin

Change directory to spark/bin:

[root@sandbox bin]# cd /usr/hdp/2.4.0.0-169/spark/bin

Start the Spark Scala shell with the right AWS jar dependencies:

 ./spark-shell  --master yarn-client --jars /usr/hdp/2.4.0.0-169/hadoop/hadoop-aws-2.7.1.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hadoop/hadoop-auth.jar,/usr/hdp/2.4.0.0-169/hadoop/aws-java-sdk-1.10.65.jar --driver-memory 512m --executor-memory 512m

Now for some Scala code to configure the AWS access and secret keys in hadoopConf:

val hadoopConf = sc.hadoopConfiguration

// The file is read below via an s3n:// URI, so the fs.s3n-prefixed
// properties are the ones NativeS3FileSystem consults for credentials
hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3n.awsAccessKeyId", "xxxxxxx")
hadoopConf.set("fs.s3n.awsSecretAccessKey", "xxxxxxx")
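A note on URI schemes: each S3 scheme (s3, s3n, s3a) is served by a different Hadoop FileSystem class with its own fs.&lt;scheme&gt;.* properties, so the scheme in the URI and the property prefixes must line up. The sketch below is only an illustrative stand-alone Map of that lookup (the class names are the real Hadoop 2.7 ones; actual resolution happens inside Hadoop's FileSystem, not in user code):

```scala
object S3SchemeLookup {
  // Illustrative mapping of S3 URI schemes to the Hadoop 2.7 FileSystem
  // classes that serve them; this Map just mimics fs.<scheme>.impl resolution.
  val implByScheme: Map[String, String] = Map(
    "s3"  -> "org.apache.hadoop.fs.s3.S3FileSystem",
    "s3n" -> "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "s3a" -> "org.apache.hadoop.fs.s3a.S3AFileSystem"
  )

  // Extract the scheme (text before the first ':') and look it up
  def implFor(uri: String): Option[String] =
    implByScheme.get(uri.takeWhile(_ != ':'))

  def main(args: Array[String]): Unit = {
    println(implFor("s3n://s3hdptest/S3HDPTEST.csv"))
  }
}
```

This is also why mixing prefixes (for example, setting fs.s3.* keys but reading an s3n:// path) leaves the credentials unused for that scheme.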

And now read the file from the S3 bucket:

val myLines = sc.textFile("s3n://s3hdptest/S3HDPTEST.csv")
println(myLines.count())
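Without a cluster at hand, the shape of this computation can be sanity-checked in plain local Scala. The column layout below is a hypothetical stand-in for S3HDPTEST.csv (its real contents are not shown in this article), meant only to illustrate what count() returns and a typical next parsing step:

```scala
object LocalCsvCountDemo {
  def main(args: Array[String]): Unit = {
    // In-memory stand-in for the lines sc.textFile would return.
    // The column layout here is assumed, for illustration only.
    val sampleLines = Seq(
      "id,name,value", // assumed header row
      "1,alpha,10",
      "2,beta,20"
    )

    // myLines.count() simply counts lines, header included
    val total = sampleLines.size

    // A common next step: drop the header and split each row into fields
    val rows = sampleLines.drop(1).map(_.split(","))

    println(s"total lines: $total, data rows: ${rows.size}")
  }
}
```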
Comments
New Contributor

Hi,

I followed exactly the same instructions used in this tutorial, and when I execute the Spark Scala shell:

./spark-shell --master yarn-client --jars /usr/hdp/2.4.0.0-169/hadoop/hadoop-aws-2.7.1.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hadoop/hadoop-auth.jar,/usr/hdp/2.4.0.0-169/hadoop/aws-java-sdk-1.10.65.jar --driver-memory 512m --executor-memory 512m

I get this exception :

ERROR SparkContext: Error initializing SparkContext. java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated

.......

Caused by: java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671) at java.lang.Class.getConstructor0(Class.java:3075) at java.lang.Class.newInstance(Class.java:412) at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)

Note that all jars included in the spark-shell command exist.

Any idea about this error?

Thank you for your help.

Not applicable

Hi, you need one more JAR file, guava-19.0.jar; download it and add it to the --jars path.

Not applicable

Hi, it looks like a simple error: I see s3a in your exception, but I think s3 or s3n should be there.

Last update: 04-01-2016 02:21 PM