First, let's create a sample file in S3:

In the AWS Console, go to S3, create a bucket named “s3hdptest” (matching the bucket used in the Scala code below), and pick your region. Upload the file manually using the Upload button (the example file name used later in the Scala code is S3HDPTEST.csv).
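If you prefer the command line to the console, the AWS CLI can do the same (a sketch, assuming the CLI is installed and configured with your credentials; the region is just an example):

# Create the bucket, then upload the sample file
aws s3 mb s3://s3hdptest --region us-east-1
aws s3 cp S3HDPTEST.csv s3://s3hdptest/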

In the HDP 2.4.0 Sandbox:

Download the AWS SDK for Java from https://aws.amazon.com/sdk-for-java/ and upload it to the Hadoop directory on the sandbox.
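One way to copy the jar in from your workstation (a sketch; port 2222 is the sandbox VM's usual forwarded SSH port, an assumption about your networking setup):

# Copy the AWS SDK jar into the sandbox's hadoop directory
# (adjust the host and port to match your sandbox)
scp -P 2222 aws-java-sdk-1.10.65.jar root@127.0.0.1:/usr/hdp/2.4.0.0-169/hadoop/

You should then see aws-java-sdk-1.10.65.jar in /usr/hdp/2.4.0.0-169/hadoop/: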

[root@sandbox bin]# ll /usr/hdp/2.4.0.0-169/hadoop/
total 242692
-rw-r--r-- 1 root root  32380018 2016-03-31 22:02 aws-java-sdk-1.10.65.jar
drwxr-xr-x 2 root root      4096 2016-02-29 18:05 bin
drwxr-xr-x 2 root root     12288 2016-02-29 17:49 client
lrwxrwxrwx 1 root root        25 2016-03-31 21:08 conf -> /etc/hadoop/2.4.0.0-169/0
drwxr-xr-x 2 root root      4096 2016-02-29 17:46 etc
-rw-r--r-- 1 root root     17366 2016-02-10 06:44 hadoop-annotations-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        40 2016-02-29 17:46 hadoop-annotations.jar -> hadoop-annotations-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root     71534 2016-02-10 06:44 hadoop-auth-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        33 2016-02-29 17:46 hadoop-auth.jar -> hadoop-auth-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root    103049 2016-02-10 06:44 hadoop-aws-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        32 2016-02-29 17:46 hadoop-aws.jar -> hadoop-aws-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root    138488 2016-02-10 06:44 hadoop-azure-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        34 2016-02-29 17:46 hadoop-azure.jar -> hadoop-azure-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root   3469432 2016-02-10 06:44 hadoop-common-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root   1903274 2016-02-10 06:44 hadoop-common-2.7.1.2.4.0.0-169-tests.jar
lrwxrwxrwx 1 root root        35 2016-02-29 17:46 hadoop-common.jar -> hadoop-common-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        41 2016-02-29 17:46 hadoop-common-tests.jar -> hadoop-common-2.7.1.2.4.0.0-169-tests.jar
-rw-r--r-- 1 root root    159484 2016-02-10 06:44 hadoop-nfs-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        32 2016-02-29 17:46 hadoop-nfs.jar -> hadoop-nfs-2.7.1.2.4.0.0-169.jar
drwxr-xr-x 5 root root      4096 2016-03-31 20:27 lib
drwxr-xr-x 2 root root      4096 2016-02-29 17:46 libexec
drwxr-xr-x 3 root root      4096 2016-02-29 17:46 man
-rw-r--r-- 1 root root 210216729 2016-02-10 06:44 mapreduce.tar.gz
drwxr-xr-x 2 root root      4096 2016-02-29 17:46 sbin

Change directory to spark/bin:

[root@sandbox bin]# cd /usr/hdp/2.4.0.0-169/spark/bin

Start the Spark Scala shell with the right AWS jar dependencies:

./spark-shell --master yarn-client --jars /usr/hdp/2.4.0.0-169/hadoop/hadoop-aws-2.7.1.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hadoop/hadoop-auth.jar,/usr/hdp/2.4.0.0-169/hadoop/aws-java-sdk-1.10.65.jar --driver-memory 512m --executor-memory 512m

Now some Scala code to configure the AWS access and secret keys in the Hadoop configuration:

// Grab the Hadoop configuration backing the SparkContext
val hadoopConf = sc.hadoopConfiguration

// The read below uses the s3n:// scheme, so the credentials must go under
// fs.s3n.* (fs.s3.* properties would only apply to s3:// URIs)
hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3n.awsAccessKeyId", "xxxxxxx")
hadoopConf.set("fs.s3n.awsSecretAccessKey", "xxxxxxx")

And now read the file from the S3 bucket:

// Load the file as an RDD of lines, then count and print
val myLines = sc.textFile("s3n://s3hdptest/S3HDPTEST.csv")
val count = myLines.count()
println(count)
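A few optional follow-ups once the RDD is loaded (a sketch; the split assumes no quoted commas in the CSV, and the output prefix must not already exist):

// Peek at the first few lines
myLines.take(5).foreach(println)

// Split each line into fields (naive CSV parsing)
val fields = myLines.map(_.split(","))

// Write the data back out to S3 under a new prefix
myLines.saveAsTextFile("s3n://s3hdptest/S3HDPTEST-out")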
Comments
New Contributor

Hi,

I followed exactly the same instructions used in this tutorial, and when I execute the Spark Scala shell:

./spark-shell --master yarn-client --jars /usr/hdp/2.4.0.0-169/hadoop/hadoop-aws-2.7.1.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hadoop/hadoop-auth.jar,/usr/hdp/2.4.0.0-169/hadoop/aws-java-sdk-1.10.65.jar --driver-memory 512m --executor-memory 512m

I get this exception:

ERROR SparkContext: Error initializing SparkContext.
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated

.......

Caused by: java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider
	at java.lang.Class.getDeclaredConstructors0(Native Method)
	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
	at java.lang.Class.getConstructor0(Class.java:3075)
	at java.lang.Class.newInstance(Class.java:412)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)

Note that all the jars included in the spark-shell command do exist.

Any idea about this error?

Thank you for your help.

New Contributor

Hi, one more JAR file is required: guava-19.0.jar. You should download it and add it to the --jars path.
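For illustration, the article's launch command with guava added would look something like this (a sketch; the guava jar's location is an assumption about where you place it):

./spark-shell --master yarn-client --jars /usr/hdp/2.4.0.0-169/hadoop/hadoop-aws-2.7.1.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hadoop/hadoop-auth.jar,/usr/hdp/2.4.0.0-169/hadoop/aws-java-sdk-1.10.65.jar,/usr/hdp/2.4.0.0-169/hadoop/guava-19.0.jar --driver-memory 512m --executor-memory 512m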

New Contributor

Hi, it looks like a simple error: I see s3a in your exception, but I think s3 or s3n should be there.