Created on 04-01-2016 02:21 PM
First, let's create a sample file in S3:
In the AWS Console, go to S3, create a bucket named "s3hdptest" (bucket names must be lowercase), and pick your region. Upload the file manually with the Upload button (the file name used later in the Scala code is S3HDPTEST.csv).
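If you prefer to script this step instead of using the console, here is a minimal sketch using the AWS SDK for Java (the same aws-java-sdk-1.10.65.jar installed in the next step). The credentials are placeholders, and the local path /tmp/S3HDPTEST.csv is an assumption:

import java.io.File
import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.AmazonS3Client

// Placeholder credentials -- substitute your own access key pair
val credentials = new BasicAWSCredentials("xxxxxxx", "xxxxxxx")
val s3 = new AmazonS3Client(credentials)

// Bucket names must be globally unique and lowercase
s3.createBucket("s3hdptest")

// Assumed local path for the sample file
s3.putObject("s3hdptest", "S3HDPTEST.csv", new File("/tmp/S3HDPTEST.csv"))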
In the HDP 2.4.0 Sandbox:
Download the AWS SDK for Java from https://aws.amazon.com/sdk-for-java/ and upload it to the Hadoop directory. You should see aws-java-sdk-1.10.65.jar in /usr/hdp/2.4.0.0-169/hadoop/:
[root@sandbox bin]# ll /usr/hdp/2.4.0.0-169/hadoop/
total 242692
-rw-r--r-- 1 root root  32380018 2016-03-31 22:02 aws-java-sdk-1.10.65.jar
drwxr-xr-x 2 root root      4096 2016-02-29 18:05 bin
drwxr-xr-x 2 root root     12288 2016-02-29 17:49 client
lrwxrwxrwx 1 root root        25 2016-03-31 21:08 conf -> /etc/hadoop/2.4.0.0-169/0
drwxr-xr-x 2 root root      4096 2016-02-29 17:46 etc
-rw-r--r-- 1 root root     17366 2016-02-10 06:44 hadoop-annotations-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        40 2016-02-29 17:46 hadoop-annotations.jar -> hadoop-annotations-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root     71534 2016-02-10 06:44 hadoop-auth-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        33 2016-02-29 17:46 hadoop-auth.jar -> hadoop-auth-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root    103049 2016-02-10 06:44 hadoop-aws-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        32 2016-02-29 17:46 hadoop-aws.jar -> hadoop-aws-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root    138488 2016-02-10 06:44 hadoop-azure-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        34 2016-02-29 17:46 hadoop-azure.jar -> hadoop-azure-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root   3469432 2016-02-10 06:44 hadoop-common-2.7.1.2.4.0.0-169.jar
-rw-r--r-- 1 root root   1903274 2016-02-10 06:44 hadoop-common-2.7.1.2.4.0.0-169-tests.jar
lrwxrwxrwx 1 root root        35 2016-02-29 17:46 hadoop-common.jar -> hadoop-common-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        41 2016-02-29 17:46 hadoop-common-tests.jar -> hadoop-common-2.7.1.2.4.0.0-169-tests.jar
-rw-r--r-- 1 root root    159484 2016-02-10 06:44 hadoop-nfs-2.7.1.2.4.0.0-169.jar
lrwxrwxrwx 1 root root        32 2016-02-29 17:46 hadoop-nfs.jar -> hadoop-nfs-2.7.1.2.4.0.0-169.jar
drwxr-xr-x 5 root root      4096 2016-03-31 20:27 lib
drwxr-xr-x 2 root root      4096 2016-02-29 17:46 libexec
drwxr-xr-x 3 root root      4096 2016-02-29 17:46 man
-rw-r--r-- 1 root root 210216729 2016-02-10 06:44 mapreduce.tar.gz
drwxr-xr-x 2 root root      4096 2016-02-29 17:46 sbin
Change directory to spark/bin:
[root@sandbox bin]# cd /usr/hdp/2.4.0.0-169/spark/bin
Start the Spark Scala shell with the right AWS jar dependencies:
./spark-shell --master yarn-client --jars /usr/hdp/2.4.0.0-169/hadoop/hadoop-aws-2.7.1.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hadoop/hadoop-auth.jar,/usr/hdp/2.4.0.0-169/hadoop/aws-java-sdk-1.10.65.jar --driver-memory 512m --executor-memory 512m
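Once the shell is up, a quick way to confirm that the AWS SDK classes actually made it onto the classpath is to resolve one of them by name (a minimal sanity check; com.amazonaws.auth.AWSCredentialsProvider is a core interface in the SDK):

// Throws ClassNotFoundException if the aws-java-sdk jar is missing from the classpath
Class.forName("com.amazonaws.auth.AWSCredentialsProvider")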
Now for some Scala code to configure the AWS credentials in hadoopConf:
val hadoopConf = sc.hadoopConfiguration
// The property names must match the s3n:// URI scheme used below
hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3n.awsAccessKeyId", "xxxxxxx")
hadoopConf.set("fs.s3n.awsSecretAccessKey", "xxxxxxx")
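To keep the keys out of the shell history, an alternative sketch reads them from the standard AWS environment variables instead (assuming AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY were exported before launching spark-shell):

// Read the credentials from the environment rather than hard-coding them
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))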
And now read the file from the S3 bucket:
val myLines = sc.textFile("s3n://s3hdptest/S3HDPTEST.csv")
val count = myLines.count()
println(count)
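The RDD behaves like any other; for example, a quick look at the first few parsed rows (a sketch that assumes a plain comma-delimited file with no quoted fields):

// Split each line on commas and print the first five rows
val rows = myLines.map(_.split(","))
rows.take(5).foreach(row => println(row.mkString(" | ")))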
Created on 04-27-2016 05:16 PM
Hi,
I followed exactly the instructions in this tutorial, and when I execute the Spark Scala shell:
./spark-shell --master yarn-client --jars /usr/hdp/2.4.0.0-169/hadoop/hadoop-aws-2.7.1.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hadoop/hadoop-auth.jar,/usr/hdp/2.4.0.0-169/hadoop/aws-java-sdk-1.10.65.jar --driver-memory 512m --executor-memory 512m
I get this exception:
ERROR SparkContext: Error initializing SparkContext.
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
.......
Caused by: java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider
        at java.lang.Class.getDeclaredConstructors0(Native Method)
        at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
        at java.lang.Class.getConstructor0(Class.java:3075)
        at java.lang.Class.newInstance(Class.java:412)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
Note that all jars included in the spark-shell command exist.
Any idea about this error?
Thank you for your help.
Created on 09-06-2016 12:35 PM
Hi, this requires one more JAR file, guava-19.0.jar - you should download it and add it to the --jars path.
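With guava added, the launch command would look like this (a sketch; the guava path assumes you placed the jar in the same hadoop directory as the others):

./spark-shell --master yarn-client --jars /usr/hdp/2.4.0.0-169/hadoop/hadoop-aws-2.7.1.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hadoop/hadoop-auth.jar,/usr/hdp/2.4.0.0-169/hadoop/aws-java-sdk-1.10.65.jar,/usr/hdp/2.4.0.0-169/hadoop/guava-19.0.jar --driver-memory 512m --executor-memory 512m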
Created on 09-06-2016 12:36 PM
Hi, it looks like a simple error: I see s3a in your exception, but I think s3 or s3n should be there.
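Concretely, the URI scheme and the configuration keys have to match; with an s3n:// URI the credentials are read from the fs.s3n.* properties (same placeholder keys as the article):

// The "s3n" in the property names must match the "s3n://" scheme in the URI
hadoopConf.set("fs.s3n.awsAccessKeyId", "xxxxxxx")
hadoopConf.set("fs.s3n.awsSecretAccessKey", "xxxxxxx")
val myLines = sc.textFile("s3n://s3hdptest/S3HDPTEST.csv")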