Step 1 : Log in to AWS with your credentials

Step 2 : From the AWS console, go to the following menu path and create a user for the demo

Security & Identity --> Identity and Access Management --> Users --> Create New Users


Step 3 : Make note of the credentials

awsAccessKeyId = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx';

awsSecretAccessKey = 'yyyyyyyyyyyyyyyyyyyyyyyyyyy';

Step 4 : Add the user to the Admin group: click the “User Actions” button, select the option Add Users to Group, and select your user (admin)

Step 5 : Assign the Administrator Access policy (AdministratorAccess) to the user (admin)
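
For readers who prefer to script Steps 2 to 5, roughly the same setup can be sketched with the AWS CLI. This is not the console procedure described above: it assumes the AWS CLI is installed and configured with an identity that may manage IAM, and it attaches the managed policy directly to the user instead of going through a group. The function name create_demo_user is made up for illustration.

```shell
# Sketch: create the demo IAM user from the command line instead of the console.
# Assumes the AWS CLI is installed and already configured with IAM permissions.
create_demo_user() (
  set -e
  user="${1:-admin}"   # user name used throughout this article

  # Create the IAM user.
  aws iam create-user --user-name "$user"

  # Attach the AWS managed administrator policy directly to the user.
  aws iam attach-user-policy --user-name "$user" \
    --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

  # Generate the access key pair; the secret is shown only once, so note it down.
  aws iam create-access-key --user-name "$user"
)
```

Calling create_demo_user admin prints the new access key ID and secret key in the JSON returned by the last command.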

Step 6 : In the AWS console, go to S3, create a bucket named “s3hdptest”, and pick your region


Step 7 : Upload the file manually by using the upload button. In our example we are uploading the file S3HDPTEST.csv
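
Steps 6 and 7 can also be done from a terminal. The sketch below assumes the AWS CLI is configured with the credentials from Step 3; the function name upload_demo_file is hypothetical, and the region is an argument you must supply yourself.

```shell
# Sketch: create the bucket and upload the sample file with the AWS CLI.
# Arguments: bucket name, local file, AWS region (all supplied by the caller).
upload_demo_file() (
  set -e
  bucket="$1"
  file="$2"
  region="$3"

  aws s3 mb "s3://$bucket" --region "$region"   # create the bucket
  aws s3 cp "$file" "s3://$bucket/"             # upload the file
  aws s3 ls "s3://$bucket/"                     # list the bucket to confirm the upload
)
```

For the example in this article the call would be upload_demo_file s3hdptest S3HDPTEST.csv followed by your region.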


Step 8 : In the Hadoop environment, create a user with the same name as the user created in AWS (admin)
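
One possible way to do this on a cluster node is sketched below. It assumes you are root, that "hdfs" is the HDFS superuser account, and that the user needs a home directory under /user (the later copy in Step 14 writes to /user/admin). The function name create_hadoop_user is made up for illustration.

```shell
# Sketch: create the local OS user and its HDFS home directory.
# Assumes root privileges and that "hdfs" is the HDFS superuser.
create_hadoop_user() (
  set -e
  user="${1:-admin}"

  useradd "$user"                                            # local Linux account
  sudo -u hdfs hdfs dfs -mkdir -p "/user/$user"              # HDFS home directory
  sudo -u hdfs hdfs dfs -chown "$user:$user" "/user/$user"   # hand it over to the user
)
```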

Step 9 : In Ambari, add the properties below to both hdfs-site.xml and hive-site.xml

  <description>AWS access key ID. Omit for Role-based authentication.</description>
  <description>AWS secret key. Omit for Role-based authentication.</description>
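
The two descriptions above match the S3A credential properties in Hadoop's core-default.xml, so the complete entries most likely look like the fragment below. The property names fs.s3a.access.key and fs.s3a.secret.key are an inference from those descriptions, not something this article states, and the values are the placeholders from Step 3.

```xml
<!-- Assumed property names, inferred from the descriptions; values are placeholders. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>xxxxxxxxxxxxxxxxxxxxxxxxxxxxx</value>
  <description>AWS access key ID. Omit for Role-based authentication.</description>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>yyyyyyyyyyyyyyyyyyyyyyyyyyy</value>
  <description>AWS secret key. Omit for Role-based authentication.</description>
</property>
```

In Ambari these would typically be entered as custom key/value properties for each configuration file rather than pasted as raw XML.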

Step 10 : Restart the affected Hadoop services, such as HDFS and Hive, along with any dependent services

Step 11 : Ensure that NTP is configured properly so that the cluster clock matches AWS time (S3 rejects requests whose timestamp is too far off). Follow the steps in the link below
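
A quick way to see whether the clock is close enough is to compare it with the Date header that S3 returns. The sketch below assumes curl and GNU date are available on the node and uses the 15 minute tolerance that S3 applies to signed requests; the function names and the MAX_SKEW variable are made up for illustration.

```shell
# Sketch: compare the local clock with the time reported by S3.
MAX_SKEW=900   # seconds; S3 rejects requests more than 15 minutes off

# Succeed if two epoch timestamps differ by at most MAX_SKEW seconds.
skew_ok() {
  delta=$(( $1 - $2 ))
  if [ "$delta" -lt 0 ]; then
    delta=$(( 0 - delta ))
  fi
  [ "$delta" -le "$MAX_SKEW" ]
}

# Read the Date header from the S3 endpoint and compare it with the local time.
check_s3_clock() {
  remote=$(curl -sI https://s3.amazonaws.com | tr -d '\r' | sed -n 's/^[Dd]ate: //p')
  skew_ok "$(date -u +%s)" "$(date -u -d "$remote" +%s)"
}
```

Running check_s3_clock && echo "clock OK" on the node gives a simple pass/fail answer; a failure suggests fixing NTP before continuing.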

Step 12 : Run the command below as the admin user to check that the file in S3 is visible from Hadoop

[root@sandbox ~]# su admin
bash-4.1$ hdfs dfs -ls s3a://s3hdptest/S3HDPTEST.csv
-rw-rw-rw- 1 188 2016-03-29 22:12 s3a://s3hdptest/S3HDPTEST.csv
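
If the listing fails, it can help to rule out the cluster configuration by passing the keys for a single command with Hadoop's generic -D option. This is only a sketch: the property names fs.s3a.access.key and fs.s3a.secret.key are inferred rather than stated in this article, and the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and the function name s3a_ls are chosen for illustration.

```shell
# Sketch: list an S3A path with credentials supplied for this one command only.
# The keys are visible in the process list while the command runs, so use this for testing only.
s3a_ls() {
  hdfs dfs \
    -D fs.s3a.access.key="$AWS_ACCESS_KEY_ID" \
    -D fs.s3a.secret.key="$AWS_SECRET_ACCESS_KEY" \
    -ls "$1"
}
```

With both variables exported, s3a_ls s3a://s3hdptest/S3HDPTEST.csv should produce the same listing as the command above.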

Step 13 : To verify the data, use the command below

bash-4.1$ hdfs dfs -cat s3a://s3hdptest/S3HDPTEST.csv

Step 14 : Copy a file from S3 to HDFS

bash-4.1$ hadoop fs -cp s3a://s3hdptest/S3HDPTEST.csv /user/admin/S3HDPTEST.csv

Step 15 : Copy a file from HDFS to S3

bash-4.1$ hadoop fs -cp /user/admin/S3HDPTEST.csv s3a://s3hdptest/S3HDPTEST_1.csv

Step 15a : Verify whether the file has been stored in the AWS S3 Bucket
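
Besides looking in the AWS console, the copy can be checked from the Hadoop side by comparing digests of the two files. A minimal sketch, assuming md5sum is installed on the node; the function names path_md5 and same_content are hypothetical.

```shell
# Sketch: confirm that the copy in S3 matches the file in HDFS by comparing MD5 digests.

# Print the MD5 digest of the content behind a Hadoop path.
path_md5() {
  hdfs dfs -cat "$1" | md5sum | cut -d' ' -f1
}

# Succeed only if both paths exist and have identical content.
same_content() {
  # Fail early if either path is missing, so two read errors never compare as equal.
  hdfs dfs -test -e "$1" || return 1
  hdfs dfs -test -e "$2" || return 1
  [ "$(path_md5 "$1")" = "$(path_md5 "$2")" ]
}
```

For the files used here: same_content /user/admin/S3HDPTEST.csv s3a://s3hdptest/S3HDPTEST_1.csv && echo "copy verified".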


Step 16 : To access the data using Hive from S3:

Connect to Hive from Ambari using the Hive Views or Hive CLI

A) Create a table for the datafile in S3

hive> CREATE EXTERNAL TABLE mydata
(FirstName STRING, LastName STRING, StreetAddress STRING, City STRING, State STRING, ZipCode INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3a://s3hdptest/';

B) Select the file data from Hive

hive> SELECT * FROM mydata;

Step 17 : To access the data using Pig from S3:

[root@sandbox ~]# pig -x tez

grunt> a = load 's3a://s3hdptest/S3HDPTEST.csv' using PigStorage();
grunt> dump a;

Step 18 : To store the data using Pig to S3:

grunt> store a into 's3a://s3hdptest/OUTPUT' using PigStorage();

Check the created data file in the AWS S3 bucket
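
Note that PigStorage() without an argument splits on tabs, so each CSV line in the example above is loaded as a single field. If the individual columns are needed, a comma delimiter can be passed instead. The sketch below writes a small Pig script and runs it non-interactively; the script path, the function names, and the output folder OUTPUT_CSV are made up for illustration, and the column names are borrowed from the Hive table in Step 16.

```shell
# Sketch: load the CSV from S3 with an explicit comma delimiter and store it back.

# Write the Pig script to the path given as the first argument.
write_pig_script() {
  cat > "$1" <<'EOF'
-- Column names borrowed from the Hive table definition in Step 16.
a = LOAD 's3a://s3hdptest/S3HDPTEST.csv' USING PigStorage(',')
    AS (FirstName:chararray, LastName:chararray, StreetAddress:chararray,
        City:chararray, State:chararray, ZipCode:int);
STORE a INTO 's3a://s3hdptest/OUTPUT_CSV' USING PigStorage(',');
EOF
}

# Generate the script and run it on Tez, as in Step 17.
run_pig_script() {
  write_pig_script "$1" && pig -x tez -f "$1"
}
```

For example, run_pig_script /tmp/s3_csv.pig. The STORE fails if the output folder already exists, so pick a new folder name for each run.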


Note: For the article on accessing an AWS S3 bucket using Spark, please refer to the link below:


