Created 09-29-2017 10:39 AM
Hi,
I am looking to use S3 instead of HDFS to store data and run computation on my cluster (built on EC2 instances).
Questions:
1. How can I configure HDP to use an AWS IAM role to interact with S3? I don't want to use AWS keys.
2. Is S3Guard available in HDP 2.6?
3. Any best practices to follow when using S3 instead of HDFS?
Thanks,
Pradeep
Created 09-29-2017 11:37 AM
If you would like to use Cloudbreak to provision your HDP stack then here are the answers:
1. http://hortonworks.github.io/cloudbreak-docs/latest/aws/#advanced-options -> Instance Profile option
2. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cloud-data-access/content/s3-guard.html S3Guard is in Technical Preview (TP) in HDP 2.6.
3. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cloud-data-access/content/s3-get-started... is the documentation for the S3 connector.
Br,
R
Created 09-29-2017 12:30 PM
@rdoktorics Thanks for the link.
But I am looking to install HDP manually (not via Cloudbreak or HDC).
Created 09-29-2017 12:33 PM
You have to attach an AWS instance profile to every machine that needs to access S3. After that, you can reach S3 through s3a:// URLs.
Br,
R
Created 09-29-2017 01:15 PM
@rdoktorics I have attached an AWS instance profile to all EC2 machines, and I can successfully use the AWS CLI to list and get objects in the S3 bucket, but I cannot run the command hdfs dfs -ls <S3_BUCKET_PATH>.
The question is how to make Hadoop use the AWS instance profile. Which properties need to be added or set in core-site.xml or hdfs-site.xml?
Created 09-29-2017 12:34 PM
I see a way to do this in Hadoop 2.8 (https://hadoop.apache.org/docs/r2.8.0/hadoop-aws/tools/hadoop-aws/index.html#S3A), but HDP ships Hadoop 2.7.x.
Created 09-29-2017 01:24 PM
I added the following property and am now able to list the S3 bucket with the hdfs command:
fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider
But I am not sure whether this property alone is sufficient when interacting with S3.
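For reference, the same setting written as an XML entry might look like the sketch below. It assumes the property goes into core-site.xml (the thread does not confirm which file was edited) and that the AWS SDK class named above is on the Hadoop classpath.

```xml
<!-- Sketch only: assumes this entry is placed in core-site.xml. -->
<!-- Tells the S3A connector to take credentials from the EC2 instance profile -->
<!-- instead of access and secret keys. -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
```

With an entry like this in place, no fs.s3a.access.key or fs.s3a.secret.key values should be needed, since the credentials come from the instance profile attached to the machine.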
Created 09-29-2017 01:44 PM
This documentation shows the following setting: https://hortonworks.github.io/hdp-aws/s3-configure/index.html
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider</value>
</property>