I am looking to use S3 instead of HDFS to store data and run computation on my cluster (built on EC2 instances).
1. How can I configure HDP to use an AWS IAM role to interact with S3? I don't want to use AWS access keys.
2. Is S3 Guard available in HDP2.6?
3. Any best practices to follow when using S3 instead of HDFS?
If you would like to use Cloudbreak to provision your HDP stack, here are the answers:
1. http://hortonworks.github.io/cloudbreak-docs/latest/aws/#advanced-options -> Instance Profile option
2. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cloud-data-access/content/s3-guard.html — S3Guard is Technical Preview (TP) in HDP 2.6
3. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cloud-data-access/content/s3-get-started... — documentation on the S3 connector
@rdoktorics I have attached an AWS Instance Profile to all EC2 machines, and I can successfully use the AWS CLI to list/get the S3 bucket, but I cannot run hdfs dfs -ls <S3_BUCKET_PATH>
The question here is how to make Hadoop use the AWS instance profile. Which properties need to be added/set in core-site.xml or hdfs-site.xml?
I added the following property from this documentation: https://hortonworks.github.io/hdp-aws/s3-configure/index.html and am now able to list the S3 bucket with the hdfs command, but I am not sure whether this property alone is sufficient for all S3 interactions.

<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider</value>
</property>
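For reference, here is a minimal core-site.xml sketch combining the credentials provider above with two other common S3A settings. This assumes the HDP 2.6 (Hadoop 2.8) S3A connector; the endpoint value shown is an illustrative placeholder for a region-specific endpoint, and whether you need it depends on your bucket's region.

```xml
<configuration>
  <!-- Use the EC2 instance profile instead of static AWS access keys -->
  <property>
    <name>fs.s3a.aws.credentials.provider</name>
    <value>org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider</value>
  </property>

  <!-- Optional: region-specific S3 endpoint (placeholder value; set to your bucket's region) -->
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.us-west-2.amazonaws.com</value>
  </property>

  <!-- Optional: raise the connection pool for heavier S3A workloads -->
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>100</value>
  </property>
</configuration>
```

After the clients pick up the updated core-site.xml, you can verify access with an s3a:// URI, e.g. hdfs dfs -ls s3a://your-bucket/ (bucket name is a placeholder).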