How to configure HDP2.6 to use S3?

Expert Contributor

Hi,

I am looking to use S3 instead of HDFS to store data and run computation on my cluster (built on EC2 instances).

Questions:

1. How can I configure HDP to use an AWS IAM role to interact with S3? I don't want to use AWS access keys to interact with S3.

2. Is S3Guard available in HDP 2.6?

3. Any best practices to follow when using S3 instead of HDFS?

Thanks,

Pradeep

7 replies

Expert Contributor
@Pradeep Bhadani

If you would like to use Cloudbreak to provision your HDP stack, then here are the answers:

1. http://hortonworks.github.io/cloudbreak-docs/latest/aws/#advanced-options (see the Instance Profile option)

2. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cloud-data-access/content/s3-guard.html (S3Guard is available as a Technical Preview (TP) in HDP 2.6)

3. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cloud-data-access/content/s3-get-started... (documentation on the S3 connector)
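For item 2, a minimal core-site.xml sketch of turning on S3Guard with a DynamoDB metadata store. The property names and class name are assumed from the Apache Hadoop S3Guard documentation; that they apply unchanged to the Technical Preview in HDP 2.6 is an assumption, so check them against the linked HDP page. It also assumes the IAM role attached to the instances is allowed to create and use DynamoDB tables.

```xml
<configuration>
	<!-- Assumed property name (Apache Hadoop S3Guard docs): use DynamoDB as the S3Guard metadata store -->
	<property>
		<name>fs.s3a.metadatastore.impl</name>
		<value>org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore</value>
	</property>
	<!-- Assumed property name: let S3A create the DynamoDB table on first use
	     (requires DynamoDB permissions in the instance's IAM role) -->
	<property>
		<name>fs.s3a.s3guard.ddb.table.create</name>
		<value>true</value>
	</property>
</configuration>
```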

Br,

R

Expert Contributor

@rdoktorics Thanks for the link.

But I am looking to install HDP manually (not via Cloudbreak or HDC).

Expert Contributor

@Pradeep Bhadani

You have to attach an AWS instance profile to every machine that you would like to use with S3. After that, you can access S3 using s3a:// paths.

Br,

R

Expert Contributor

@rdoktorics I have attached an AWS instance profile to all EC2 machines, and I can successfully use the AWS CLI to list and get objects from the S3 bucket, but I cannot run the command hdfs dfs -ls <S3_BUCKET_PATH>.

The question here is how to make Hadoop use the AWS instance profile. Which properties need to be added or set in core-site.xml or hdfs-site.xml?

Expert Contributor

I see a way to do this in Hadoop 2.8 (https://hadoop.apache.org/docs/r2.8.0/hadoop-aws/tools/hadoop-aws/index.html#S3A), but HDP ships Hadoop 2.7.x.

Expert Contributor

I added the following property and am now able to list the S3 bucket with the hdfs command:

fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider

But I am not sure whether this property alone is sufficient for all interactions with S3.
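The key=value setting above can be written as a core-site.xml entry as follows. This is a sketch: placing it in core-site.xml is an assumption based on the earlier question about core-site.xml or hdfs-site.xml, and the provider class name is copied unchanged from the post.

```xml
<configuration>
	<!-- Tell S3A to obtain credentials from the EC2 instance profile (IAM role)
	     instead of access/secret keys; class name taken from the post above -->
	<property>
		<name>fs.s3a.aws.credentials.provider</name>
		<value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
	</property>
</configuration>
```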

Expert Contributor

The following setting is given in this documentation: https://hortonworks.github.io/hdp-aws/s3-configure/index.html

<property>
	<name>fs.s3a.aws.credentials.provider</name>
	<value>org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider</value>
</property>