
How to configure HDP2.6 to use S3?


Expert Contributor

Hi,

I would like to use S3 instead of HDFS to store data and run computations on my cluster (built on EC2 instances).

Questions:

1. How can I configure HDP to use an AWS IAM role to interact with S3? I don't want to use AWS keys to interact with S3.

2. Is S3Guard available in HDP 2.6?

3. Any best practices to follow when using S3 instead of HDFS?

Thanks,

Pradeep

7 REPLIES

Re: How to configure HDP2.6 to use S3?

Expert Contributor
@Pradeep Bhadani

If you would like to use Cloudbreak to provision your HDP stack then here are the answers:

1. http://hortonworks.github.io/cloudbreak-docs/latest/aws/#advanced-options -> Instance Profile option

2. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cloud-data-access/content/s3-guard.html (S3Guard is in Technical Preview (TP) in HDP 2.6)

3. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cloud-data-access/content/s3-get-started... (documentation for the S3 connector)

Br,

R

Re: How to configure HDP2.6 to use S3?

Expert Contributor

@rdoktorics Thanks for the link.

But I am looking to install HDP manually (not via Cloudbreak or HDC).

Re: How to configure HDP2.6 to use S3?

Expert Contributor

@Pradeep Bhadani

You have to attach an AWS instance profile to every machine that you would like to use with S3. After that, you can reach S3 through s3a:// URLs.

Br,

R

Re: How to configure HDP2.6 to use S3?

Expert Contributor

@rdoktorics I have attached an AWS instance profile to all EC2 machines, and I can successfully use the AWS CLI to list/get the S3 bucket, but I cannot run the command hdfs dfs -ls <S3_BUCKET_PATH>.

The question here is how to make Hadoop use the AWS instance profile. Which properties need to be added or set in core-site.xml or hdfs-site.xml?

Re: How to configure HDP2.6 to use S3?

Expert Contributor

I see a way to do this in Hadoop 2.8 (https://hadoop.apache.org/docs/r2.8.0/hadoop-aws/tools/hadoop-aws/index.html#S3A), but HDP ships Hadoop 2.7.x.

Re: How to configure HDP2.6 to use S3?

Expert Contributor

I added the following property and am now able to list the S3 bucket with the hdfs command:

fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider

But I am not sure whether this property alone is sufficient when interacting with S3.
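
For reference, here is a sketch of how that setting might look as an XML entry. It assumes the property is placed in core-site.xml, which the thread does not confirm, and it only restates the value given above. It does not show whether any further properties are required.

```xml
<!-- Sketch only: assumes core-site.xml is the file being edited. -->
<!-- Tells the S3A connector to take credentials from the EC2 instance profile. -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
```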

Re: How to configure HDP2.6 to use S3?

Expert Contributor

The documentation at https://hortonworks.github.io/hdp-aws/s3-configure/index.html gives this property:

<property>
	<name>fs.s3a.aws.credentials.provider</name>
	<value>org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider</value>
</property>