HDP install in AWS VPC, on custom AMI -- feasibility question.

Hi there, questions for the cloud/HDP experts please...

- I'm looking to provision HDP clusters in an AWS VPC, on custom hardened AMIs. I understand from the link below that Cloudbreak with a custom AMI requires a support subscription. Which level of support exactly?

https://community.hortonworks.com/questions/75417/cloudbreak-161-deploying-clusters-with-custom-amis...

- I have data on S3 encrypted with server-side encryption. S3A should provide seamless access to encrypted data: yes/no?

- I heard about an Amazon EMR customer using S3 (not local storage/HDFS) for an HBase environment. I assume S3A will provide the same functionality with HBase in HDP? No special sauce in EMR, right?

- EC2 local disks have to be LUKS encrypted. That should not matter to Cloudbreak/HDP, right?

thanks!

jhals99

1 ACCEPTED SOLUTION

@jhals99

I'm looking to provision HDP clusters in an AWS VPC, on custom hardened AMIs. I understand from the link below that Cloudbreak with a custom AMI requires a support subscription. Which level of support exactly?

Cloudbreak in HDP is supported within the Enterprise Plus Subscription. Please see Support Subscriptions on hortonworks.com for more information.

I have data on S3 encrypted with server-side encryption. S3A should provide seamless access to encrypted data: yes/no?

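Yes. S3 decrypts SSE-S3 (AES256) objects on read, so S3A only needs valid credentials. When the credentials are set through Spark configuration (for example spark-defaults.conf), the S3A properties carry the spark.hadoop. prefix:

spark.hadoop.fs.s3a.access.key MY_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key MY_SECRET_KEY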
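
In core-site.xml itself, the same two settings appear without the spark.hadoop. prefix. A minimal sketch, reusing the placeholder values from above (these are not real keys):

```xml
<!-- core-site.xml: S3A credentials. Values are placeholders. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>MY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>MY_SECRET_KEY</value>
</property>
```

Spark copies any spark.hadoop.* property into the Hadoop configuration with the prefix removed, which is why both forms end up configuring the same S3A client.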

I heard about an Amazon EMR customer using S3 (not local storage/HDFS) for an HBase environment. I assume S3A will provide the same functionality with HBase in HDP? No special sauce in EMR, right?

S3 is not recommended for HBase today. Amazon EMR supports HBase with HFiles on S3 and the WAL on ephemeral HDFS, a configuration that can lose data when failures occur. Our HBase team is aware of this and is exploring a couple of other options, such as HFiles on S3 with the WAL on EBS/EFS, but this is still under investigation.

EC2 local disks have to be LUKS encrypted. That should not matter to Cloudbreak/HDP, right?

Correct. Hadoop reads disks through the operating system, and LUKS encrypts at the block-device layer beneath the filesystem, so the encryption is transparent to Hadoop and is not a concern for Cloudbreak/HDP.

We require S3A role-based authentication for access to S3 files from Hive, Pig, MR, Spark, etc. Does this work out of the box with the latest release of HDP?

Yes. Please refer to the HCC Article: How to Access Data Files stored in AWS S3 Buckets using HDFS / Hive / Pig for more information.
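
As a rough sketch of the role-based setup, assuming the cluster nodes run with an IAM instance profile that grants access to the bucket: leave fs.s3a.access.key and fs.s3a.secret.key unset, and S3A falls back to the instance's role credentials. Where the Hadoop build supports fs.s3a.aws.credentials.provider (upstream Hadoop 2.8 and later; whether a given HDP release includes it is an assumption to verify), the provider can also be pinned explicitly:

```xml
<!-- core-site.xml: use the EC2 instance profile (IAM role) for S3A.
     Assumes no fs.s3a.access.key / fs.s3a.secret.key are configured and that
     this Hadoop build supports fs.s3a.aws.credentials.provider. -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
```

Keeping long-lived keys out of the configuration this way means Hive, Pig, MR, and Spark jobs all pick up the same temporary role credentials from the node they run on.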

11 REPLIES

S3A on HDP 2.5 supports server-side encryption, configured with the following property:

<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
  <description>Specify a server-side encryption algorithm for s3a: file system.
    Unset by default, and the only other currently allowable value is AES256.
  </description>
</property>

One more thing: S3A doesn't handle R/W buckets that have different ACLs attached to different parts of them, because the code expects write access throughout any R/W bucket. Restrict access per bucket rather than with ACLs inside a bucket.