Archives of Support Questions (Read Only)

This is an archived board kept for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

HDP install in AWS VPC, on custom AMI -- feasibility question.


Hi there, questions for the cloud/HDP experts please...

- I'm looking to provision HDP clusters in an AWS VPC, on custom hardened AMIs. I understand from the link below that Cloudbreak with a custom AMI requires a support subscription. Which level of support exactly?

https://community.hortonworks.com/questions/75417/cloudbreak-161-deploying-clusters-with-custom-amis...

- I have data on S3 encrypted with server-side encryption. S3A should provide seamless access to the encrypted data: yes/no?

- I heard about an Amazon EMR customer using S3 (not local storage/HDFS) for an HBase environment. I assume S3A will provide the same functionality with HBase in HDP? No special sauce in EMR, right?

- EC2 local disks have to be LUKS-encrypted. That should not matter to Cloudbreak/HDP, right?

thanks!

jhals99

1 ACCEPTED SOLUTION

@jhals99

I'm looking to provision HDP clusters in an AWS VPC, on Custom Hardened AMI's. Understand from the link below that Cloudbreak w/custom AMI requires support subscription. Which level of support exactly?

Cloudbreak in HDP is supported within the Enterprise Plus Subscription. Please see Support Subscriptions on hortonworks.com for more information.

I have data on S3 encrypted with server-side encryption. S3A should provide seamless access to encrypted data: yes/no?

Yes. S3A credentials are configured in core-site.xml, for example:

<property>
  <name>fs.s3a.access.key</name>
  <value>MY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>MY_SECRET_KEY</value>
</property>

(The spark.hadoop. prefix on these property names is only needed when you set them through Spark configuration rather than core-site.xml.)

I heard about an Amazon EMR customer using S3 (not local storage/HDFS) for an HBase environment. I assume S3A will provide the same functionality with HBase in HDP? No special sauce in EMR, right?

S3 is not recommended for HBase today. Amazon EMR supports HBase with a combination of HFiles on S3 and the WAL on ephemeral HDFS – this configuration can lose data in the face of failures. Our HBase team is aware of this, and there are a couple of other options we are exploring, such as HFiles on S3 with the WAL on EBS/EFS – this is still under investigation.

EC2 local disks have to be LUKS-encrypted. That should not matter to Cloudbreak/HDP, right?

Correct. Hadoop reads disks through the operating system; disk-level encryption sits below the operating system and is therefore not a concern for Cloudbreak/HDP.

We require S3A role-based authentication, with access to S3 files from Hive, Pig, MR, Spark, etc. My question is: does this work out of the box with the latest release of HDP?

Yes. Please refer to the HCC Article: How to Access Data Files stored in AWS S3 Buckets using HDFS / Hive / Pig for more information.
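As a sketch of what role-based authentication looks like on the S3A side: instead of static keys, the credentials provider can typically be pointed at the EC2 instance profile (IAM role) in core-site.xml. The exact property and provider class availability depend on your Hadoop version, so treat this as an illustration rather than a guaranteed configuration:

<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
  <description>Use the IAM role attached to the EC2 instance
    instead of fs.s3a.access.key / fs.s3a.secret.key.</description>
</property>

With this in place, jobs running on the cluster nodes pick up temporary credentials from the instance metadata service, so no keys need to be stored in configuration files.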


11 REPLIES


S3A on HDP 2.5 supports server side encryption.

<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
  <description>Specify a server-side encryption algorithm for s3a: file system.
    Unset by default, and the only other currently allowable value is AES256.
  </description>
</property>


One more thing: S3A doesn't handle R/W buckets with different ACLs attached to different parts of them, as the code expects write access to the whole of any R/W bucket. You should restrict access by bucket, not by trying to use some kind of ACL within a bucket.
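To illustrate the per-bucket approach, access can be scoped with an IAM policy that grants rights on a single bucket only. The bucket name below is a placeholder, not one from this thread:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::example-hdp-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::example-hdp-bucket/*"
    }
  ]
}
```

Attaching a policy like this to the cluster's IAM role gives S3A full read/write access to one bucket while leaving other buckets untouched, which matches the "restrict by bucket" advice above.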