
HDP install in AWS VPC, on custom AMI -- feasibility question.

New Contributor

Hi there, questions for the cloud/HDP experts please...

- I'm looking to provision HDP clusters in an AWS VPC, on custom hardened AMIs. I understand from the link below

that Cloudbreak with a custom AMI requires a support subscription. Which level of support exactly?

https://community.hortonworks.com/questions/75417/cloudbreak-161-deploying-clusters-with-custom-amis...

- I have data on S3 encrypted with server-side encryption. S3A should provide seamless access to encrypted data: yes/no?

- I've heard about an Amazon EMR customer using S3 (not local storage/HDFS) for an HBase environment. I assume S3A will provide the same functionality with HBase in HDP? No special sauce in EMR, right?

- EC2 local disks have to be LUKS encrypted. That should not matter to Cloudbreak/HDP, right?

thanks!

jhals99

1 ACCEPTED SOLUTION

Accepted Solutions

@jhals99

I'm looking to provision HDP clusters in an AWS VPC, on custom hardened AMIs. I understand from the link below that Cloudbreak with a custom AMI requires a support subscription. Which level of support exactly?

Cloudbreak in HDP is supported within the Enterprise Plus Subscription. Please see Support Subscriptions on hortonworks.com for more information.

I have data on S3 encrypted with server-side encryption. S3A should provide seamless access to encrypted data: yes/no?

Yes. In core-site.xml, you can set the following S3A properties (note: the spark.hadoop. prefix is only needed when passing these through Spark's configuration, not in core-site.xml itself):

fs.s3a.access.key MY_ACCESS_KEY
fs.s3a.secret.key MY_SECRET_KEY
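For the server-side encryption part of the question specifically, S3A also exposes an encryption-algorithm property. A minimal core-site.xml sketch, assuming SSE-S3 (AES256) and Hadoop 2.8+ property names; the key values are placeholders:

```xml
<!-- core-site.xml: S3A credentials plus server-side encryption (SSE-S3) -->
<property>
  <name>fs.s3a.access.key</name>
  <value>MY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>MY_SECRET_KEY</value>
</property>
<property>
  <!-- Ask S3 to apply AES256 server-side encryption on writes.
       Reads of SSE-S3-encrypted objects are transparent: S3 decrypts
       them server-side, so no extra read-side setting is needed. -->
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>
```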

I heard about an Amazon EMR customer using S3 (not local storage/HDFS) for an HBase environment. I assume S3A will provide the same functionality with HBase in HDP? No special sauce in EMR, right?

S3 is not recommended for HBase today. Amazon EMR supports HBase with a combination of HFiles on S3 and the WAL on ephemeral HDFS – this configuration can lose data in the face of failures. Our HBase team is aware of this, and there are a couple of other options we are exploring, such as HFiles on S3 and the WAL on EBS / EFS – this is still under investigation.
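For reference, the EMR-style split (HFiles on S3, WAL elsewhere) can be sketched as an hbase-site.xml fragment. Hedged: hbase.rootdir is a standard property, but hbase.wal.dir only exists in HBase releases that include HBASE-17437, and the bucket and NameNode names here are hypothetical:

```xml
<!-- hbase-site.xml: store HFiles on S3 via S3A, keep the WAL on HDFS -->
<property>
  <!-- hypothetical bucket name -->
  <name>hbase.rootdir</name>
  <value>s3a://my-hbase-bucket/hbase</value>
</property>
<property>
  <!-- requires an HBase version with HBASE-17437; keeps the
       write-ahead log on a filesystem with durable sync semantics -->
  <name>hbase.wal.dir</name>
  <value>hdfs://namenode:8020/hbase-wal</value>
</property>
```

As the thread notes, keeping the WAL on ephemeral HDFS still risks data loss if the node goes away; a durable WAL location is the point of the split.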

EC2 local disks have to be LUKS encrypted. That should not matter to Cloudbreak/HDP, right?

Correct. Hadoop reads disks through the operating system; disk-level encryption sits below the operating system and is therefore not a concern for Cloudbreak/HDP.

We require S3A role-based authentication, with access to S3 files from Hive, Pig, MR, Spark, etc. My question is: does this work out of the box with the latest release of HDP?

Yes. Please refer to the HCC Article: How to Access Data Files stored in AWS S3 Buckets using HDFS / Hive / Pig for more information.


11 REPLIES

New Contributor

Edit: Additional info: We require S3A role-based authentication, with access to S3 files from Hive, Pig, MR, Spark, etc. My question is: does this work out of the box with the latest release of HDP?

Rising Star

Hi @jhals99,

I do not know which level you need exactly, but creating a support ticket or getting in touch with the Solution Engineer (SE) on the project should resolve this question.

LUKS encryption should work, but we haven't tried it, so I am not 100% sure about that.

As far as I know, we haven't tested S3A from this angle. I think you should post a question under the Hadoop and/or HDFS topics. The JIRAs linked in the S3A docs could be a good starting point for finding the right contact for these questions.

Br,

Tamas

New Contributor

Thanks!! @Tamas Bihari for the pointers. Just wanted to clarify: when you say "we haven't tested S3A from this angle" -- are you responding to my question about HBase/S3?

thanks, jhals99

Contributor

EMR is doing HBase on S3 with the WAL stored on local HDFS - a better approach is to put the WAL on EBS or EFS. WAL on local HDFS is not really recommended, as you can lose data if the node goes away.

New Contributor

Thanks! @Janos Matyas -- understood about the HBase WAL.


New Contributor

Thanks!! @Tom McCuch and the other Hortonworks folks for the answers.

@jhals99

Likewise for the great question! If you could accept my answer, I'd be very appreciative.

Thanks. Tom

Tom: you don't need to set the fs.s3a secrets if running on EC2; S3A will pick up the auth details automatically from the IAM metadata made available to processes in the VM.
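That instance-profile setup can be sketched as a core-site.xml fragment. Hedged: the provider class name varies by Hadoop/AWS SDK version; com.amazonaws.auth.InstanceProfileCredentialsProvider is the class wired into the S3A releases of this era:

```xml
<!-- core-site.xml: authenticate S3A via the EC2 instance's IAM role,
     so no fs.s3a.access.key / fs.s3a.secret.key need to be configured -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
```

With this in place, S3A fetches temporary credentials from the instance metadata service, which is what makes role-based access from Hive, Pig, MR, and Spark work without embedding keys.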