
HDP install in AWS VPC, on custom AMI -- feasibility question.

Solved


New Contributor

Hi there, questions for the cloud/HDP experts please...

- I'm looking to provision HDP clusters in an AWS VPC, on custom hardened AMIs. I understand from the link below that Cloudbreak with a custom AMI requires a support subscription. Which level of support, exactly?

https://community.hortonworks.com/questions/75417/cloudbreak-161-deploying-clusters-with-custom-amis...

- I have data on S3 encrypted with server-side encryption. S3A should provide seamless access to encrypted data: yes/no?

- I heard about an Amazon EMR customer using S3 (not local storage/HDFS) for an HBase environment. I assume S3A will provide the same functionality with HBase in HDP? No special sauce in EMR, right?

- EC2 local disks have to be LUKS-encrypted. That shouldn't matter to Cloudbreak/HDP, right?

thanks!

jhals99

1 ACCEPTED SOLUTION


Re: HDP install in AWS VPC, on custom AMI -- feasibility question.

@jhals99

I'm looking to provision HDP clusters in an AWS VPC, on Custom Hardened AMI's. Understand from the link below that Cloudbreak w/custom AMI requires support subscription. Which level of support exactly?

Cloudbreak in HDP is supported within the Enterprise Plus Subscription. Please see Support Subscriptions on hortonworks.com for more information.

I have data on s3 encrypted with Server side encryption. S3A should provide seamless access to encrypted data: yes/no?

Yes. In core-site.xml, you'll find the following properties available (note: the spark.hadoop. prefix you may see in some examples applies only when setting these through Spark's configuration; in core-site.xml the names are unprefixed):

fs.s3a.access.key MY_ACCESS_KEY
fs.s3a.secret.key MY_SECRET_KEY
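For reference, a minimal core-site.xml sketch (key values hypothetical). Reading SSE-S3 encrypted objects is transparent to S3A; the encryption-algorithm property only affects how new objects are written:

```xml
<!-- Static credentials (hypothetical values) -->
<property>
  <name>fs.s3a.access.key</name>
  <value>MY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>MY_SECRET_KEY</value>
</property>
<!-- Write new objects with SSE-S3 (AES256) server-side encryption.
     Reads of SSE-S3 objects need no client-side configuration. -->
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>
```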

I heard about an Amazon EMR customer using S3 (not local storage/hdfs) for an Hbase environment. I assume S3A will provide the same functionality with Hbase in HDP? No special sauce in EMR right?

S3 is not recommended for HBase today. Amazon EMR supports HBase with a combination of HFiles on S3 and the WAL on ephemeral HDFS – this configuration can have data loss in the face of failures. Our HBase team is aware of this and there are a couple other options we are exploring such as HFile on S3 and WAL on EBS / EFS – this is still under investigation.
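For illustration only (this mirrors the EMR-style layout described above and is not a recommended HDP configuration today), the split boils down to pointing hbase.rootdir at S3 while keeping the WAL on HDFS. The bucket and NameNode address are hypothetical, and the separate WAL directory property assumes an HBase version that supports it:

```xml
<!-- EMR-style layout: HFiles in S3, WAL on (ephemeral) HDFS. -->
<property>
  <name>hbase.rootdir</name>
  <value>s3a://my-hbase-bucket/hbase</value>
</property>
<property>
  <name>hbase.wal.dir</name>
  <value>hdfs://namenode:8020/hbase-wal</value>
</property>
```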

EC2 local disks have to be LUKS encrypted. That should not matter to cloudbreak/hdp right?

Correct. Hadoop reads disks through the operating system; disk-level encryption sits below the operating system and is therefore not a concern for Cloudbreak/HDP.

We require s3a role based authentication. Access to s3 files from Hive, Pig, MR, Spark etc. My question is: Does this work out of the box with the latest release of HDP?

Yes. Please refer to the HCC Article: How to Access Data Files stored in AWS S3 Buckets using HDFS / Hive / Pig for more information.
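As a sketch of what role-based (keyless) access can look like in core-site.xml: the provider below tells S3A to pull credentials from the EC2 instance profile, so no access/secret keys need to be stored in configuration. The class name assumes the AWS SDK shipped with your Hadoop build:

```xml
<!-- Use the IAM role attached to the EC2 instance instead of static keys.
     With this provider, fs.s3a.access.key / fs.s3a.secret.key are not needed. -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
```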

View solution in original post


Re: HDP install in AWS VPC, on custom AMI -- feasibility question.

New Contributor

Edit: Additional Info: We require s3a role based authentication. Access to s3 files from Hive, Pig, MR, Spark etc. My question is: Does this work out of the box with the latest release of HDP?


Re: HDP install in AWS VPC, on custom AMI -- feasibility question.

Rising Star

Hi @jhals99,

I do not know which level you need exactly, but creating a support ticket or getting in touch with the Solution Engineer (SE) on the project should resolve this question.

LUKS encryption should work, but we haven't tried it, so I am not 100% sure about that.

As far as I know, we haven't tested S3A from this angle. I think you should ask in the Hadoop and/or HDFS topics. The JIRAs linked in the S3A docs could be a good starting point for finding the right contact for these questions.

Br,

Tamas


Re: HDP install in AWS VPC, on custom AMI -- feasibility question.

New Contributor

Thanks @Tamas Bihari for the pointers! Just wanted to clarify: when you say S3A hasn't been tested from this approach -- are you responding to my question about HBase/S3?

thanks, jhals99

Re: HDP install in AWS VPC, on custom AMI -- feasibility question.

Contributor

EMR is doing HBase on S3 with the WAL stored on local HDFS. A better approach is to put the WAL on EBS or EFS; WAL on local HDFS is not really recommended, as you can lose data if the node goes away.


Re: HDP install in AWS VPC, on custom AMI -- feasibility question.

New Contributor

Thanks! @Janos Matyas -- understood about hbase wal.



Re: HDP install in AWS VPC, on custom AMI -- feasibility question.

New Contributor

Thanks @Tom McCuch and the other Hortonworks folks for the answers!


Re: HDP install in AWS VPC, on custom AMI -- feasibility question.

@jhals99

Likewise for the great question! If you could accept my answer, I'd be very appreciative.

Thanks. Tom


Re: HDP install in AWS VPC, on custom AMI -- feasibility question.

Tom: you don't need to set the fs.s3a secrets if running on EC2; S3A will pick up the auth details automatically from the IAM metadata made available to processes in the VM.
