Created 01-09-2017 06:41 PM
Hi there, questions for the cloud/HDP experts please...
- I'm looking to provision HDP clusters in an AWS VPC, on custom hardened AMIs. I understand from the link below that Cloudbreak with a custom AMI requires a support subscription. Which level of support exactly?
- I have data on S3 encrypted with server-side encryption. S3A should provide seamless access to encrypted data: yes/no?
- I heard about an Amazon EMR customer using S3 (not local storage/HDFS) for an HBase environment. I assume S3A will provide the same functionality with HBase in HDP? No special sauce in EMR, right?
- EC2 local disks have to be LUKS encrypted. That should not matter to Cloudbreak/HDP, right?
thanks!
jhals99
Created 01-10-2017 10:24 PM
I'm looking to provision HDP clusters in an AWS VPC, on custom hardened AMIs. I understand from the link below that Cloudbreak with a custom AMI requires a support subscription. Which level of support exactly?
Cloudbreak in HDP is supported within the Enterprise Plus Subscription. Please see Support Subscriptions on hortonworks.com for more information.
I have data on S3 encrypted with server-side encryption. S3A should provide seamless access to encrypted data: yes/no?
Yes. In core-site.xml, you can set the following properties (when passing the same settings through Spark, they take a spark.hadoop. prefix, e.g. spark.hadoop.fs.s3a.access.key):
fs.s3a.access.key = MY_ACCESS_KEY
fs.s3a.secret.key = MY_SECRET_KEY
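For illustration, the equivalent core-site.xml entries might look like the following sketch. MY_ACCESS_KEY and MY_SECRET_KEY are placeholders, and the server-side-encryption property is an assumption based on the Hadoop S3A documentation (it requests SSE-S3/AES256 on writes; reading SSE-S3 data needs no extra configuration):

```xml
<!-- core-site.xml: S3A credentials (placeholder values, not real keys) -->
<property>
  <name>fs.s3a.access.key</name>
  <value>MY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>MY_SECRET_KEY</value>
</property>
<!-- Assumed setting: ask S3 to encrypt new objects with SSE-S3 (AES256) -->
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>
```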
I heard about an Amazon EMR customer using S3 (not local storage/HDFS) for an HBase environment. I assume S3A will provide the same functionality with HBase in HDP? No special sauce in EMR, right?
S3 is not recommended for HBase today. Amazon EMR supports HBase with a combination of HFiles on S3 and the WAL on ephemeral HDFS; this configuration can lose data in the face of failures. Our HBase team is aware of this, and there are a couple of other options we are exploring, such as HFiles on S3 with the WAL on EBS/EFS. This is still under investigation.
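For context only, a hypothetical sketch of what such a split layout would look like in hbase-site.xml. This is illustrative, not a recommended or supported HDP configuration; mybucket and the NameNode address are placeholders, and hbase.wal.dir assumes an HBase release that supports a separate WAL location:

```xml
<!-- hbase-site.xml: hypothetical HFiles-on-S3 layout (NOT recommended in HDP today) -->
<property>
  <!-- HFiles stored on S3 via S3A -->
  <name>hbase.rootdir</name>
  <value>s3a://mybucket/hbase</value>
</property>
<property>
  <!-- WAL kept off S3, e.g. on HDFS; losing this node's WAL means data loss -->
  <name>hbase.wal.dir</name>
  <value>hdfs://namenode:8020/hbase-wal</value>
</property>
```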
EC2 local disks have to be LUKS encrypted. That should not matter to Cloudbreak/HDP, right?
Correct. Hadoop reads disks through the operating system; disk-level encryption sits below the operating system and is therefore not a concern for Cloudbreak/HDP.
We require S3A role-based authentication, with access to S3 files from Hive, Pig, MR, Spark, etc. My question is: does this work out of the box with the latest release of HDP?
Yes. Please refer to the HCC Article: How to Access Data Files stored in AWS S3 Buckets using HDFS / Hive / Pig for more information.
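As a sketch of the role-based setup: on EC2 you can omit static keys entirely and point S3A at the instance-profile (IAM role) credentials. The provider class below comes from the AWS SDK for Java; whether it must be set explicitly depends on your Hadoop/S3A version, so treat this as an assumption:

```xml
<!-- core-site.xml: use the EC2 instance profile (IAM role) instead of static keys -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
```

With this in place, processes on the instance (Hive, Pig, MR, Spark) pick up temporary credentials from the EC2 metadata service rather than from keys in configuration files.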
Created 01-09-2017 07:06 PM
Edit: Additional info: we require S3A role-based authentication, with access to S3 files from Hive, Pig, MR, Spark, etc. My question is: does this work out of the box with the latest release of HDP?
Created 01-10-2017 04:14 PM
Hi @jhals99,
I do not know exactly which level you need, but creating a support ticket or getting in touch with the Solution Engineer (SE) on the project should answer this question.
LUKS encryption should work, but we haven't tried it, so I am not 100% sure about that.
As far as I know, we haven't tested S3A from this angle. I think you should post a question under the Hadoop and/or HDFS topics. The JIRAs in the S3A docs could be a good starting point for finding the right contact for these questions.
Br,
Tamas
Created 01-10-2017 05:44 PM
Thanks, @Tamas Bihari, for the pointers. Just wanted to clarify: when you say "we haven't tested S3A from this angle" -- are you responding to my question about HBase/S3?
thanks, jhals99
Created 01-10-2017 06:24 PM
EMR is doing HBase on S3 with the WAL stored on local HDFS; a better approach is to put the WAL on EBS or EFS. WAL on local HDFS is not really recommended, as you can lose data if the node goes away.
Created 01-10-2017 06:53 PM
Thanks! @Janos Matyas -- understood about hbase wal.
Created 01-11-2017 06:40 PM
Thanks, @Tom McCuch and other Hortonworks folks, for the answers!
Created 01-11-2017 06:43 PM
Likewise for the great question! If you could accept my answer, I'd be very appreciative.
Thanks. Tom
Created 01-14-2017 01:27 PM
Tom: you don't need to set the fs.s3a secrets if running on EC2; S3A will pick up the auth details automatically from the IAM metadata made available to processes in the VM.