Created 01-09-2017 06:41 PM
Hi there, questions for the cloud/HDP experts please...
- I'm looking to provision HDP clusters in an AWS VPC, on custom hardened AMIs. I understand from the link below that Cloudbreak with a custom AMI requires a support subscription. Which level of support exactly?
- I have data on S3 encrypted with server-side encryption. S3A should provide seamless access to the encrypted data: yes/no?
- I heard about an Amazon EMR customer using S3 (not local storage/HDFS) for an HBase environment. I assume S3A will provide the same functionality with HBase in HDP? No special sauce in EMR, right?
- EC2 local disks have to be LUKS encrypted. That should not matter to Cloudbreak/HDP, right?
thanks!
jhals99
Created 01-10-2017 10:24 PM
I'm looking to provision HDP clusters in an AWS VPC, on custom hardened AMIs. I understand from the link below that Cloudbreak with a custom AMI requires a support subscription. Which level of support exactly?
Cloudbreak in HDP is supported within the Enterprise Plus Subscription. Please see Support Subscriptions on hortonworks.com for more information.
I have data on S3 encrypted with server-side encryption. S3A should provide seamless access to the encrypted data: yes/no?
Yes. The S3A credentials belong in core-site.xml as fs.s3a.access.key and fs.s3a.secret.key; for Spark jobs the same settings can be passed in spark-defaults.conf with the spark.hadoop. prefix:
spark.hadoop.fs.s3a.access.key MY_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key MY_SECRET_KEY
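For reference, the equivalent core-site.xml entries look like this (a minimal sketch; the key values are placeholders):
<property>
  <name>fs.s3a.access.key</name>
  <value>MY_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>MY_SECRET_KEY</value>
</property>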
I heard about an Amazon EMR customer using S3 (not local storage/HDFS) for an HBase environment. I assume S3A will provide the same functionality with HBase in HDP? No special sauce in EMR, right?
S3 is not recommended for HBase today. Amazon EMR supports HBase with a combination of HFiles on S3 and the WAL on ephemeral HDFS; that configuration can lose data in the face of failures. Our HBase team is aware of this, and there are a couple of other options being explored, such as HFiles on S3 with the WAL on EBS/EFS; this is still under investigation.
EC2 local disks have to be LUKS encrypted. That should not matter to Cloudbreak/HDP, right?
Correct. Hadoop reads disks through the operating system, and disk-level encryption such as LUKS sits below the operating system, so it is not a concern for Cloudbreak/HDP.
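For illustration, a rough sketch of how a LUKS-encrypted volume ends up looking like any ordinary data directory to HDFS (the device name and mount point below are just placeholders):
# encrypt and open the raw device (hypothetical /dev/xvdb)
cryptsetup luksFormat /dev/xvdb
cryptsetup luksOpen /dev/xvdb data0
# put a normal filesystem on the mapped device and mount it
mkfs.ext4 /dev/mapper/data0
mount /dev/mapper/data0 /hadoop/data0
# HDFS just sees a regular directory, e.g. dfs.datanode.data.dir=/hadoop/data0/hdfs/data in hdfs-site.xml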
We require S3A role-based authentication, with access to S3 files from Hive, Pig, MapReduce, Spark, etc. My question is: does this work out of the box with the latest release of HDP?
Yes. Please refer to the HCC Article: How to Access Data Files stored in AWS S3 Buckets using HDFS / Hive / Pig for more information.
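As a quick sanity check, something like the following should work once the cluster nodes carry the IAM role (the bucket and paths are placeholders, not taken from the article):
# list and read S3 data through S3A without any keys in the config
hdfs dfs -ls s3a://my-bucket/data/
# Hive external table over the same S3A location (run in beeline/hive)
# CREATE EXTERNAL TABLE logs (line STRING) LOCATION 's3a://my-bucket/data/logs/';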
Created 01-14-2017 01:31 PM
S3A on HDP 2.5 supports server-side encryption.
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
  <description>Specify a server-side encryption algorithm for the s3a: file system.
    Unset by default, and the only other currently allowable value is AES256.</description>
</property>
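One way to double-check that writes through S3A really are encrypted at rest is to look at the object metadata afterwards (the bucket and key names below are placeholders):
# write a file through S3A, then inspect it with the AWS CLI
hadoop fs -put localfile.txt s3a://my-bucket/data/localfile.txt
aws s3api head-object --bucket my-bucket --key data/localfile.txt
# the response should include "ServerSideEncryption": "AES256"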
Created 01-14-2017 01:34 PM
One more thing: S3A doesn't handle R/W buckets that have different ACLs attached to different parts of them, as the code expects to have write access to the whole of any R/W bucket. You should restrict access by bucket, not by trying to use ACLs within a bucket.
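To make the "restrict by bucket" advice concrete, here is a minimal IAM-policy sketch scoped to a single bucket (the bucket name is hypothetical):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
    "Resource": ["arn:aws:s3:::my-rw-bucket", "arn:aws:s3:::my-rw-bucket/*"]
  }]
}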