Created 09-23-2016 09:30 PM
My existing EBS volumes are transparently encrypted. I added an extra volume that is not encrypted. Now I want to be able to control where HDFS writes a file. I think it must be possible because heterogeneous storage policies tell HDFS where to write. How can I do this?
Created 09-25-2016 10:27 AM
HDFS does support heterogeneous storage types, but you cannot define your own storage type; you have to use one of the pre-defined types (ARCHIVE, DISK, SSD and RAM_DISK). Block placement is then controlled through storage policies defined over these types, which determine where blocks and their replicas are created and stored.
So you can only control where HDFS writes a file if you can map your encrypted and non-encrypted volumes onto these storage types.
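For example, you could tag the encrypted EBS volumes as DISK and the unencrypted volume as ARCHIVE in dfs.datanode.data.dir, then pin a directory to the unencrypted volume with the COLD policy (which places all replicas on ARCHIVE storage). A rough sketch, where the mount points and the HDFS path are just placeholders for your own layout:

<!-- hdfs-site.xml: tag each DataNode data directory with a storage type -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[DISK]/mnt/encrypted-ebs/dfs/dn,[ARCHIVE]/mnt/unencrypted-ebs/dfs/dn</value>
</property>

# Pin a directory to the ARCHIVE (unencrypted) volume and verify:
hdfs storagepolicies -setStoragePolicy -path /data/non-sensitive -policy COLD
hdfs storagepolicies -getStoragePolicy -path /data/non-sensitive

# Move any existing blocks so they match the new policy:
hdfs mover -p /data/non-sensitive

Files written under paths without a policy keep the default HOT behavior and land on the DISK (encrypted) volumes.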
Hope this helps.
Reference: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
Created 09-25-2016 03:21 PM
I feared as much. Thank you for your suggestion; I think it will work for us, as this is a cloud cluster and we can archive to S3, which obviates the need to use heterogeneous storage for its intended purpose. However, I would like to suggest a Jira ticket to add a storage class for this purpose. There are significant use cases where it would be useful to know that a subset of your data is confined to specific drives, (a) without the restrictions of the existing policies and (b) without repurposing a storage class intended for something else.