How to write HDFS data to a specific device

Explorer

My existing EBS volumes are transparently encrypted. I added an extra volume that is not encrypted. Now I want to be able to control where HDFS writes a file. I think it must be possible because heterogeneous storage policies tell HDFS where to write. How can I do this?

Accepted Solutions

Re: How to write HDFS data to a specific device

Guru

Hi @Peter Coates

HDFS does support heterogeneous storage, but you cannot define your own storage type. You have to use one of the predefined types (ARCHIVE, DISK, SSD and RAM_DISK). Block placement is then governed by storage policies (for example HOT, COLD or ALL_SSD), each of which specifies which storage types are used when blocks are created and replicated.

So you can control where HDFS writes a file only if you can distinguish your encrypted and unencrypted volumes by these storage types.
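
As a rough sketch of what that could look like: tag the unencrypted volume as ARCHIVE in `dfs.datanode.data.dir`, leave the encrypted volumes as DISK, and put the COLD policy (all replicas on ARCHIVE) on the directory whose files should land on the unencrypted volume. The Python below only builds the `hdfs-site.xml` property and the `hdfs storagepolicies` command; it does not touch a cluster. The mount points and the `/unencrypted` HDFS path are made-up placeholders, and choosing ARCHIVE/COLD for the unencrypted volume is an assumption for illustration, not something from the original question.

```python
from xml.sax.saxutils import escape

# Hypothetical DataNode mount points; substitute your own.
ENCRYPTED_DIRS = ["/data/encrypted0/dn"]
UNENCRYPTED_DIRS = ["/data/plain0/dn"]


def data_dir_value(encrypted, unencrypted):
    """Build a dfs.datanode.data.dir value with a storage-type tag per directory.

    Assumption: encrypted volumes are tagged DISK and the unencrypted
    volume is tagged ARCHIVE so that a policy can tell them apart.
    """
    tagged = [f"[DISK]file://{d}" for d in encrypted]
    tagged += [f"[ARCHIVE]file://{d}" for d in unencrypted]
    return ",".join(tagged)


def hdfs_site_property(name, value):
    """Render one <property> element for hdfs-site.xml."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


def set_policy_command(path, policy):
    """Return the argv for assigning a storage policy to an HDFS path."""
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]


if __name__ == "__main__":
    value = data_dir_value(ENCRYPTED_DIRS, UNENCRYPTED_DIRS)
    print(hdfs_site_property("dfs.datanode.data.dir", value))
    # /unencrypted is a placeholder HDFS directory.
    print(" ".join(set_policy_command("/unencrypted", "COLD")))
```

The DataNodes need a restart to pick up the new `dfs.datanode.data.dir` value. Files that already exist under the directory are not moved when the policy is set; running `hdfs mover -p <path>` is the documented way to migrate their blocks.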

Hope this helps.

Reference: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html

Re: How to write HDFS data to a specific device

Explorer

I feared as much. Thank you for your suggestion--I think it will work for us, as this is a cloud cluster and we can archive to S3, obviating the need to use heterogeneous storage for its intended purpose. However, I would like to suggest a Jira ticket to add a storage class for this purpose. There are significant use cases where it would be useful to know that a subset of your data is confined to specific drives (a) without the restrictions of the existing policies and (b) without abusing a storage class for this purpose.
