Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

How to write HDFS data to a specific device

New Member

My existing EBS volumes are transparently encrypted. I added an extra volume that is not encrypted. Now I want to be able to control where HDFS writes a file. I think it must be possible because heterogeneous storage policies tell HDFS where to write. How can I do this?

1 ACCEPTED SOLUTION

Guru

Hi @Peter Coates

HDFS does support heterogeneous storage, but you cannot define your own storage types. You have to use the predefined ones (ARCHIVE, DISK, SSD, and RAM_DISK). Which of these types a file's replicas land on is governed by the file's storage policy (HOT, WARM, COLD, ALL_SSD, ONE_SSD, or LAZY_PERSIST), which controls replica placement both when the file is created and when its blocks are re-replicated.
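The policy table in the referenced Archival Storage documentation can be modeled as a small lookup, which makes it easier to see which policies could confine data to one kind of volume. The Python below is an illustrative sketch only: it is not a Hadoop API, the names `POLICIES` and `possible_storage_types` are hypothetical, and the table is transcribed from the 2.7 docs as an assumption you should re-check against your own release.

```python
# Illustrative model of the built-in HDFS storage policies (Hadoop 2.7 docs).
# NOT a Hadoop API: it only tabulates which storage types each policy may use.
from typing import Dict, List, NamedTuple


class Policy(NamedTuple):
    placement: Dict[str, str]        # storage type -> replica count ("n" = replication factor)
    creation_fallback: List[str]     # types tried if placement targets are unavailable on write
    replication_fallback: List[str]  # types tried when re-replicating blocks


# Assumption: transcribed from the Archival Storage documentation table.
POLICIES: Dict[str, Policy] = {
    "LAZY_PERSIST": Policy({"RAM_DISK": "1", "DISK": "n-1"}, ["DISK"], ["DISK"]),
    "ALL_SSD": Policy({"SSD": "n"}, ["DISK"], ["DISK"]),
    "ONE_SSD": Policy({"SSD": "1", "DISK": "n-1"}, ["SSD", "DISK"], ["SSD", "DISK"]),
    "HOT": Policy({"DISK": "n"}, [], ["ARCHIVE"]),
    "WARM": Policy({"DISK": "1", "ARCHIVE": "n-1"}, ["ARCHIVE", "DISK"], ["ARCHIVE", "DISK"]),
    "COLD": Policy({"ARCHIVE": "n"}, [], []),
}


def possible_storage_types(policy_name: str) -> set:
    """Every storage type a replica under this policy could end up on,
    counting normal placement plus both kinds of fallback."""
    p = POLICIES[policy_name.upper()]
    return set(p.placement) | set(p.creation_fallback) | set(p.replication_fallback)


if __name__ == "__main__":
    for name in POLICIES:
        print(f"{name:13} -> {sorted(possible_storage_types(name))}")
```

Under this model, COLD is the only policy that keeps every replica on a single storage type with no fallback, which is what would let ARCHIVE stand in for "unencrypted volume". Conversely, the default HOT policy lists ARCHIVE as its replication fallback, so if an unencrypted volume is tagged ARCHIVE, data you expect to stay on encrypted DISK volumes could in principle be re-replicated onto it when no DISK capacity is available.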

So you can control where HDFS writes a file only if you can tell your encrypted and non-encrypted volumes apart using these storage types.
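To make that mapping concrete: a DataNode's volumes are tagged with storage types in the `dfs.datanode.data.dir` property (the referenced docs use entries such as `[ARCHIVE]file:///grid/dn/archive0`; untagged entries default to DISK). The hypothetical Python helper below only parses such a value and groups the directories by type, so a layout can be sanity-checked before it is deployed. The mount points `/data/enc0`, `/data/enc1` (encrypted) and `/data/plain0` (unencrypted) are assumptions for illustration, not anything from the original cluster.

```python
# Hypothetical helper (not part of Hadoop): group the directories listed in a
# dfs.datanode.data.dir value by their [STORAGE_TYPE] tag.
import re
from collections import defaultdict
from typing import Dict, List

KNOWN_TYPES = {"RAM_DISK", "SSD", "DISK", "ARCHIVE"}  # the predefined types only
_TAG = re.compile(r"^\[([A-Za-z_]+)\](.*)$")


def volumes_by_storage_type(data_dir_value: str) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in data_dir_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        m = _TAG.match(entry)
        if m:
            # This sketch upper-cases the tag for convenience.
            storage_type, path = m.group(1).upper(), m.group(2)
        else:
            storage_type, path = "DISK", entry  # untagged entries default to DISK
        if storage_type not in KNOWN_TYPES:
            raise ValueError(f"unknown storage type {storage_type!r}; custom types are not supported")
        groups[storage_type].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Assumed layout: encrypted volumes as DISK, the unencrypted one as ARCHIVE.
    value = "[DISK]file:///data/enc0,[DISK]file:///data/enc1,[ARCHIVE]file:///data/plain0"
    print(volumes_by_storage_type(value))
```

With that layout, setting the COLD policy on a directory (per the referenced docs, `hdfs storagepolicies -setStoragePolicy -path <path> -policy COLD`, with `hdfs mover` to migrate blocks that already exist) should confine that directory's blocks to the ARCHIVE-tagged `/data/plain0` volumes.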

Hope this helps.

Reference: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html

New Member

I feared as much. Thank you for your suggestion. I think it will work for us: this is a cloud cluster, and we can archive to S3, which removes the need to use heterogeneous storage for its intended purpose. However, I would like to suggest a Jira ticket to add a storage type for this purpose. There are significant use cases where it would be useful to know that a subset of your data is confined to specific drives, (a) without the restrictions of the existing policies and (b) without abusing a storage type for this purpose.