Storing HDFS data only on specific nodes

Expert Contributor

Hi,


We have a 30-node production cluster. We want to add 5 data nodes for additional storage to handle an interim spike of data (around 2 TB). This data will be stored only temporarily, and we want to delete it after 15 days.

Is it possible to make sure that the incoming interim data (2 TB) is stored only on the newly added data nodes?

I am looking for something similar to YARN node labelling.


Regards,

SS

Re: Storing HDFS data only on specific nodes

Expert Contributor

Hi SS,

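You could try declaring the disks of the additional nodes as the SSD storage tier and setting an SSD storage policy on the temporary data. Note that One_SSD places only one replica on SSD storage and the remaining replicas on DISK, so All_SSD is the policy that targets SSD for every replica. This way, the data should reside only on the declared "SSD disks" and therefore on the "burst nodes".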
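As a rough sketch of what that could look like in practice, the Python helper below only builds the `dfs.datanode.data.dir` value (with the `[SSD]` storage-type tag) and the `hdfs storagepolicies` command lines; it does not touch a cluster. The directory names, the target path `/data/interim` and the helper function names are hypothetical, and the policy name is a parameter you would choose yourself.

```python
import shlex

# Hypothetical values for illustration only; none of them come from this thread.
BURST_DATA_DIRS = ["/grid/0/hdfs/data", "/grid/1/hdfs/data"]
TEMP_PATH = "/data/interim"
# All_SSD targets SSD storage for every replica; One_SSD puts one replica
# on SSD and the remaining replicas on DISK.
POLICY = "All_SSD"


def tagged_data_dirs(dirs, storage_type="SSD"):
    """Build a dfs.datanode.data.dir value with each directory tagged by storage type."""
    return ",".join(f"[{storage_type}]{d}" for d in dirs)


def set_policy_cmd(path, policy):
    """Return the argv that sets a storage policy on an HDFS path."""
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]


def get_policy_cmd(path):
    """Return the argv that reads back the storage policy of an HDFS path."""
    return ["hdfs", "storagepolicies", "-getStoragePolicy", "-path", path]


if __name__ == "__main__":
    # Value to put into hdfs-site.xml on the new data nodes only.
    print("dfs.datanode.data.dir =", tagged_data_dirs(BURST_DATA_DIRS))
    # Commands to run once the tagged data nodes have joined the cluster.
    print(shlex.join(set_policy_cmd(TEMP_PATH, POLICY)))
    print(shlex.join(get_policy_cmd(TEMP_PATH)))
```

The policy applies to blocks written after it is set, so it is simplest to set it on an empty directory before the interim data starts arriving. It is also worth checking the policy's fallback behaviour for your Hadoop version, since blocks may be placed on other storage types when no SSD-tagged space is available.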

However, keep in mind the performance implications of storing data on only a subset of your cluster. Jobs that primarily use that data may generate heavier network load, because most tasks cannot run local to the data, and they will see lower aggregate I/O bandwidth, which can degrade performance.

Regards,

Benjamin
