Support Questions

Why does one remove or add nodes in a Hadoop cluster frequently?

Why would one need to add or remove nodes in a Hadoop cluster frequently?

Re: Why does one remove or add nodes in a Hadoop cluster frequently?

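Ideally, you would not remove DataNodes (HDFS storage) at all, only NodeManagers (YARN) or Spark executors (Spark Standalone). Removing a DataNode forces HDFS to re-replicate the blocks it stored, whereas compute-only nodes hold no long-term data.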
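As a rough illustration of that rule, here is a minimal sketch that picks out the nodes that are safe to remove: those running a compute daemon but no DataNode. The role labels, the `Node` type, and the hostnames are all hypothetical; a real cluster would get this information from its cluster manager.

```python
from dataclasses import dataclass

# Hypothetical role labels: "DataNode" = HDFS storage daemon,
# "NodeManager" = YARN compute daemon, "SparkWorker" = Spark Standalone worker.
COMPUTE_ROLES = frozenset({"NodeManager", "SparkWorker"})


@dataclass(frozen=True)
class Node:
    hostname: str
    roles: frozenset


def removable_nodes(nodes):
    """Return hostnames that run compute daemons but store no HDFS blocks."""
    return [
        n.hostname
        for n in nodes
        if n.roles & COMPUTE_ROLES and "DataNode" not in n.roles
    ]


# Hypothetical cluster: two storage+compute nodes, two compute-only nodes.
cluster = [
    Node("core-1", frozenset({"DataNode", "NodeManager"})),
    Node("core-2", frozenset({"DataNode", "NodeManager"})),
    Node("task-1", frozenset({"NodeManager"})),
    Node("spark-1", frozenset({"SparkWorker"})),
]

print(removable_nodes(cluster))  # ['task-1', 'spark-1']
```

Nodes that also run a DataNode are never returned, so scaling in this way never touches HDFS block storage.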

How often you need to do this depends on how your hardware is provisioned. In the cloud, for example with AWS EMR, you can scale a cluster out with additional compute nodes (for example through auto-scaling) while a job runs. The data is persisted long-term in S3 and exists on HDFS only briefly, while the job needs fast local access to it.
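Auto-scaling of this kind is typically driven by metric thresholds. The sketch below is a toy policy, assuming a "percentage of YARN memory still available" metric: it adds compute nodes when free memory is scarce and removes them when it is plentiful. The function name, thresholds, step size, and node bounds are assumptions for illustration, not EMR defaults or an EMR API.

```python
def desired_task_nodes(current: int, yarn_memory_available_pct: float,
                       scale_out_below: float = 15.0, scale_in_above: float = 75.0,
                       step: int = 2, min_nodes: int = 0, max_nodes: int = 20) -> int:
    """Toy threshold policy (hypothetical): return the target number of compute-only nodes."""
    if yarn_memory_available_pct < scale_out_below:
        target = current + step   # cluster is busy: add compute
    elif yarn_memory_available_pct > scale_in_above:
        target = current - step   # cluster is mostly idle: remove compute
    else:
        target = current          # within the comfortable band: no change
    # Clamp to the allowed range so we never go negative or exceed the cap.
    return max(min_nodes, min(max_nodes, target))


print(desired_task_nodes(4, 10.0))  # 6
print(desired_task_nodes(4, 90.0))  # 2
print(desired_task_nodes(4, 50.0))  # 4
```

The gap between the two thresholds keeps the cluster from flapping between sizes when utilization hovers near a single cutoff.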

You pay for the time these clusters are running, so if you run a job only in the morning every day, there is no reason to keep the cluster sitting idle through the afternoon.
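A back-of-the-envelope comparison makes the point concrete. This sketch assumes a simple model where cost is nodes times an hourly rate times hours running; the node count, the $0.50 per node-hour rate, and the 3-hour job window are hypothetical, and the model ignores storage, billing granularity, and cluster start-up time.

```python
def daily_cost(nodes: int, hourly_rate: float, hours_running: float) -> float:
    """Compute-only cost model: nodes x hourly rate per node x hours the cluster is up."""
    return nodes * hourly_rate * hours_running


# Hypothetical figures: 10 nodes at $0.50 per node-hour.
always_on = daily_cost(10, 0.50, 24)    # cluster left running all day
morning_only = daily_cost(10, 0.50, 3)  # cluster up only for a 3-hour morning job

print(always_on, morning_only, always_on - morning_only)  # 120.0 15.0 105.0
```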
