Why would you frequently add or remove nodes in a Hadoop cluster?
What is the need to add or remove nodes in a Hadoop cluster frequently?
Ideally, you would not be removing datanodes, only node managers (YARN) or Spark executors (standalone Spark), since decommissioning a datanode forces HDFS to re-replicate its blocks elsewhere.
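To decommission node managers gracefully rather than killing them, YARN supports an exclude file named by the `yarn.resourcemanager.nodes.exclude-path` property. A minimal sketch of the relevant `yarn-site.xml` entry follows; the file path shown is an assumption, use whatever location fits your deployment:

```xml
<!-- yarn-site.xml: point the ResourceManager at an exclude file.
     Hostnames listed in that file are decommissioned on refresh.
     The /etc/hadoop/conf/yarn.exclude path is an example, not a default. -->
<property>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value>/etc/hadoop/conf/yarn.exclude</value>
</property>
```

After adding a host to the exclude file, running `yarn rmadmin -refreshNodes` tells the ResourceManager to pick up the change and begin decommissioning.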
Whether you need to depends on your hardware resources. In the cloud, for example on AWS EMR, you can scale a job up by adding compute through AWS auto-scaling groups. The data is persisted long-term in S3 and lives on HDFS only briefly, while the necessary actions run.
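On EMR, this kind of elasticity can be expressed as a managed scaling policy rather than by resizing nodes by hand. A hedged sketch of the JSON you might pass to `aws emr put-managed-scaling-policy` is below; the capacity numbers are illustrative assumptions, not recommendations:

```json
{
  "ComputeLimits": {
    "UnitType": "Instances",
    "MinimumCapacityUnits": 2,
    "MaximumCapacityUnits": 20,
    "MaximumOnDemandCapacityUnits": 10,
    "MaximumCoreCapacityUnits": 4
  }
}
```

With limits like these, EMR adds task capacity while jobs are queued and releases it again when the cluster goes idle, which is exactly the add/remove cycle the question is asking about.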
You pay for the runtime of these clusters, so if you run a job only in the morning every day, you don't need the cluster sitting idle all afternoon.
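The cost difference is easy to see with some back-of-the-envelope arithmetic. The hourly rate, node count, and job length below are hypothetical numbers for illustration, not real EMR prices:

```python
# Rough monthly cost: an always-on cluster vs. one that exists only
# while a 3-hour morning job runs. All figures are assumed examples.
HOURLY_RATE = 0.27   # assumed cost per node-hour (instance + EMR fee), USD
NODES = 10
DAYS = 30
JOB_HOURS = 3

always_on = HOURLY_RATE * NODES * 24 * DAYS       # cluster never shut down
job_only = HOURLY_RATE * NODES * JOB_HOURS * DAYS # cluster up only for the job

print(f"always-on: ${always_on:,.2f}/month")
print(f"job-only:  ${job_only:,.2f}/month")
```

Running the cluster only for the job is 24/3 = 8x cheaper here, which is why transient, auto-scaled clusters are the common pattern when the data itself lives in S3.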