Deploying the Hadoop stack on a public cloud provides cost/performance benefits for temporary or batch workloads. But the master/management nodes need to stay permanently up and running to preserve the metadata for Navigator, all the other configuration that has been made, and so on.
What is the best practice for running a temporary cluster on a public cloud without caveats (for example, not losing lineage in Navigator, which seems to require the cluster to remain permanent)? And how do you create a minimal permanent cluster with data nodes added and removed daily?
Assuming that you are using the cloud provider's database as the back end for Cloudera Manager, then once you remove the cluster nodes you can simply put the CM node to sleep. There is no real reason for CM to be active without cluster nodes; just wake the node up when you want to deploy data nodes again.
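As a concrete sketch of "putting the CM node to sleep", assuming the CM host is an EC2 instance (the instance ID below is a placeholder) and the AWS CLI is configured:

```shell
# Hedged sketch: i-0123456789abcdef0 is a placeholder for the CM node's
# EC2 instance ID; assumes the AWS CLI is installed and has credentials.

# "Sleep" the CM node once the cluster's data nodes are gone.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# Wake it up again before redeploying data nodes.
aws ec2 start-instances --instance-ids i-0123456789abcdef0

# Block until the instance is running before pointing new nodes at CM.
aws ec2 wait instance-running --instance-ids i-0123456789abcdef0
```

A stopped instance keeps its EBS volumes, so CM's state (and its external database, if co-located) survives the stop/start cycle; you only pay for storage while it sleeps.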
Here are some tips to help answer your question.
First, use cloud storage services, such as S3 on AWS or ADLS on Azure, for keeping data like Navigator lineage information. These services provide availability and reliability automatically. Hadoop and other services can be configured to use cloud storage in various ways instead of local block (hard-drive) storage.
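For example, wiring Hadoop up to S3 through the s3a connector might look like the following core-site.xml fragment (a minimal sketch; the credential values are placeholders, and in practice IAM instance roles are preferable to embedded keys):

```xml
<!-- core-site.xml: hedged sketch; the values below are placeholders -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

With that in place, jobs and tools can address bucket paths directly, e.g. `hadoop fs -ls s3a://your-bucket/data/` (bucket name hypothetical).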
Sometimes it isn't efficient or performant enough to use cloud storage exclusively. For example, a typical data analysis job may have several stages where data is read, processed, and then written back, and those round trips to storage services can be slower and cost more than local drive access. So consider adjusting how data is managed, so that intermediate data resides in local block storage but final results are sent to cloud storage for safekeeping.
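One common way to implement "final results to cloud storage" is to run the job's stages against HDFS and local disk, then copy only the finished output out with DistCp (a sketch; the paths and bucket name are hypothetical):

```shell
# Hedged sketch: paths and bucket name are placeholders.
# Intermediate/shuffle data stays on local disks and HDFS during the job;
# only the final output is pushed to S3 for safekeeping.
hadoop distcp hdfs:///user/analytics/results/2018-06-01 \
              s3a://your-bucket/results/2018-06-01
```

DistCp runs as a distributed MapReduce copy, so the upload is parallelized across worker nodes rather than funneled through a single host.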
Once all of the important data is safe in cloud storage, it becomes less important to keep cluster instances running. You can even destroy entire clusters, including master/manager nodes, knowing that the data is safe in cloud storage. At this point, you will want to use automation tools, like Altus Director or Cloudbreak, so that you can easily spin up new clusters that are configured to pull their initial data from cloud storage. Then, you only run clusters when you need them.
If that isn't feasible, you can still do something like what you suggest, with clusters that have some permanent nodes and some transient ones. If so, ensure that those transient nodes do not keep important state that isn't safe elsewhere. For example, YARN node managers are stateless, so scaling nodes that only house those ("worker" nodes) is an easy goal to achieve. By contrast, HDFS datanodes store file data, so those aren't as easy to scale down - you can, though, as long as they are decommissioned properly using Cloudera Manager or Ambari, which the cloud automation tools handle for you.
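If you ever have to decommission a datanode without Cloudera Manager or Ambari driving it, the manual flow is roughly: list the host in the HDFS exclude file and refresh the namenode (a sketch; the file path and hostname are placeholders, and the exclude file must already be referenced by `dfs.hosts.exclude` in hdfs-site.xml):

```shell
# Hedged sketch: path and hostname below are placeholders.

# 1. On the namenode, add the datanode being removed to the exclude file.
echo "worker-42.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the namenode to re-read its include/exclude lists; it will begin
#    re-replicating the excluded node's blocks to other datanodes.
hdfs dfsadmin -refreshNodes

# 3. Check status and wait for "Decommissioned" before terminating the VM.
hdfs dfsadmin -report
```

Terminating the instance only after the report shows the node as decommissioned is what keeps you from dropping below the replication factor.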