This is more of a capacity planning question. Over the last few months I have observed that the YARN RAM utilization of our cluster has gone up considerably, due to new processes being onboarded onto the cluster. At this point I am trying to find out the cumulative length of time through the day for which RAM usage is peaking.
Any information on how to go about this will be helpful.
@Suhel, you can try to get a report from Ambari Metrics. Another way is to schedule a job that collects memory data on every machine every 60 seconds. You can use the "free" or "sar" command to capture memory usage with a timestamp, keep appending the output to a file, and then build a report on top of that in Excel or any other reporting tool to find the memory usage per node.
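If you would rather script the report than build it in Excel, here is a rough Python sketch of the "cumulative time at peak" calculation. It assumes the collection job writes one line per sample in the form "ISO-timestamp used-percent" (for example "2024-01-01T10:01:00 95.0"). That file format, the 90% threshold, and the function names are my own assumptions for illustration, so adjust them to whatever your collector actually produces.

```python
from datetime import datetime, timedelta


def parse_samples(lines):
    """Parse lines of the (assumed) form '<ISO timestamp> <used percent>'."""
    samples = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        samples.append((datetime.fromisoformat(parts[0]), float(parts[1])))
    samples.sort(key=lambda s: s[0])
    return samples


def time_above_threshold(samples, threshold_pct, max_gap=timedelta(seconds=120)):
    """Sum the intervals that start with a sample at or above the threshold.

    Each sample is treated as valid until the next one arrives. Intervals
    longer than max_gap are ignored so that a collector outage is not
    counted as peak time.
    """
    total = timedelta(0)
    for (t0, used0), (t1, _) in zip(samples, samples[1:]):
        gap = t1 - t0
        if used0 >= threshold_pct and gap <= max_gap:
            total += gap
    return total


if __name__ == "__main__":
    # Hypothetical sample data, one reading per minute.
    demo = [
        "2024-01-01T10:00:00 50.0",
        "2024-01-01T10:01:00 95.0",
        "2024-01-01T10:02:00 96.0",
        "2024-01-01T10:03:00 40.0",
        "2024-01-01T10:04:00 92.0",
        "2024-01-01T10:05:00 30.0",
    ]
    print(time_above_threshold(parse_samples(demo), threshold_pct=90.0))
```

Run it once per node per day and you get the total time that node spent at or above the threshold. Keep in mind this measures whatever the collector recorded, so node-level memory from "free" or "sar" will not necessarily match the memory YARN reports as allocated.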