My cluster runs HDP 2.5.3, and we have multiple jobs that run every 2-5 minutes, so roughly 8-10K job IDs are generated per day in the job history. I needed the log of a job that was 3 days old, but I could not find it because the MapReduce JHS (JobHistory Server) was keeping only 20,000 job IDs. How do I increase this limit to the maximum?
This is controlled by yarn.resourcemanager.max-completed-applications, which defaults to 10K. It seems someone has already increased it to 20K in your environment. Keep in mind that 'this value impacts the RM recovery performance. Typically, a smaller value indicates better performance on RM recovery.' [source]
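As a rough sketch, the property would be set in yarn-site.xml like this. The value 30000 is only an illustrative assumption (about 10K jobs per day times 3 days of retention), not a recommended setting. On an Ambari-managed HDP cluster you would normally make the change through the YARN configs in Ambari rather than editing the file by hand, and then restart the ResourceManager.

```xml
<!-- yarn-site.xml: illustrative fragment, adjust the value to your retention needs -->
<property>
  <name>yarn.resourcemanager.max-completed-applications</name>
  <!-- Assumed example value: ~10K jobs/day x 3 days. Larger values can slow RM recovery. -->
  <value>30000</value>
</property>
```

If the 20,000 limit you are seeing is in the JobHistory Server itself, it may also be worth checking mapreduce.jobhistory.joblist.cache.size in mapred-site.xml, whose stock default is 20000, and mapreduce.jobhistory.max-age-ms, which controls how long history files are kept before cleanup. Whether those settings apply to your case is an assumption to verify against your cluster's configuration.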