For one of my clusters I create a daily report of the jobs that ran on the previous day, including job wait time, job launch time, and the number of jobs that failed, succeeded, were killed, etc.
But for the last two weeks the number of jobs reported by the Zeppelin notebook has differed from the job count on the RM UI for the same day (Zeppelin shows fewer).
For Zeppelin we use a query that reads from the tables activity.yarn_application a and activity.job b, and we capture only the jobs that meet the condition b.APP_ID = a.APP_ID.
On further analysis I can see that all the jobs are present in activity.yarn_application, but some are missing from activity.job.
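To illustrate the effect, here is a minimal sketch that uses an in-memory SQLite database in place of Phoenix. Only the table names and the APP_ID join condition come from the query described above; the APP_TYPE and STATUS columns and all the rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the two tables; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yarn_application (APP_ID TEXT PRIMARY KEY, APP_TYPE TEXT);
    CREATE TABLE job (APP_ID TEXT, STATUS TEXT);
    INSERT INTO yarn_application VALUES
        ('app_1', 'MAPREDUCE'), ('app_2', 'TEZ'), ('app_3', 'SPARK');
    -- Only one application has a matching row in job.
    INSERT INTO job VALUES ('app_1', 'SUCCEEDED');
""")

# Inner join on b.APP_ID = a.APP_ID: applications without a job row drop out.
inner_count = conn.execute(
    "SELECT COUNT(*) FROM yarn_application a "
    "JOIN job b ON b.APP_ID = a.APP_ID"
).fetchone()[0]

# Left outer join: every application is kept, with NULLs for the job columns.
left_count = conn.execute(
    "SELECT COUNT(*) FROM yarn_application a "
    "LEFT OUTER JOIN job b ON b.APP_ID = a.APP_ID"
).fetchone()[0]

# Applications that exist in yarn_application but have no row in job.
missing = [
    row[0]
    for row in conn.execute(
        "SELECT a.APP_ID FROM yarn_application a "
        "LEFT OUTER JOIN job b ON b.APP_ID = a.APP_ID "
        "WHERE b.APP_ID IS NULL ORDER BY a.APP_ID"
    )
]

print(inner_count, left_count, missing)
```

The last query is the kind of check that lists exactly which applications the report would lose with the inner join.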
Please let me know if any more input is required.
I did a LEFT OUTER JOIN on this and it seems to work, but I have another issue. The time of a job fetched by the Zeppelin SmartSense query and the time of the same job on the RM UI differ by two hours. How do I change that?
I tried the following:
1) Went to HBase -> Configs -> Custom hbase-site and added the property phoenix.query.dateFormatTimeZone=GMT+08:00
2) Used the "timeZone" option of the Phoenix Thin Driver
but neither seems to work. Is there anything else I can do?
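For reference, here is a minimal sketch of the kind of offset I mean. It assumes the stored start time is an epoch value in milliseconds (UTC) and that one view renders it in UTC while the other renders it in a UTC+02:00 zone. Both the timestamp and the two-hour offset are hypothetical values chosen to match the gap I see, not something read from the tables.

```python
from datetime import datetime, timedelta, timezone


def render(epoch_ms: int, offset_hours: int) -> str:
    """Format an epoch-millisecond timestamp in a fixed UTC offset."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(epoch_ms / 1000, tz=tz).strftime(
        "%Y-%m-%d %H:%M:%S"
    )


start_ms = 1_500_000_000_000  # hypothetical job start time

as_utc = render(start_ms, 0)  # how a UTC-rendering view would show it
as_ui = render(start_ms, 2)   # how a UTC+02:00 view would show it

print(as_utc)
print(as_ui)
```

If the two displayed times for a job differ by exactly the offset between two zones like this, the stored instant is the same and only the rendering time zone differs.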
@Jay SenSharma Could you please help me with how to access these tables? Some of my Tez jobs and all of my Spark jobs are not being written to the activity.job table, so the output is missing many jobs. I used a LEFT OUTER JOIN so that I get all the jobs, but why is this happening?