I am looking for the best way to gather the filesystem counters, job counters, and MapReduce framework details for all jobs that ran on a specific date. When a job completes, its logs are stored in HDFS and the information about the job is shipped to a dedicated server, the JobHistory Server. I am looking at the node that runs the JobHistory Server, and port 19888 on it is currently locked down. I am looking for a way to do one of the following:
1) query HDFS to get the data I need, or
2) open the port and use the JobHistory web UI on port 19888, or
3) use some other method
Environment: CDH v5.1.x. We are currently not using Cloudera Manager.
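
To make option 2 concrete, here is a minimal sketch of what I have in mind if the port were opened. It assumes the JobHistory Server REST API under `/ws/v1/history/mapreduce` is available in this release, that the job listing accepts `startedTimeBegin` and `startedTimeEnd` in epoch milliseconds, and that a "date" means a UTC day. The host name and all function names are hypothetical, and I have not run this against the cluster.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical host; replace with the node running the JobHistory Server.
BASE = "http://jobhistory.example.com:19888/ws/v1/history/mapreduce"


def day_window_ms(day):
    """Return (start, end) of a UTC day as epoch milliseconds (assumes UTC)."""
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000) - 1


def jobs_url(base, day):
    """Build the job-listing URL restricted to jobs started on the given day."""
    begin, end = day_window_ms(day)
    query = urlencode({"startedTimeBegin": begin, "startedTimeEnd": end})
    return base + "/jobs?" + query


def flatten_counters(payload):
    """Map (group name, counter name) -> total value from a counters response."""
    out = {}
    for group in payload["jobCounters"].get("counterGroup", []):
        for counter in group.get("counter", []):
            key = (group["counterGroupName"], counter["name"])
            out[key] = counter["totalCounterValue"]
    return out


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def counters_for_day(base, day):
    """Collect flattened counters for every job started on the given day."""
    jobs = fetch_json(jobs_url(base, day)).get("jobs") or {}
    result = {}
    for job in jobs.get("job", []):
        url = "{}/jobs/{}/counters".format(base, job["id"])
        result[job["id"]] = flatten_counters(fetch_json(url))
    return result
```

Calling `counters_for_day(BASE, "2014-08-01")` would then return one dictionary per job ID, which could be filtered by counter group to separate the filesystem, job, and framework counters. Only `counters_for_day` touches the network; the other helpers are pure so they can be checked offline.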