Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Query JobHistory Server in YARN to obtain data for completed jobs

Contributor

I am looking for the best way to gather the filesystem counters, job counters, and MapReduce framework details for all jobs that ran on a specific date. When a job completes, its logs are stored in HDFS and the information about the job is shipped off to a dedicated server, the JobHistory Server. On the node running the JobHistory Server, port 19888 is currently locked down. I am looking for a way to either:

1) query HDFS to get the data I need, or

2) open the port and use the JobHistory web UI on port 19888, or

3) use some other method.

Environment: CDH 5.1.x. We are not currently using Cloudera Manager.

1 ACCEPTED SOLUTION

Contributor
I have discovered that the Hadoop JobHistory service has not been running, which explains why no logs have been moved to HDFS. I will have to fix that, and then I should have the functionality I am looking for.
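
For anyone landing here with the same question: once the JobHistory Server is running and port 19888 is reachable, the per-job counters can also be pulled from its REST API instead of the web UI. The following is only a minimal sketch, not a tested tool. It assumes the Hadoop 2.x History Server REST endpoints (`/ws/v1/history/mapreduce/jobs` with the `finishedTimeBegin`/`finishedTimeEnd` filters, and `/jobs/{jobid}/counters`), a hypothetical host name, no authentication on the endpoint, and that "a specific date" means a UTC calendar day. The helper names are made up for illustration.

```python
import json
from datetime import date, datetime, timezone
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical host; 19888 is the JobHistory port from the question.
BASE = "http://jobhistory.example.com:19888"
MS_PER_DAY = 24 * 60 * 60 * 1000


def day_window_ms(day):
    """Return (first, last) epoch milliseconds of a UTC calendar day."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    first = int(start.timestamp()) * 1000
    return first, first + MS_PER_DAY - 1


def jobs_url(base, day):
    """Build the URL that lists jobs which finished on the given day."""
    begin, end = day_window_ms(day)
    query = urlencode({"finishedTimeBegin": begin, "finishedTimeEnd": end})
    return f"{base}/ws/v1/history/mapreduce/jobs?{query}"


def flatten_counters(payload):
    """Map (group name, counter name) to the total counter value."""
    totals = {}
    groups = (payload.get("jobCounters") or {}).get("counterGroup") or []
    for group in groups:
        for counter in group.get("counter") or []:
            key = (group["counterGroupName"], counter["name"])
            totals[key] = counter["totalCounterValue"]
    return totals


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


def counters_for_day(base, day):
    """Return {job id: flattened counters} for jobs finished on that day."""
    listing = fetch_json(jobs_url(base, day)).get("jobs") or {}
    result = {}
    for job in listing.get("job") or []:
        url = f"{base}/ws/v1/history/mapreduce/jobs/{job['id']}/counters"
        result[job["id"]] = flatten_counters(fetch_json(url))
    return result
```

Calling `counters_for_day(BASE, date(2014, 9, 1))` (an arbitrary example date) would return one dictionary per job, keyed by counter group and counter name. If the cluster's notion of a day is local time rather than UTC, the window in `day_window_ms` needs adjusting.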
