How to get the YARN jobs metadata directly (not using the API)?

New Contributor

Is there a way to get the YARN jobs metadata by accessing the database where it is stored directly? I'm posting this question for several reasons:

1) Automating calls to the API is a security risk, because curl requires a username and password.

2) The API returns the data as JSON, and its structure is not uniform.

3) So additional processing is required to clean up the JSON before it is ready to load.

4) Our Hadoop admin team told us that Cloudera stores passwords inside the metadata, so access to the metadata tables (which currently reside in an Oracle database) is not allowed because it creates security issues. I'd appreciate it if anyone could confirm this point, as I do not have access to the metadata to validate it; Cloudera support, on the other hand, denies it. If this hurdle is removed and we gain access to the metadata layer, the data can be loaded directly into HDFS.

Thanks in advance.

Re: How to get the YARN jobs metadata directly (not using the API)?

Cloudera Employee

You have the following options for viewing the job counters (metadata):

mapred job -history /user/history/done/<date>/<job>.jhist -format <human|json>


For the JSON format, you can pipe the output to python -m json.tool to get pretty-printed output.
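Since the original question mentions that the JSON structure is not uniform, one option is to flatten it into path/value pairs before loading. This is a minimal sketch, not part of Hadoop. It assumes that stdout of `mapred job -history ... -format json` contains a single JSON document. The script name `flatten_json.py` and the dotted-key convention are hypothetical choices.

```python
import json
import sys
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one dict keyed by dotted paths.

    List elements are keyed by their index, so {"a": [{"b": 1}]} becomes
    {"a.0.b": 1}. Empty dicts and lists produce no keys.
    """
    if isinstance(value, dict):
        items = [(str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        items = [(str(i), v) for i, v in enumerate(value)]
    else:
        # Scalar leaf (str, number, bool, None): store it under the accumulated path.
        return {prefix: value}
    flat: Dict[str, Any] = {}
    for key, child in items:
        child_key = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(child, child_key))
    return flat


# Only read stdin when invoked as "python flatten_json.py -" (hypothetical file name).
if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    document = json.load(sys.stdin)
    # Print one "path<TAB>value" line per leaf, sorted by path.
    for path, leaf in sorted(flatten(document).items()):
        print(f"{path}\t{json.dumps(leaf)}")
```

Usage: `mapred job -history /user/history/done/<date>/<job>.jhist -format json | python flatten_json.py -`. Every job then yields the same two-column shape (path, value), regardless of how deeply its counters are nested.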

Note that the JobHistory Server (JHS) loads jobs from .jhist files (one .jhist file per job) that are stored
in an HDFS directory, /user/history/done by default. Each job's ApplicationMaster (AM) writes its .jhist file
before the job completes, and you can read the job metadata from these files with the commands above. If the AM fails to move its .jhist file to the directory the JHS looks in, the JHS has no record of the job at all.
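Every job leaves one .jhist file under the done directory, so you can find the files to pass to `mapred job -history` by listing that directory recursively. This is a minimal sketch. It assumes the /user/history/done default mentioned above (pass a different path if `mapreduce.jobhistory.done-dir` is set otherwise) and listing output from `hdfs dfs -ls -R`. Paths containing spaces are not handled. The helper names `jhist_paths` and `list_jhist_files` are hypothetical.

```python
import subprocess
from typing import List

# Default JHS done directory per the reply above; adjust if your cluster differs.
DONE_DIR = "/user/history/done"


def jhist_paths(ls_output: str) -> List[str]:
    """Extract .jhist file paths from `hdfs dfs -ls -R` output.

    File lines start with "-" in the permission column and end with the full
    path, so the last whitespace-separated field is taken. Directory lines
    (starting with "d") and non-.jhist files such as *_conf.xml are skipped.
    """
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("-") and fields[-1].endswith(".jhist"):
            paths.append(fields[-1])
    return paths


def list_jhist_files(done_dir: str = DONE_DIR) -> List[str]:
    # Requires the hdfs CLI on PATH (and a valid Kerberos ticket on secure clusters).
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", done_dir],
        capture_output=True, text=True, check=True,
    )
    return jhist_paths(result.stdout)
```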

mapred job -history /user/history/done/<date>/<job>.jhist -format human
mapred job -history /user/history/done/<date>/<job>.jhist -format json
mapred job -history /user/history/done/<date>/<job>.jhist -format json | python -m json.tool

No password is needed, but in a Kerberized environment you might have to run kinit first. Your user also needs read access to the .jhist files in HDFS.
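This relates to the asker's first concern, which was that automation should not need a password. On a Kerberized cluster, a scheduled job can obtain its ticket from a keytab with `kinit -kt` instead of a password. This is a minimal sketch that assumes your admins have already provisioned a keytab and principal for a service account. The path and principal below are placeholders, and the helper names are hypothetical. Protect the keytab like a password, for example by making it readable only by the service account.

```python
import subprocess
from typing import List

# Placeholders: replace with the keytab and principal issued for your service account.
KEYTAB = "/etc/security/keytabs/example.keytab"
PRINCIPAL = "example-user@EXAMPLE.COM"


def kinit_cmd(keytab: str, principal: str) -> List[str]:
    # -k: get the key from a keytab, -t: keytab file path; no password prompt.
    return ["kinit", "-kt", keytab, principal]


def history_cmd(jhist_path: str, fmt: str = "json") -> List[str]:
    # Same command as in the reply above, built as an argument list.
    if fmt not in ("human", "json"):
        raise ValueError("fmt must be 'human' or 'json'")
    return ["mapred", "job", "-history", jhist_path, "-format", fmt]


def export_history(jhist_path: str, keytab: str = KEYTAB, principal: str = PRINCIPAL) -> str:
    """Authenticate with the keytab, then return the job history output as text."""
    subprocess.run(kinit_cmd(keytab, principal), check=True)
    result = subprocess.run(history_cmd(jhist_path), capture_output=True, text=True, check=True)
    return result.stdout
```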

Re: How to get the YARN jobs metadata directly (not using the API)?

Community Manager

@kokku Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. 


Regards,

Vidya Sargur,
Community Manager

