Explorer
Posts: 12
Registered: 08-20-2015

MR2 (Yarn) logs Location in Hadoop 2 CDH5.3.3

We had a process in CDH4 that combined a shell script and MapReduce, and it worked fine there. We recently moved to CDH5.3, where the JobTracker is replaced by YARN, and our current process now fails.

The failure happens because our script reads the MapReduce job status from the logs. In CDH4, MR logs are written to the [output]/_logs/history file, whereas in CDH5 all logs are moved to a common location. To read the logs in CDH5 we need to know the job ID, which can only be obtained from the logs. So it's a catch-22.
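
One way to break the cycle, sketched here under assumptions, is to take the job ID from the submitting client's own console output instead of from the history logs: the MR2 client usually logs a line such as `Running job: job_<clusterTimestamp>_<sequence>` when it submits a job. The helper names `extract_job_id` and `run_and_capture` are hypothetical, and the command passed to `run_and_capture` stands in for whatever your script already runs (for example a `hadoop jar ...` invocation).

```python
import re
import subprocess
from typing import List, Optional

# MRv2 job IDs look like job_<clusterTimestamp>_<sequence>, e.g. job_1426000000000_0042.
JOB_ID_RE = re.compile(r"\bjob_\d+_\d+\b")


def extract_job_id(client_output: str) -> Optional[str]:
    """Return the first MapReduce job ID found in the client's console output, or None."""
    match = JOB_ID_RE.search(client_output)
    return match.group(0) if match else None


def run_and_capture(cmd: List[str]) -> Optional[str]:
    """Run a (hypothetical) job-submission command and scan its combined output.

    The Hadoop client typically logs through log4j to stderr, so stderr is
    merged into stdout before scanning.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          universal_newlines=True)
    return extract_job_id(proc.stdout)
```

If your jobs are launched in a way that hides the client output, this approach does not apply and the ID has to come from somewhere else.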

 

My question is: can we force MR2 (YARN) to write the same logs to the same location as we did in CDH4? That way we would not need to make major changes to the script, which depends heavily on the content of the logs to get the job ID, job status and number of records processed in order to make post-processing decisions.

Thanks in advance for the help.

-mukgup

Posts: 1,892
Kudos: 432
Solutions: 302
Registered: 07-31-2013

Re: MR2 (Yarn) logs Location in Hadoop 2 CDH5.3.3

This is a major difference between MRv1 (JobTracker, TaskTrackers) and YARN+MRv2 (ResourceManager, NodeManagers and JobHistoryServer). There's no way to get the older behaviour back: in YARN, logs are placed back onto HDFS by the log-aggregation layer, which is agnostic of what an 'output directory' is, whereas MR1 required one.
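
Because aggregated logs are organised per YARN application rather than per output directory, a script that already holds a MapReduce job ID still needs the matching application ID before it can use `yarn logs -applicationId`. In the usual case the two share the same cluster timestamp and sequence number (`job_<ts>_<seq>` versus `application_<ts>_<seq>`); this minimal Python sketch relies on that assumption, and its function names are hypothetical.

```python
import re
from typing import List

# MRv2 job IDs have the form job_<clusterTimestamp>_<sequence>.
JOB_ID_RE = re.compile(r"job_(\d+)_(\d+)")


def application_id_for(job_id: str) -> str:
    """Map job_<ts>_<seq> to application_<ts>_<seq> (assumes identical ts and seq)."""
    match = JOB_ID_RE.fullmatch(job_id)
    if match is None:
        raise ValueError(f"not a MapReduce job ID: {job_id!r}")
    return f"application_{match.group(1)}_{match.group(2)}"


def yarn_logs_argv(job_id: str) -> List[str]:
    """argv for fetching a finished job's aggregated logs with the 'yarn logs' CLI."""
    return ["yarn", "logs", "-applicationId", application_id_for(job_id)]
```

The returned list can be handed to `subprocess.run`, and the aggregated log text it prints can then be parsed much as the old history file was.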

Your script will need to change, as the older system is no longer compatible with the current mechanisms. If it's of any help, the JobHistoryServer offers a REST API for pulling job information: http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/His... You can also rely on the more direct 'yarn logs' command: http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs
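
A minimal Python sketch of querying the JobHistoryServer REST API for the two things the original script read from logs: the final job state and a record count. Assumptions: the host name is a placeholder, 19888 is the common default JobHistoryServer web port, and `MAP_INPUT_RECORDS` is only an example, so pick whichever `TaskCounter` matches what your script treats as "records processed". The function names are hypothetical; the `/ws/v1/history/mapreduce/jobs/{jobid}` and `.../counters` resources and their JSON field names follow the History Server REST API documentation.

```python
import json
import urllib.request
from typing import Tuple

# Placeholder host; 19888 is the usual default JobHistoryServer web port.
HISTORY_SERVER = "http://historyserver.example.com:19888"
TASK_COUNTERS = "org.apache.hadoop.mapreduce.TaskCounter"


def fetch_json(url: str) -> dict:
    """GET a JobHistoryServer REST resource and decode its JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def job_state(job_doc: dict) -> str:
    """Final state from a /jobs/{jobid} response, e.g. SUCCEEDED, FAILED or KILLED."""
    return job_doc["job"]["state"]


def counter_total(counters_doc: dict, group: str, name: str) -> int:
    """totalCounterValue of one counter from a /jobs/{jobid}/counters response."""
    for grp in counters_doc["jobCounters"].get("counterGroup", []):
        if grp["counterGroupName"] != group:
            continue
        for counter in grp.get("counter", []):
            if counter["name"] == name:
                return counter["totalCounterValue"]
    raise KeyError(f"counter {group}.{name} not found")


def job_summary(job_id: str, base: str = HISTORY_SERVER) -> Tuple[str, int]:
    """Return (state, records) for a finished job; needs network access to the server."""
    root = f"{base}/ws/v1/history/mapreduce/jobs/{job_id}"
    state = job_state(fetch_json(root))
    records = counter_total(fetch_json(root + "/counters"),
                            TASK_COUNTERS, "MAP_INPUT_RECORDS")
    return state, records
```

Keeping the parsing functions separate from `fetch_json` means they can be checked against saved JSON responses without a running cluster.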