Created 06-10-2016 09:58 AM
I have a YARN map/reduce application.
In the mapper I use log4j to log certain cases. After the job execution is finished I want to analyze the logs from ALL mappers. As there are a lot of mappers in my job, log analysis becomes a rather painful job...
Is there a way to write the logs from the mappers to some aggregated file, so that all the records are in one place? Or perhaps there is an approach to combine the log files from all mappers of a particular job?
Created 06-10-2016 10:01 AM
If you are running MapReduce over YARN, then enable YARN remote log aggregation (http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/). It will centralize all logging for you, and you can perform your analysis on the aggregated logs.
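If aggregation is not already on in your cluster, it is controlled from yarn-site.xml. A minimal sketch (the property names are the stock Hadoop ones; the retention value is only an example):

```xml
<!-- yarn-site.xml: turn on remote log aggregation -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<!-- how long to keep aggregated logs, in seconds (example: 7 days) -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
```

After changing these you need to restart the NodeManagers for the setting to take effect.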
Created 06-10-2016 10:06 AM
Yes, it should be enabled by default, though. You can get the log files through the yarn logs command line, or you can use Pig as well:
https://community.hortonworks.com/articles/33703/mining-tez-app-log-file-with-pig-script.html
Created 06-10-2016 10:08 AM
Thank you @Rajkumar Singh!
I would only add that I found the following approach rather convenient:
yarn logs -applicationId application_1465548978834_0004 | grep my.foo.class > /home/user/log2.txt
With this command I can filter all the log entries for the class I want to analyze.
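A small variation on the same idea: dump the aggregated output once and slice the dump locally, instead of re-running yarn logs per query. The class name and log lines below are hypothetical placeholders standing in for real aggregated mapper output:

```shell
# Hypothetical sample standing in for the output of
# `yarn logs -applicationId <app_id> > /tmp/app.log`:
printf '%s\n' \
  '2016-06-10 09:58:01,123 INFO my.foo.class: processed record 1' \
  '2016-06-10 09:58:01,456 WARN my.foo.class: skipped record 2' \
  '2016-06-10 09:58:02,789 INFO other.bar.class: heartbeat' \
  > /tmp/app.log

# Keep only the entries for the class under analysis:
grep 'my.foo.class' /tmp/app.log > /tmp/filtered.log

# Count entries per log level for that class
# (field 3 is the level in the default log4j layout above):
grep 'my.foo.class' /tmp/app.log | awk '{print $3}' | sort | uniq -c
```

This keeps each follow-up question (per-level counts, per-record greps) a cheap local operation on /tmp/app.log.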