Mapred Jobhistory server is unable to read aggregated log file from a busy nodemanager

New Contributor

I have been trying to figure out why logs are missing on the JobHistory server for the YARN containers of completed MapReduce jobs. The problem only seems to arise when a single NodeManager is running many containers (12+); with fewer containers it doesn't come up. The NodeManager has 13 data disks, and that is where the YARN container local and log data gets stored. Log aggregation reports that it completes successfully and the aggregated log file looks correct (I was able to read it using LogAggregationIndexedFileController), but in the JobHistory server I see this message when trying to look at a completed map task's logs:

2018-07-13 00:48:34,244 WARN webapp.View (IndexedFileAggregatedLogsBlock.java:render(139)) - Can not load log meta from the log file:hdfs://hadoopnn1.net:8020/app-logs/rana/logs-ifile/application_1530921118753_0010/hadoopdn4.net_45454
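
In case it helps anyone compare against their own logs, here is a minimal Python sketch that pulls the pieces out of that warning. The function name `parse_aggregated_log_path` is hypothetical, and the directory layout (root / user / suffix / application id / node file) is an assumption inferred from this one path, not something confirmed against the Hadoop source.

```python
import re
from urllib.parse import urlparse

# The warning from the JobHistory server, copied from above.
WARN_LINE = (
    "2018-07-13 00:48:34,244 WARN webapp.View "
    "(IndexedFileAggregatedLogsBlock.java:render(139)) - Can not load log meta "
    "from the log file:hdfs://hadoopnn1.net:8020/app-logs/rana/logs-ifile/"
    "application_1530921118753_0010/hadoopdn4.net_45454"
)


def parse_aggregated_log_path(line):
    """Split the HDFS path in the warning into its components.

    Assumes the layout <root>/<user>/<suffix>/<application id>/<host>_<port>,
    which is inferred from the single path shown in this thread.
    """
    match = re.search(r"log file:(\S+)", line)
    if match is None:
        raise ValueError("no aggregated log path found in line")
    url = urlparse(match.group(1))
    parts = url.path.strip("/").split("/")
    if len(parts) != 5:
        raise ValueError("unexpected path layout: " + url.path)
    root, user, suffix, app_id, node_file = parts
    # The node file name is the NodeManager host and port joined by "_".
    host, _, port = node_file.rpartition("_")
    return {
        "namenode": url.netloc,
        "remote_root": root,
        "user": user,
        "suffix": suffix,
        "application_id": app_id,
        "node_host": host,
        "node_port": int(port),
    }


if __name__ == "__main__":
    for key, value in parse_aggregated_log_path(WARN_LINE).items():
        print(key, "=", value)
```

Run over many such warnings, this would make it easy to check whether the failures really cluster on the busy NodeManager.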

Current log aggregation settings are:

"yarn.log-aggregation.file-formats" : "IndexedFormat,TFile",
"yarn.nodemanager.log-aggregation.debug-enabled" : "false",
"yarn.log-aggregation.retain-seconds" : "2592000",
"yarn.nodemanager.log-aggregation.num-log-files-per-app" : "336",
"yarn.log-aggregation.file-controller.IndexedFormat.class" : "org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController",
"yarn.log-aggregation-enable" : "true",
"yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds" : "3600",
"yarn.log-aggregation.file-controller.TFile.class" : "org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController",
"yarn.nodemanager.log-aggregation.compression-type" : "gz",
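
To put the numeric settings above in more readable units, here is a rough Python sketch. The helper name `summarize` is hypothetical, and the "rolling window" figure assumes one aggregated file is uploaded per roll interval, which is my reading of the settings rather than confirmed behaviour.

```python
# Numeric log aggregation settings copied from the list above.
SETTINGS = {
    "yarn.log-aggregation.retain-seconds": "2592000",
    "yarn.nodemanager.log-aggregation.num-log-files-per-app": "336",
    "yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds": "3600",
}

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def summarize(settings):
    """Convert the raw second counts into days and hours."""
    retain_s = int(settings["yarn.log-aggregation.retain-seconds"])
    max_files = int(
        settings["yarn.nodemanager.log-aggregation.num-log-files-per-app"]
    )
    roll_s = int(
        settings[
            "yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds"
        ]
    )
    return {
        "retain_days": retain_s / SECONDS_PER_DAY,
        "roll_interval_hours": roll_s / SECONDS_PER_HOUR,
        # Assumption: one aggregated file per roll interval per application.
        "rolling_window_days": max_files * roll_s / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for key, value in summarize(SETTINGS).items():
        print(key, "=", value)
```

So retention is 30 days, the roll interval is one hour, and under the stated assumption 336 files would cover 14 days of a long-running application.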

Has anyone seen this before?

REPLIES

Re: Mapred Jobhistory server is unable to read aggregated log file from a busy nodemanager

New Contributor

This is still a major problem for me. Can someone please help? Log aggregation seems broken if it cannot be used on a busy cluster running MapReduce.

Versions are:

HDP-2.6.4.0

HDFS            2.7.3
YARN            2.7.3
MapReduce2      2.7.3
Tez             0.7.0
Hive            1.2.1000
HBase           1.1.2
Pig             0.16.0
Oozie           4.2.0
ZooKeeper       3.4.6
Ambari Infra    0.1.0
Ambari Metrics  0.1.0
Ranger          0.7.0
Ranger KMS      0.7.0
Slider          0.92.0

Re: Mapred Jobhistory server is unable to read aggregated log file from a busy nodemanager

New Contributor

Still the same for me; I am using HDP-2.6.5.41.
