Support Questions

Fawze · ‎10-29-2017

Hi guys,

From time to time, some NodeManagers get lost in YARN because log-dirs are bad: /var/log/hadoop-yarn/container.

I checked the disk space and don't see any issue there. In the ResourceManager log I see:

INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_e37_1509251204123_1378_01_000001, NodeId: avpr-dhc001.lpdomain.com:8041, NodeHttpAddress: avpr-dhc001.lpdomain.com:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 172.16.144.140:8041 }, ] for AM appattempt_1509251204123_1378_000001
2017-10-29 05:08:22,593 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Node avpr-dhc001.lpdomain.com:8041 reported UNHEALTHY with details: 1/1 log-dirs are bad: /liveperson/hadoop/log/hadoop-yarn/container
2017-10-29 05:08:22,593 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: avpr-dhc001.lpdomain.com:8041 Node Transitioned from RUNNING to UNHEALTHY

I don't see any issues in the DataNode or NodeManager logs.

There is no inode issue on the server either.

Fawze · ‎10-29-2017

The problem was the limit on the number of subdirectories under a single directory.

When I checked the container folder, I saw that it holds 32,000 directories, which is the limit.
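A quick way to confirm this kind of problem is to count the immediate subdirectories of the log directory and compare the count against the limit. The sketch below is only a diagnostic aid: the function names are hypothetical, and the 32,000 threshold is simply the figure quoted above, so adjust it for your filesystem.

```python
import os

# Assumed threshold: the 32,000 figure quoted in the post above.
SUBDIR_LIMIT = 32000


def count_subdirs(path):
    """Return the number of immediate subdirectories of path."""
    with os.scandir(path) as entries:
        # Symlinks are not followed, so only real directories are counted.
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))


def near_limit(path, limit=SUBDIR_LIMIT, margin=0.9):
    """Return True once the subdirectory count reaches margin * limit."""
    return count_subdirs(path) >= limit * margin
```

For example, calling `count_subdirs("/var/log/hadoop-yarn/container")` on an affected node should return a number at or near the limit, and `near_limit` could be wired into a periodic check to warn before the NodeManager is marked UNHEALTHY.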

I'm now looking into why the retention is not deleting these files. I have the following configuration:

Log Aggregation Retention Period: 7 days

Job History Files Cleaner Interval: 1 day

Log Retain Duration: 3 hours
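To see whether retention is really being skipped, it can help to list the subdirectories that are older than the configured retain duration. The following is a minimal sketch, assuming directory modification time is a reasonable proxy for age. The function name is hypothetical, and this is not how YARN itself performs cleanup. It only lists candidates and deletes nothing.

```python
import os
import time


def stale_subdirs(path, max_age_seconds, now=None):
    """Return sorted names of immediate subdirectories older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    with os.scandir(path) as entries:
        for e in entries:
            if e.is_dir(follow_symlinks=False):
                # Age is measured from the directory's last modification time.
                age = now - e.stat(follow_symlinks=False).st_mtime
                if age > max_age_seconds:
                    stale.append(e.name)
    return sorted(stale)
```

With the 3 hour Log Retain Duration above, `stale_subdirs("/var/log/hadoop-yarn/container", 3 * 3600)` would return the directories that have outlived that setting. A long list would suggest that the cleanup is not running for this directory.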
