Created on 01-05-2017 09:12 AM - edited 09-16-2022 03:53 AM
We have CDH 5.7.2 installed alongside Cloudera Manager 5.8.1 at our company. We have enabled YARN log aggregation and set the YARN log aggregation retain seconds to 1 day. For some reason, the YARN job logs in the default HDFS directory /tmp/logs/ are not being deleted. Can anyone explain why this is?
BTW, we have both Hive and Spark jobs running on our cluster.
Thanks,
Ben
Created 01-05-2017 01:14 PM
Created 01-05-2017 08:34 PM
To answer your questions:
The /tmp/logs directory and all of its subdirectories have mode 770, and the group is hdfs. Should the group be hadoop instead? I see that the yarn user is not part of the hdfs group but is in the hadoop group.
The logs date back to Dec 18 and grow by less than 1 TB per day. We delete the logs manually to prevent them from getting too big.
Thanks,
Ben
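To make the mismatch concrete, here is a minimal sketch of the retention arithmetic, assuming a retention of 86400 seconds (1 day, as configured above). The year 2016 for "Dec 18" and the times of day are assumptions; the function name is hypothetical and is not part of YARN.

```python
from datetime import datetime, timedelta

# 1 day expressed in seconds, matching the retention described in this thread.
RETAIN_SECONDS = 24 * 60 * 60


def is_expired(modified, now, retain_seconds=RETAIN_SECONDS):
    """Return True when a log is older than the retention window."""
    return (now - modified) > timedelta(seconds=retain_seconds)


# Dates from the thread; the year 2016 and the times of day are assumptions.
oldest_log = datetime(2016, 12, 18)
checked_at = datetime(2017, 1, 5, 20, 34)

print(is_expired(oldest_log, checked_at))   # True
print((checked_at - oldest_log).days)       # 18
```

With a 1-day retention, logs that are 18 days old should already have been removed, which suggests the deletion itself is failing rather than the retention value being wrong.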
Created 01-09-2017 12:16 AM
Created 10-22-2018 01:06 AM
Hi, my cluster is CDH 5.7.2 with CM 5.7.0, and I have run into the same trouble.
We set dfs.permissions.superusergroup=supergroup, and we run the MapReduce application as the 'hdfs' user. The HDFS directory looks like this:
drwxrwx--- - hdfs supergroup 0 2018-06-05 15:01 /tmp/logs/hdfs
and the Linux user-to-group mapping is:
hadoop:x:497:hdfs,mapred,yarn
supergroup:x:505:hdfs,yarn
What should I do to resolve this problem? Thank you very much.
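The listing and group mapping above can be checked with a small sketch. It assumes that log deletion is done by the JobHistoryServer running as the mapred user (typical for CDH, but an assumption here), and it approximates the permission check with a simple owner/group/other lookup. It ignores HDFS superuser privileges and primary groups, and the helper names are hypothetical.

```python
def parse_group_line(line):
    """Parse an /etc/group style line into (group name, set of members)."""
    name, _password, _gid, members = line.strip().split(":")
    return name, {m for m in members.split(",") if m}


def has_full_access(user, owner, group, mode, groups):
    """Approximate check: does `user` get rwx on the directory?

    `mode` is the 9-character permission string, e.g. 'rwxrwx---'.
    """
    if user == owner:
        bits = mode[0:3]
    elif user in groups.get(group, set()):
        bits = mode[3:6]
    else:
        bits = mode[6:9]
    return all(b != "-" for b in bits)


# Group mapping copied from the post above.
groups = dict(parse_group_line(line) for line in [
    "hadoop:x:497:hdfs,mapred,yarn",
    "supergroup:x:505:hdfs,yarn",
])

# drwxrwx--- - hdfs supergroup ... /tmp/logs/hdfs
print(has_full_access("mapred", "hdfs", "supergroup", "rwxrwx---", groups))  # False
# Same directory if its group were hadoop instead.
print(has_full_access("mapred", "hdfs", "hadoop", "rwxrwx---", groups))      # True
```

Under these assumptions, mapred is neither the owner nor a member of supergroup, so it falls into the "other" class and gets no access to the directory.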
Created 10-22-2018 06:23 PM
Created 10-22-2018 06:43 PM
Thank you so much!
I changed the group of '/tmp/logs' to hadoop and restarted the JobHistoryServer role, and now everything is OK.
So happy !
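For anyone scripting the same change, a minimal sketch follows. It only builds and prints the `hdfs dfs -chgrp` command by default. Applying it recursively with `-R` is an assumption (the post does not say whether subdirectories were changed), and the helper names are hypothetical. The JobHistoryServer restart still has to be done separately, for example from Cloudera Manager.

```python
import subprocess


def chgrp_command(path="/tmp/logs", group="hadoop", recursive=True):
    """Build the hdfs dfs -chgrp command as an argument list."""
    cmd = ["hdfs", "dfs", "-chgrp"]
    if recursive:
        # Assumption: subdirectories should get the new group as well.
        cmd.append("-R")
    cmd += [group, path]
    return cmd


def apply(cmd, dry_run=True):
    """Print the command in dry-run mode; otherwise run it and fail on error."""
    if dry_run:
        print(" ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)


# Dry run: prints "hdfs dfs -chgrp -R hadoop /tmp/logs" without touching HDFS.
apply(chgrp_command())
```

Running with `dry_run=False` would need to be done as a user allowed to change the group, such as the directory owner or an HDFS superuser.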
Created 01-09-2017 11:25 AM
Thanks for mentioning the information about the hadoop group and permissions. It seems that, after applying these settings, everything is working.
Cheers,
Ben
Created 01-05-2017 05:22 PM
As we know, "Yarn Aggregate Log Retention" controls only YARN logs, but /tmp/logs is not limited to YARN.
So can you check the YARN log dates using the steps below?
CM -> YARN -> Web UI -> ResourceManager Web UI (this opens the link on port 8088) -> click the Finished link (left side) -> scroll down and click the 'Last' button -> check the log date. You should see only one day of history, since you configured retention to 1 day.
Note: Make sure CM -> YARN -> Configuration -> Enable Log Aggregation is set to Enabled.
Thanks
Kumar
Created 01-05-2017 08:35 PM
I did as you asked and see that the oldest finished application is from Dec 18, and I see the logs in HDFS under /tmp/logs.
Log Aggregation is enabled.
Thanks,
Ben