We have CDH 5.7.2 installed alongside Cloudera Manager 5.8.1 at our company. We have enabled YARN log aggregation and set the YARN log aggregation retention (retain seconds) to 1 day. For some reason, the YARN job logs in the default HDFS directory /tmp/logs/ are not being deleted. Can anyone explain why?
BTW, we have both Hive and Spark jobs running on our cluster.
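For reference, a retention of 1 day corresponds to `yarn.log-aggregation.retain-seconds` = 86400. The Python sketch below models the retention arithmetic under two assumptions: a log directory becomes eligible for deletion once its modification time is older than "now minus the retention period", and `yarn.log-aggregation.retain-check-interval-seconds` falls back to one-tenth of the retention when left at 0 or below, as yarn-default.xml describes. The function names are hypothetical; this is a simplified model, not the actual deletion service code.

```python
from datetime import datetime, timedelta, timezone

# Retention the poster configured: 1 day, expressed in seconds.
RETAIN_SECONDS = 24 * 60 * 60  # 86400


def check_interval_seconds(retain_seconds: int, check_interval: int = -1) -> int:
    """Assumed fallback rule: a check interval of 0 or less means
    one-tenth of the retention period."""
    if check_interval <= 0:
        return retain_seconds // 10
    return check_interval


def is_past_retention(mtime: datetime, now: datetime, retain_seconds: int) -> bool:
    """True if a directory last modified at `mtime` is older than the cutoff
    (now minus the retention period)."""
    cutoff = now - timedelta(seconds=retain_seconds)
    return mtime < cutoff


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(check_interval_seconds(RETAIN_SECONDS))  # 8640
    print(is_past_retention(now - timedelta(hours=25), now, RETAIN_SECONDS))  # True
```

Under this model the cleanup would run roughly every 8640 seconds (2.4 hours), so a directory may survive somewhat longer than exactly one day before it is removed.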
@benassi Check which accounts belong to the hadoop group. It should include hdfs, mapred, and yarn. The yarn account, since that is what the RM, NM, and JH run as, needs read/write access to be able to remove any old logs.
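One way to see who belongs to the group is `getent group hadoop` on the host whose group mapping HDFS uses (with the default shell-based mapping, typically the NameNode). The Python sketch below parses that `name:password:gid:members` line and reports which of the accounts named above are missing. The helper names are hypothetical, and there is one caveat it does not handle: accounts whose primary group is hadoop are not listed in the members field, so treat a reported gap as something to double-check with `id <user>` rather than as proof.

```python
from __future__ import annotations

# Accounts the answer says should belong to the hadoop group.
EXPECTED_MEMBERS = frozenset({"hdfs", "mapred", "yarn"})


def parse_group_line(line: str) -> tuple[str, set[str]]:
    """Split a `getent group` line (name:password:gid:members) into the
    group name and its set of listed (supplementary) members."""
    name, _password, _gid, members = line.strip().split(":", 3)
    return name, {m for m in members.split(",") if m}


def missing_members(line: str, expected: frozenset[str] = EXPECTED_MEMBERS) -> set[str]:
    """Expected accounts not listed in the group's members field.
    Users whose *primary* group this is are not listed there."""
    _name, members = parse_group_line(line)
    return set(expected) - members
```

For example, `missing_members("hadoop:x:498:hdfs,mapred")` returns `{"yarn"}`.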
The group ownership of all directories under /tmp/logs must be 'hadoop' or another group that is shared by the 'yarn' and 'mapred' IDs. In your case it is 'supergroup', which not only lacks 'mapred' as a member but is also entirely the wrong group to use: you do not want to grant HDFS superuser access to the YARN service. I'd recommend removing 'yarn' from the 'supergroup' group.
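To find directories with the wrong group, you can list the tree with `hdfs dfs -ls /tmp/logs` and filter the output. The Python sketch below assumes the usual `-ls` column layout (permissions, replication, owner, group, size, date, time, path) and treats 'hadoop' as the only acceptable group, per the advice above; pass a different `allowed_groups` if your shared group has another name. The helper names are hypothetical.

```python
from __future__ import annotations


def parse_ls_line(line: str) -> dict | None:
    """Parse one `hdfs dfs -ls` entry. Assumes the usual 8 columns:
    permissions, replication, owner, group, size, date, time, path.
    Returns None for the 'Found N items' header or blank lines."""
    fields = line.strip().split(None, 7)  # maxsplit keeps paths with spaces intact
    if len(fields) < 8:
        return None
    return {"owner": fields[2], "group": fields[3], "path": fields[7]}


def bad_group_dirs(ls_output: str, allowed_groups: tuple[str, ...] = ("hadoop",)) -> list[str]:
    """Paths whose group is not in `allowed_groups` (assumed to be the
    group shared by yarn and mapred)."""
    bad = []
    for line in ls_output.splitlines():
        entry = parse_ls_line(line)
        if entry is not None and entry["group"] not in allowed_groups:
            bad.append(entry["path"])
    return bad
```

With this filter, an entry such as `drwxrwx---   - alice supergroup   0 2016-09-01 10:00 /tmp/logs/alice` would be flagged, while the same entry with group `hadoop` would not.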
This is what a normal installation should look like:
As we know, the "Yarn Aggregate Log Retention" setting applies only to YARN, but /tmp/logs is not limited to YARN.
Please check the YARN log dates using the steps below:
1. In CM, go to Yarn -> Web UI -> Resource Manager Web UI (this opens the port 8088 page).
2. Click the Finished link on the left side.
3. Scroll down and click the 'Last' button.
4. Check the log date. You should see only one day of history, since you configured retention to 1 day.
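The same check can be scripted against the ResourceManager REST API, which exposes the finished-application list at `/ws/v1/cluster/apps?states=FINISHED`. The Python sketch below fetches that list and computes how old the oldest finished application is. The RM host name is a placeholder, and the JSON field names (`apps`, `app`, `finishedTime` in epoch milliseconds) are as I understand the RM API, so verify them against your version.

```python
from __future__ import annotations

import json
import time
import urllib.request

# Placeholder ResourceManager address; replace with your RM host.
RM_URL = "http://rm-host.example.com:8088"


def fetch_finished_apps(rm_url: str = RM_URL) -> dict:
    """Query the RM REST API for applications in the FINISHED state."""
    with urllib.request.urlopen(f"{rm_url}/ws/v1/cluster/apps?states=FINISHED") as resp:
        return json.load(resp)


def oldest_finished_age_hours(payload: dict, now_ms: int | None = None) -> float | None:
    """Age in hours of the oldest finished app, or None if none are listed.
    The RM returns "apps": null when the list is empty."""
    apps = (payload.get("apps") or {}).get("app") or []
    if not apps:
        return None
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    oldest = min(app["finishedTime"] for app in apps)
    return (now_ms - oldest) / 3_600_000
```

Usage would be `oldest_finished_age_hours(fetch_finished_apps())`; splitting the fetch from the age calculation keeps the arithmetic testable without a live cluster.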