Hive /app-logs retention policies

Explorer

On HDFS I have 4 TB of logs in /app-logs/hive/logs-ifile.

Under that path there appear to be folders for the individual applications that have run, going back to March 14th 2018.

There are 202k folders. Most are under 1 MB, some are a few MB, and some run to GB, with one being 970 GB.

I picked one of the smaller ones at random, and judging by the files nested in the application directory it relates to Hive2 Interactive (LLAP). I think March was about when queries started to be run on LLAP for the cluster.

I've looked at the 970 GB folder and it is made up of 88 files of 10-12 GB each. The file names have the format [FQDN]_45454_1540505117980 and come from one of two hosts; at the time the files were created there would only have been two nodes in our LLAP config.
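For anyone wanting to repeat this kind of survey, here is a minimal Python sketch that ranks application folders by size from the text output of `hdfs dfs -du`. It assumes each output line starts with a byte count and ends with the path (extra columns in between are ignored) and that paths contain no spaces. The helper names `parse_du`, `largest` and `du_from_cluster` are made up for illustration.

```python
import subprocess
from typing import List, Tuple


def parse_du(output: str) -> List[Tuple[int, str]]:
    """Turn `hdfs dfs -du <dir>` text into (size_in_bytes, path) pairs."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Expect a leading byte count and a trailing path; skip anything else.
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        rows.append((int(parts[0]), parts[-1]))
    return rows


def largest(rows: List[Tuple[int, str]], n: int = 10) -> List[Tuple[int, str]]:
    """Return the n biggest entries, largest first."""
    return sorted(rows, key=lambda r: r[0], reverse=True)[:n]


def du_from_cluster(path: str) -> str:
    """Run `hdfs dfs -du` on a node with the HDFS client installed (assumption)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-du", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a cluster node this could be used as `largest(parse_du(du_from_cluster("/app-logs/hive/logs-ifile")))`; with 202k folders the listing itself may take a while.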

My questions are:

- Is there somewhere I can set a retention policy for this? 10 months of logs seems excessive.

- Can I just delete it, or could that bite me in the arse?
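On the retention question, the stock YARN setting for aggregated logs is `yarn.log-aggregation.retain-seconds` in yarn-site.xml (a negative value disables deletion). Whether that is the setting that governs this particular directory is an assumption here, and the 30 days below is only an example value. A small Python sketch that converts days to seconds and prints the property block:

```python
def retain_seconds(days: int) -> int:
    """Convert a retention period in days to seconds."""
    return days * 24 * 60 * 60


def yarn_property(name: str, value: int) -> str:
    """Render one <property> element in yarn-site.xml layout."""
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


# Example only: keep aggregated logs for 30 days.
snippet = yarn_property("yarn.log-aggregation.retain-seconds", retain_seconds(30))
print(snippet)
```

On a managed cluster the value would normally be entered through the management UI rather than by editing the XML by hand.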

2 REPLIES

Explorer

Found this: https://community.hortonworks.com/content/supportkb/228145/yarn-aggregation-log-deletion-service-is-...

I've applied the suggested change but still have log files going back to March. Does it take a while for the clean-up to trigger?
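On timing, going by the yarn-default.xml descriptions: the clean-up is a periodic check, and when `yarn.log-aggregation.retain-check-interval-seconds` is 0 or negative the interval is taken as one-tenth of the retention time. A rough Python sketch of that rule follows. The function name is made up, and it is an assumption that these two properties are what the linked article changes.

```python
from typing import Optional


def effective_check_interval(retain_seconds: int,
                             check_interval_seconds: int = -1) -> Optional[int]:
    """Seconds between retention checks, or None if deletion is disabled.

    Mirrors the documented rule: a negative retention disables deletion,
    and a non-positive check interval falls back to retention / 10.
    """
    if retain_seconds < 0:
        return None
    if check_interval_seconds <= 0:
        return retain_seconds // 10
    return check_interval_seconds


# Example: with 30 days of retention and no explicit interval,
# checks would be three days apart.
print(effective_check_interval(30 * 24 * 60 * 60))
```

So with a long retention period and the default interval, old folders could sit there for days after the change before a check removes them.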

New Contributor

Can you please send me the steps? I'm not able to open the link.
