On HDFS I have 4TB of logs in /app-logs/hive/logs-ifile.
Under that there look to be folders for individual applications that have run, going back to March 14th 2018.
There are 202k folders: most are under 1MB, some are a few MB, and some run to GB, with one being 970GB.
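For anyone who wants to get a similar size breakdown on their own cluster, here is a rough Python sketch that buckets the output of `hdfs dfs -du`. It assumes the first column of each line is the size in bytes and the last column is the path (the number of columns in between varies by Hadoop version), and that paths contain no spaces. The function names and bucket boundaries are my own, not anything from Hadoop.

```python
import subprocess
from collections import Counter

MB = 1024 ** 2
GB = 1024 ** 3


def bucket(size_bytes):
    """Return a coarse size bucket label for a directory size in bytes."""
    if size_bytes < MB:
        return "<1MB"
    if size_bytes < GB:
        return "1MB-1GB"
    return ">=1GB"


def summarize(du_lines):
    """Count directories per size bucket and track the largest one.

    Assumes each line looks like '<bytes> [<other columns>] <path>'.
    Lines that do not start with a number are skipped.
    """
    counts = Counter()
    largest = (0, None)
    for line in du_lines:
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        size, path = int(parts[0]), parts[-1]
        counts[bucket(size)] += 1
        if size > largest[0]:
            largest = (size, path)
    return counts, largest


def hdfs_du(path):
    """Run 'hdfs dfs -du <path>' and return its output lines (needs the hdfs CLI)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-du", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

On a node with the HDFS client installed, something like `summarize(hdfs_du("/app-logs/hive/logs-ifile"))` should give the per-bucket counts and the biggest folder.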
Picking one of the smaller ones at random, the files nested in the application directory look like they relate to Hive2 Interactive (LLAP), and I think March was about when queries started to be run on LLAP for the cluster.
I've looked at the 970GB folder and it looks to be made up of 88 files of 10-12GB each. The file names are of the format [FQDN]_45454_1540505117980 and come from one of two hosts; at the time the files were created there would only have been two nodes in our LLAP config.
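My guess is that the file names break down as host, NodeManager port, and a creation time in epoch milliseconds. That reading is an assumption on my part, not something I have confirmed. A small Python sketch to decode a name on that basis (node1.example.com is a made-up stand-in for the real FQDN):

```python
from datetime import datetime, timezone


def parse_log_file_name(name):
    """Split '<host>_<port>_<millis>' into host, port and a UTC datetime.

    Assumes the trailing number is milliseconds since the Unix epoch.
    """
    host, port, millis = name.rsplit("_", 2)
    created = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return host, int(port), created


# Hypothetical host name; the port and suffix are from the real file name.
host, port, created = parse_log_file_name("node1.example.com_45454_1540505117980")
print(host, port, created.isoformat())
```

If the assumption holds, the suffix in the example decodes to 25 October 2018 (UTC), which fits the period when we only had two LLAP nodes.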
My questions are:
- Is there somewhere I can set a retention policy for this? 10 months of logs seems excessive.
- Can I just delete it, or could that bite me in the arse?