
easiest way to remove hadoop logs

Super Collaborator

Hi,

Can I just run rm -rf * in some of the log folders, such as /var/log/hive?

16G ./hive

Thanks,

Avijeet

1 ACCEPTED SOLUTION

Super Collaborator

@Avijeet Dash, the suggestion from Sunile is great. But if you can't take that approach, here is an alternative.

If you need to manually delete all but the last X files matching a certain pattern (*.zip, files*.log, etc.), you can run something like the following command, which finds all but the 5 most recent matching files.

# find MY_LOG_DIR -type f -name "FILE_PATTERN" -printf "%T+\t%p\n" | sort | awk '{print $2}' | head -n -5 | xargs -I {} CMD_FOR_EACH_FILE {}

Replace the uppercase placeholders (MY_LOG_DIR, FILE_PATTERN, and CMD_FOR_EACH_FILE) as needed.

For example, the following command finds all but the most recent 5 files matching the pattern *.log.20##-##-## and deletes them. Since this is a delete command, before running something so drastic you should test first by replacing the "rm" with "ls -l", or do a "mv" instead. Test, test, test.

# find /var/log/hive -type f -name "*.log.20[0-9][0-9]-[0-2][0-9]-[0-9][0-9]" -printf "%T+\t%p\n" | sort | awk '{print $2}' | head -n -5 | xargs -I {} rm {}
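If you want to rehearse the dry-run-then-delete workflow safely first, here is a self-contained sketch you can run in a throwaway directory. The directory, file names, and dates are made up for the demo; only the find/sort/awk/head/xargs pipeline itself comes from the command above.

```shell
# Demo in a temporary directory; paths and dates are illustrative.
dir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
  # Give each file a distinct modification time (GNU touch).
  touch -d "2016-01-0$i" "$dir/hive.log.2016-01-0$i"
done

# Dry run: list what WOULD be deleted (all but the 5 newest by mtime).
find "$dir" -type f -name "hive.log.20[0-9][0-9]-[0-2][0-9]-[0-9][0-9]" \
  -printf "%T+\t%p\n" | sort | awk '{print $2}' | head -n -5

# Real run: delete all but the 5 newest.
find "$dir" -type f -name "hive.log.20[0-9][0-9]-[0-2][0-9]-[0-9][0-9]" \
  -printf "%T+\t%p\n" | sort | awk '{print $2}' | head -n -5 | xargs -I {} rm {}

ls "$dir" | wc -l   # 5 files remain
```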

There are always many ways to solve a problem and I'm sure there is a more elegant solution.


3 REPLIES

Master Guru

You can remove the log files, but I would recommend a much easier way that automates this.

Most services in Hadoop use log4j. Simply enable RollingFileAppender and set MaxBackupIndex to the maximum number of log files you want to retain for that service.

https://community.hortonworks.com/articles/48937/how-do-i-control-log-file-retention-for-common-hdp....
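As a minimal sketch of what that looks like (the appender name, file path, and sizes here are illustrative assumptions; the exact property names vary by service, so check the linked article for your HDP version), a log4j.properties rolling-file setup is roughly:

```properties
# Illustrative log4j.properties fragment; appender name and paths are assumptions.
log4j.rootLogger=INFO, RFA
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=/var/log/hive/hive.log
log4j.appender.RFA.MaxFileSize=256MB
# Keep at most 10 rolled files (hive.log.1 .. hive.log.10); older ones are deleted.
log4j.appender.RFA.MaxBackupIndex=10
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
```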


Super Collaborator

Note that the find command above deletes older files based on file modification time, not on the timestamp in the filename. I used filenames containing a timestamp, which probably makes the example confusing; the command can be used with any kind of file, for example to keep the last 5 copies of your backup files.

Also, if you use logrotate (e.g. where rolling files via log4j are not an option), you can use the maxage option, which also uses modification time. From the logrotate man page:

       maxage count
              Remove  rotated logs older than <count> days. The age is only checked if the logfile is to be rotated. The files are mailed to the configured address if maillast and mail are configured.