Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

easiest way to remove hadoop logs

avatar
Super Collaborator

Hi,

Can I just run rm -rf * in some of the log folders, such as /var/log/hive?

16G ./hive

Thanks,

Avijeet

1 ACCEPTED SOLUTION

avatar
Master Collaborator

@Avijeet Dash, the suggestion from Sunile is the best approach. But where you can't do that, here is an alternative.

If you need to manually delete all but the last X files named with a certain file pattern (*.zip, files*.log, etc), you can run something like this command which finds all but the most recent 5 matching files.

# find MY_LOG_DIR -type f -name "FILE_PATTERN" -printf "%T+\t%p\n" | sort | awk '{print $2}' | head -n -5 | xargs -I {} CMD_FOR_EACH_FILE {}

Replace the capitalized placeholders (MY_LOG_DIR, FILE_PATTERN, CMD_FOR_EACH_FILE) as needed.

For example, the following command finds all but the most recent 5 files matching the pattern *.log.20##-##-## and deletes them. Since this is a delete command, before running something so drastic you should test first by replacing the "rm" with "ls -l", or do a "mv" instead. Test, test, test.

# find /var/log/hive -type f -name "*.log.20[0-9][0-9]-[0-2][0-9]-[0-9][0-9]" -printf "%T+\t%p\n" | sort | awk '{print $2}' | head -n -5 | xargs -I {} rm {}

There are always many ways to solve a problem and I'm sure there is a more elegant solution.
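To make the dry-run advice above concrete, here is a small self-contained sketch of the same "keep the newest 5" pipeline run against throwaway files in a temp directory. The directory and file names are invented for the demo, and it assumes GNU findutils/coreutils (for find's -printf and head's negative -n):

```shell
#!/bin/sh
# Create 8 dated log files with distinct modification times (demo data only).
DIR=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
    f="$DIR/app.log.2016-01-0$i"
    echo "log $i" > "$f"
    touch -t "2016010${i}0000" "$f"   # set each file's mtime explicitly
done

# Dry run first: swap "ls -l" in for "rm" to see which files would go.
find "$DIR" -type f -name "*.log.20[0-9][0-9]-[0-2][0-9]-[0-9][0-9]" \
    -printf "%T+\t%p\n" | sort | awk '{print $2}' | head -n -5 | xargs -I {} ls -l {}

# Once the dry run lists only the files you expect, run it with rm.
find "$DIR" -type f -name "*.log.20[0-9][0-9]-[0-2][0-9]-[0-9][0-9]" \
    -printf "%T+\t%p\n" | sort | awk '{print $2}' | head -n -5 | xargs -I {} rm {}

ls "$DIR" | wc -l   # prints 5: the 3 oldest files were removed
```

Note the awk step splits on whitespace, so this pipeline assumes the log file paths contain no spaces.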


3 REPLIES

avatar
Master Guru

You can remove the log files, but I would recommend a much easier way that automates this.

Most services in Hadoop use log4j. Simply enable RollingFileAppender and set MaxBackupIndex to the maximum number of log files you want to retain for that service.

https://community.hortonworks.com/articles/48937/how-do-i-control-log-file-retention-for-common-hdp....
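As a rough illustration of the idea, a service's log4j.properties fragment might look like the sketch below. The appender name "RFA", the file path, and the size/count values are illustrative only; match them to the service's own log4j configuration (and the linked article) rather than copying them verbatim:

```properties
# Illustrative log4j 1.x fragment -- names and values are assumptions, not
# the service's shipped defaults.
log4j.rootLogger=INFO,RFA
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=/var/log/hive/hiveserver2.log
log4j.appender.RFA.MaxFileSize=256MB
# Keep at most 10 rolled-over files; older ones are deleted automatically.
log4j.appender.RFA.MaxBackupIndex=10
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```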


avatar
Master Collaborator

Note that the find command in the accepted solution deletes older files based on file modification time, not on the timestamp in the filename. I did use a filename with a timestamp, which probably makes the example confusing. That command can be used with any kind of file, such as keeping the last 5 copies of your backup files.
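A quick way to convince yourself that modification time, not the name, drives the ordering: in the sketch below (temp directory and file names invented for the demo, GNU findutils assumed), "backup3.zip" gets the oldest mtime even though its name sorts in the middle, and it is the one selected for deletion when keeping the newest 2:

```shell
#!/bin/sh
# Demo: the pipeline orders files by mtime, ignoring the filename entirely.
DIR=$(mktemp -d)
touch -t 201601010000 "$DIR/backup3.zip"   # oldest mtime
touch -t 201601020000 "$DIR/backup1.zip"
touch -t 201601030000 "$DIR/backup2.zip"   # newest mtime

# Keep the newest 2 backups; print what would be deleted.
find "$DIR" -type f -name "backup*.zip" -printf "%T+\t%p\n" \
    | sort | awk '{print $2}' | head -n -2   # prints the path of backup3.zip
```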

Also, if you use logrotate (e.g., where log4j rolling appenders are not an option), you can use the maxage option, which also uses modification time. This is from the logrotate man page:

       maxage count
              Remove  rotated logs older than <count> days. The age is only checked if the logfile is to be rotated. The files are mailed to the configured address if maillast and mail are configured.
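For reference, a logrotate stanza using maxage might look like the sketch below. The path and the numbers are illustrative assumptions, not recommended values; tune them to your retention policy:

```
# Illustrative /etc/logrotate.d/hive entry -- path and values are assumptions.
/var/log/hive/*.log {
    daily
    rotate 5        # keep at most 5 rotated copies
    maxage 30       # also remove rotated logs older than 30 days (by mtime)
    compress
    missingok
    notifempty
}
```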