Created 07-07-2016 10:13 AM
Created 07-07-2016 10:24 PM
I'm assuming you are referring to the /tmp/ directory in HDFS. You can use the command below to clean it up, and cron it to run every week.
hadoop fs -rm -r '/tmp/*'
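For the weekly run, a minimal crontab sketch (the 03:00 Sunday schedule is just illustrative, and it assumes the hadoop binary is on cron's PATH):

# Run every Sunday at 03:00; the quoted glob lets HDFS expand /tmp/*
# instead of the local shell expanding it against the local filesystem.
0 3 * * 0 hadoop fs -rm -r '/tmp/*'

The quotes around the glob matter: unquoted, /tmp/* would be expanded by the local shell against the node's own /tmp before hadoop ever sees it.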
Created 07-08-2016 06:34 AM
Thank you so much Rahul... so if I delete the HDFS /tmp directory, it will not affect my current jobs?
Created 07-08-2016 05:19 PM
You shouldn't wipe the entire /tmp directory; that would indeed affect your current jobs.
There's no built-in way to do that, but you can cron a job that deletes files/directories older than x days.
You'll find examples around; here is an easy (dirty but efficient) shell script for cleaning up files only:
#!/bin/bash
# Delete files under /tmp in HDFS that are older than the given number of days.
usage="Usage: dir_diff.sh [days]"

if [ -z "$1" ]; then
    echo "$usage"
    exit 1
fi

now=$(date +%s)

# Keep files only (lines starting with "-"); in the -ls -R output,
# column 6 is the modification date and the last column is the full path.
hdfs dfs -ls -R /tmp/ | grep "^-" | while read -r f; do
    file_date=$(echo "$f" | awk '{print $6}')
    age_days=$(( (now - $(date -d "$file_date" +%s)) / (24 * 60 * 60) ))
    if [ "$age_days" -gt "$1" ]; then
        hdfs dfs -rm -f "$(echo "$f" | awk '{print $NF}')"
    fi
done
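To schedule it, a hypothetical crontab entry (the install path /usr/local/bin/dir_diff.sh, the 7-day threshold, and the 02:00 time are assumptions, not from the thread):

# Every day at 02:00, delete HDFS /tmp files older than 7 days
0 2 * * * /usr/local/bin/dir_diff.sh 7

Since the grep "^-" keeps only file entries, the script never removes directories; any empty directories left behind would need a separate cleanup pass.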