Support Questions

Do I need to clean up /tmp space in my Hadoop cluster on a weekly basis? If yes, how can I do it? Please suggest.

Expert Contributor

3 REPLIES

Rising Star

I'm assuming you are referring to the /tmp directory in HDFS. You can use the command below to clean it up, and cron it to run every week.

hadoop fs -rm -r '/tmp/*'
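
If you did want to schedule that wholesale cleanup (though see the accepted solution further down before doing so), a crontab entry for a user with HDFS access might look like the sketch below; the hadoop binary path and the schedule are assumptions:

# Run at 02:00 every Sunday. Quoting the glob lets HDFS expand it,
# rather than the local shell expanding it against the local /tmp.
0 2 * * 0 /usr/bin/hadoop fs -rm -r '/tmp/*' >> /var/log/hdfs_tmp_cleanup.log 2>&1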

Expert Contributor

@Rahul Reddy

Thank you so much, Rahul. So if I delete the HDFS /tmp directory, it won't affect my current jobs?

ACCEPTED SOLUTION

Guru
@sankar rao

You shouldn't wipe the entire /tmp directory; doing so would indeed affect your currently running jobs.

There's no built-in way to do that, but you can cron a job that deletes files/directories older than x days.

You'll find some examples around; here is a quick-and-dirty but effective shell approach that cleans up files only:

#!/bin/bash
# Delete files under /tmp in HDFS that are older than the given number of days.
usage="Usage: dir_diff.sh [days]"

if [ -z "$1" ]; then
  echo "$usage"
  exit 1
fi

now=$(date +%s)
# Lines starting with "-" are files (directories start with "d");
# field 6 is the modification date, the last field is the file path.
# Caveat: paths containing spaces will break the awk field extraction.
hadoop fs -ls -R /tmp/ | grep "^-" | while read -r f; do
  file_date=$(echo "$f" | awk '{print $6}')
  # Age in whole days
  difference=$(( (now - $(date -d "$file_date" +%s)) / (24 * 60 * 60) ))

  if [ "$difference" -gt "$1" ]; then
    # -f suppresses the error if the file is already gone;
    # add -skipTrash if you want the space reclaimed immediately.
    hdfs dfs -rm -f "$(echo "$f" | awk '{print $NF}')"
  fi
done
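
To run this weekly, the script can be scheduled from cron. A minimal sketch, assuming the script lives at /home/hdfs/dir_diff.sh and a 7-day retention (both the path and the schedule are assumptions, adjust to taste):

# Run at 03:00 every Sunday, deleting HDFS /tmp files older than 7 days
0 3 * * 0 /home/hdfs/dir_diff.sh 7 >> /var/log/hdfs_tmp_cleanup.log 2>&1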