Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Do I need to clean up the tmp space in a Hadoop cluster on a weekly basis? If yes, how can I do it? Please suggest.

Expert Contributor
 
1 ACCEPTED SOLUTION

Guru
@sankar rao

You shouldn't wipe the entire /tmp directory; that would indeed affect your currently running jobs.

There's no built-in way to do this, but you can schedule a cron job that deletes files and directories older than x days.

You'll find some examples around. Here is a quick shell script (dirty but efficient) that cleans up files only:

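#!/bin/bash
# Delete regular files under HDFS /tmp that were last modified more than <days> days ago.
usage="Usage: dir_diff.sh <days>"

# Require a non-negative integer argument.
if ! [[ "$1" =~ ^[0-9]+$ ]]
then
  echo "$usage" >&2
  exit 1
fi

now=$(date +%s)
# Lines starting with "-" are regular files; directories start with "d" and are skipped.
# Listing columns: permissions, replication, owner, group, size, date, time, path.
hdfs dfs -ls -R /tmp/ | grep "^-" | while read -r perms repl owner group size file_date file_time path; do
  # Age in whole days, computed from the modification date (needs GNU date for -d).
  difference=$(( (now - $(date -d "$file_date" +%s)) / (24 * 60 * 60) ))

  if [ "$difference" -gt "$1" ]; then
    # The path is the rest of the line, quoted so names containing spaces survive.
    hdfs dfs -rm -f "$path"
  fi
done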
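
Before scheduling a deletion like this, it can help to preview what would be removed. The following is only a minimal sketch, assuming GNU date and the usual eight-column output of hdfs dfs -ls -R. The name list_old_files is a hypothetical helper, not part of Hadoop, and the script path in the crontab comment is a placeholder.

```shell
# list_old_files is a hypothetical helper, not a Hadoop command.
# It reads `hdfs dfs -ls -R` style lines on stdin and prints the paths of
# regular files last modified more than $1 days before $2 (epoch seconds,
# defaulting to the current time). It only prints; nothing is deleted.
# The body runs in a subshell, so its variables do not leak to the caller.
list_old_files() (
  days=$1
  now=${2:-$(date +%s)}
  while read -r perms repl owner group size file_date file_time path; do
    # Keep regular files only: their permission string starts with "-".
    case $perms in
      -*) ;;
      *) continue ;;
    esac
    # Age in whole days from the modification date (assumes GNU date for -d).
    age=$(( (now - $(date -d "$file_date" +%s)) / 86400 ))
    if [ "$age" -gt "$days" ]; then
      printf '%s\n' "$path"
    fi
  done
)

# Preview on a real cluster (requires a configured Hadoop client):
#   hdfs dfs -ls -R /tmp/ | list_old_files 7
#
# Once the preview looks right, a weekly run of the cleanup script could be a
# crontab entry like this one (Sundays at 03:00; the path is a placeholder):
#   0 3 * * 0 /path/to/dir_diff.sh 7
```

Running the preview first shows exactly which files the age threshold would catch, which is a cheap way to confirm that nothing a running job still needs is on the list.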


3 REPLIES 3

Rising Star

I'm assuming you are referring to the /tmp/ directory in HDFS. You can use the command below to clean it up, and schedule it with cron to run every week.

hadoop fs -rm -r /tmp/*

Expert Contributor

@Rahul Reddy

Thank you so much, Rahul. So if I delete the HDFS /tmp directory, will that not affect my current jobs?
