Archives of Support Questions (Read Only)

g_rao9704 · 07-07-2016

ledel · 07-08-2016

@sankar rao

You shouldn't wipe the entire /tmp directory: it would indeed affect your currently running jobs.

There's no built-in way to do this, but you can schedule a cron job that deletes files/directories older than x days.
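To illustrate the cron approach, a crontab entry (added with `crontab -e`) that runs a cleanup script once a week could look like the following. The script path `/usr/local/bin/dir_diff.sh`, the 7-day threshold, the schedule (Sundays at 02:00) and the log file are hypothetical choices; adapt them to your cluster.

```
# Hypothetical entry: every Sunday at 02:00, delete HDFS /tmp files older than 7 days.
# Fields: minute hour day-of-month month day-of-week command
0 2 * * 0 /usr/local/bin/dir_diff.sh 7 >> /var/log/hdfs_tmp_cleanup.log 2>&1
```

Cron runs jobs with a minimal environment, so the `hadoop`/`hdfs` commands may need to be called by full path or put on the `PATH` inside the script. HDFS /tmp is also typically created with the sticky bit set, so the job should run as a user allowed to delete other users' files there (for example the HDFS superuser).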

You'll find several examples online; here is a quick-and-dirty but effective shell script that cleans up files only (directories are left in place):

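#!/bin/bash
# Delete HDFS files under /tmp that are more than <days> days old.
# Directories are left in place. Requires GNU date (for "date -d").
usage="Usage: dir_diff.sh <days>"

# Require a single non-negative integer argument.
if [[ ! "$1" =~ ^[0-9]+$ ]]; then
  echo "$usage"
  exit 1
fi

now=$(date +%s)
# List /tmp recursively; file entries start with "-", directories with "d".
hadoop fs -ls -R /tmp/ | grep "^-" | while read -r f; do
  # Column 6 of the listing is the modification date (YYYY-MM-DD).
  file_date=$(echo "$f" | awk '{print $6}')
  difference=$(( (now - $(date -d "$file_date" +%s)) / (24 * 60 * 60) ))

  if [ "$difference" -gt "$1" ]; then
    # The last column is the file path. With HDFS trash enabled, -rm moves
    # the file to .Trash; add -skipTrash to delete it immediately.
    hdfs dfs -rm -f "$(echo "$f" | awk '{print $NF}')"
  fi
done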
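Since the script deletes files as it goes, it can help to preview what it would remove first. The sketch below is a dry-run variant, not the poster's script: the hypothetical helpers `age_in_days` and `list_old_files` read `hdfs dfs -ls -R` output on stdin and only print the paths of files older than N days. It assumes GNU `date` and the usual listing layout (permissions, replication, owner, group, size, date, time, path), and it counts ages in UTC days. Reading the path as the remainder of the line also keeps file names containing spaces intact, which `awk '{print $NF}'` does not.

```shell
#!/bin/bash
# Dry-run sketch (hypothetical helpers, not part of Hadoop): print HDFS files
# older than N days from "hdfs dfs -ls -R" output, without deleting anything.
# Assumes GNU date and the listing layout:
#   permissions replication owner group size date time path

# age_in_days DATE NOW_EPOCH: whole days between DATE (YYYY-MM-DD, read as UTC)
# and NOW_EPOCH (seconds since the epoch).
age_in_days() {
  local then_epoch
  then_epoch=$(date -u -d "$1" +%s 2>/dev/null) || return 1
  echo $(( ($2 - then_epoch) / 86400 ))
}

# list_old_files DAYS [NOW_EPOCH]: read listing lines on stdin and print the
# path of every regular file whose age in days is greater than DAYS.
list_old_files() {
  local days=$1
  local now=${2:-$(date +%s)}
  local perms repl owner group size fdate ftime path age
  # The last variable, path, receives the rest of the line, spaces included.
  while read -r perms repl owner group size fdate ftime path; do
    [[ $perms == -* ]] || continue   # skip directories and non-entry lines
    age=$(age_in_days "$fdate" "$now") || continue
    if (( age > days )); then
      printf '%s\n' "$path"
    fi
  done
}
```

For example, `hdfs dfs -ls -R /tmp/ | list_old_files 7` lists the candidates; after reviewing them, the same output could be piped into `while IFS= read -r p; do hdfs dfs -rm -f "$p"; done` to delete them.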

rreddy · 07-07-2016

I'm assuming you are referring to the /tmp/ directory in HDFS. You can use the command below to clean it up, and schedule it with cron to run every week.

hadoop fs -rm -r '/tmp/*'
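One detail about `hadoop fs -rm -r /tmp/*`: if the `*` is not quoted, your local shell expands it against the local /tmp directory before Hadoop ever sees it, so the command receives local file names instead of the pattern. Quoting it passes the literal pattern through, and the Hadoop shell then matches it against HDFS. The small helper `show_args` below is hypothetical and only prints the arguments a command would receive.

```shell
#!/bin/bash
# show_args: print each argument on its own line, wrapped in <>,
# to reveal what a command such as "hadoop fs -rm -r" would actually receive.
show_args() {
  printf '<%s>\n' "$@"
}

# Unquoted: bash expands the glob against the LOCAL /tmp (if it has entries).
show_args /tmp/*

# Quoted: the literal pattern /tmp/* is passed on, so Hadoop does the matching in HDFS.
show_args '/tmp/*'
```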

g_rao9704 · 07-08-2016

@Rahul Reddy

Thank you so much, Rahul. So if I delete the HDFS /tmp directory, will that affect my current jobs?

Cloudera Community

Should I clean up the tmp space in the Hadoop cluster on a weekly basis? If yes, how can I do it? Please suggest.