Hadoop /tmp file deletion: can I delete the Hive temp files below, which occupy 8 TB?

Expert Contributor

hdfs@ABCHADOOP1-15-2:/root> hadoop fs -du -h /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086

7.9 T /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001

hdfs@ABCHADOOP1-15-2:/root> hadoop fs -du -h /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001

687.8 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000000_0
687.4 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000001_0
686.9 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000002_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000003_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000004_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000005_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000006_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000007_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000008_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000009_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000010_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000011_0
73.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000012_0
34.2 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000013_0
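
As a quick sanity check, the per-file sizes in this listing can be added up and compared with the 7.9 T reported for the parent directory. This is only a small Python sketch: the sizes are copied by hand from the output above, the variable names are illustrative, and it assumes `-du -h` uses binary units (1 T = 1024 G).

```python
# Sizes in G as printed by `hadoop fs -du -h`, in file order 000000_0 .. 000013_0.
sizes_g = [687.8, 687.4, 686.9] + [653.0] * 9 + [73.0, 34.2]

total_g = sum(sizes_g)
total_t = total_g / 1024  # assumption: -du -h reports binary units

print(f"{len(sizes_g)} files, {total_g:.1f} G = {total_t:.1f} T")
```

The total rounds to the same 7.9 T, so these 14 files account for essentially all of the space under that directory.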

1 ACCEPTED SOLUTION

Super Guru

@sankar rao

The files in /tmp are used as a temporary staging location while jobs are running. In my experience, if all of your jobs have completed and the files are dated older than a day or two from "now", then you can delete those files without issue.

Expert Contributor

@Michael Young

Thank you so much. I have been waiting a long time for this confirmation, and this is the first exact answer to my question. I have one small concern, though: as I understand it, temp files are deleted automatically when jobs complete, so why are these files still in /tmp? Can you please explain in detail?

Super Guru

@sankar rao

The files stored in /tmp should be removed automatically when the job finishes. However, if the job does not finish properly (due to an error or some other problem), the files may not always be deleted.

See here: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration

Hive uses temporary folders both on the machine running the Hive client and the default HDFS instance. These folders are used to store per-query temporary/intermediate data sets and are normally cleaned up by the hive client when the query is finished. However, in cases of abnormal hive client termination, some data may be left behind. The configuration details are as follows:

  • On the HDFS cluster this is set to /tmp/hive-<username> by default and is controlled by the configuration variable hive.exec.scratchdir
  • On the client machine, this is hardcoded to /tmp/<username>

Note that when writing data to a table/partition, Hive will first write to a temporary location on the target table's filesystem (using hive.exec.scratchdir as the temporary location) and then move the data to the target table. This applies in all cases - whether tables are stored in HDFS (normal case) or in file systems like S3 or even NFS.
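
If hive.exec.scratchdir is not set explicitly in hive-site.xml, Hive falls back to its built-in default, so a missing entry does not mean the setting is unused. Below is a rough Python sketch for checking which value applies. The function name `get_hive_property` and the sample XML are made up for illustration, and the fallback string is simply the default quoted above, not something read from the cluster.

```python
import xml.etree.ElementTree as ET


def get_hive_property(hive_site_xml, name, default=None):
    """Return a property's value from hive-site.xml text, or `default` if it is absent."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else default
    return default


# Hypothetical hive-site.xml content, for illustration only.
sample = """<configuration>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/mydir</value>
  </property>
</configuration>"""

# Property present: the configured value is returned.
print(get_hive_property(sample, "hive.exec.scratchdir", "/tmp/hive-<username>"))
# Property absent: the supplied default is returned instead.
print(get_hive_property("<configuration/>", "hive.exec.scratchdir", "/tmp/hive-<username>"))
```

On a real cluster you would pass in the contents of the actual hive-site.xml rather than the sample string.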

Expert Contributor

@Michael Young

Thank you, that makes sense. I am trying to delete my old files manually. Can you suggest a script that automatically deletes old files from the Hadoop /tmp directory? I know such scripts exist for Linux tmp files; is there something similar for HDFS tmp files?

One last thing from my side:

I could not find the property below in hive-site.xml on HDP 2.1.2:

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/mydir</value>
  <description>Scratch space for Hive jobs</description>
</property>

Super Guru

I'm not aware of an existing script already in HDP to do this for you. However, I did run across this:

https://github.com/nmilford/clean-hadoop-tmp

Note that the script is written in Ruby. You could follow its logic and write it in Python, Perl or Bash.
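
As a starting point, here is a minimal Python sketch of that idea. It is not a port of the Ruby script above. It assumes you feed it the text output of `hadoop fs -ls` on the scratch directory, and it only prints the delete commands as a dry run rather than running them. The function name `stale_paths`, the two-day threshold, and the sample listing (owner, group and the second path) are made up for illustration. It does not check whether a job is still running, so confirm that before deleting anything.

```python
from datetime import datetime, timedelta


def stale_paths(ls_output, now, max_age_days=2):
    """Return paths from `hadoop fs -ls` output last modified more than max_age_days ago."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in ls_output.splitlines():
        # Expected columns: perms, replication, owner, group, size, date, time, path
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skips the "Found N items" header and blank lines
        try:
            mtime = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        except ValueError:
            continue  # not a listing line we recognise
        if mtime < cutoff:
            stale.append(fields[7])
    return stale


# Hypothetical listing, for illustration only.
sample = """Found 2 items
drwxr-xr-x   - hive hdfs          0 2016-04-12 10:09 /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086
drwxr-xr-x   - hive hdfs          0 2016-05-01 08:30 /tmp/hive-beeswax-abic720prod/hive_example_recent
"""

for path in stale_paths(sample, now=datetime(2016, 5, 2, 9, 0)):
    # Dry run: print the command instead of executing it.
    print("hadoop fs -rm -r " + path)
```

In real use you would capture the listing with something like `subprocess.run(["hadoop", "fs", "-ls", "/tmp/hive-<username>"], ...)`, pass `datetime.now()` as `now`, and review the printed commands before running them.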