Created 08-04-2016 01:25 PM
hdfs@ABCHADOOP1-15-2:/root> hadoop fs -du -h /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086
7.9 T /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001
hdfs@ABCHADOOP1-15-2:/root> hadoop fs -du -h /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001
687.8 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000000_0
687.4 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000001_0
686.9 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000002_0
653.0 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000003_0
653.0 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000004_0
653.0 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000005_0
653.0 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000006_0
653.0 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000007_0
653.0 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000008_0
653.0 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000009_0
653.0 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000010_0
653.0 G  /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000011_0
73.0 G   /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000012_0
34.2 G   /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000013_0
Created 08-04-2016 02:02 PM
The files in /tmp are used as a temporary staging location while jobs are running. In my experience, if all of your jobs have completed and the files are dated older than a day or two from "now", then you can delete those files without issue.
Created 08-04-2016 02:06 PM
Thank you so much. Now I finally have the exact answer to my question; I have been waiting a long time for this confirmation. But I have one small concern: as I understand it, the temp files are automatically deleted when jobs complete, so why are these files still sitting in /tmp? Can you please explain in detail?
Created 08-04-2016 03:12 PM
The files stored in /tmp should be automatically removed when the job finishes. However, if the job does not finish properly (due to an error or some other problem), the files may not always be deleted.
See here: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
Hive uses temporary folders both on the machine running the Hive client and the default HDFS instance. These folders are used to store per-query temporary/intermediate data sets and are normally cleaned up by the hive client when the query is finished. However, in cases of abnormal hive client termination, some data may be left behind. The configuration details are as follows:
Note that when writing data to a table/partition, Hive will first write to a temporary location on the target table's filesystem (using hive.exec.scratchdir as the temporary location) and then move the data to the target table. This applies in all cases - whether tables are stored in HDFS (normal case) or in file systems like S3 or even NFS.
Created 08-05-2016 09:59 AM
Thank you, that makes sense. For now I am deleting my old files manually. Can you suggest a script that automatically deletes old files from the Hadoop /tmp directory? I know such a script exists for the Linux /tmp directory; is there a similar script for HDFS /tmp files?
One last thing from my side:
I could not find the property below in hive-site.xml on HDP 2.1.2:
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/mydir</value>
  <description>Scratch space for Hive jobs</description>
</property>
Created 08-05-2016 07:11 PM
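Regarding the hive.exec.scratchdir property you could not find: if it is absent from hive-site.xml, Hive simply falls back to its built-in default. In the Hive 0.13 release that ships with HDP 2.1, I believe that default is /tmp/hive-${user.name}, which would explain the /tmp/hive-beeswax-abic720prod paths you are seeing; please verify on your installation. To control the location explicitly, you can add the property yourself. The /tmp/hive-scratch value below is just a placeholder:

```xml
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-scratch</value>
  <description>Scratch space for Hive jobs</description>
</property>
```

After editing hive-site.xml, restart HiveServer2 (and any long-running Hive clients) so the new value takes effect.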
I'm not aware of an existing script already in HDP to do this for you. However, I did run across this:
https://github.com/nmilford/clean-hadoop-tmp
Note that the script is written in Ruby. You could follow its logic and rewrite it in Python, Perl, or Bash.
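If you just need something quick, a Bash sketch along the same lines might look like this. The /tmp/hive-* glob, the two-day cutoff, and the column layout of the `hdfs dfs -ls` output are assumptions, so verify them on your cluster and review the printed list before enabling the delete:

```shell
#!/usr/bin/env bash
# Rough sketch of an HDFS /tmp cleanup. Assumptions to verify first:
#  - staging dirs live under /tmp/hive-* (matches the paths in this thread)
#  - anything not modified in roughly 2 days is stale
#  - `hdfs dfs -ls` prints: perms repl owner group size date time path

CUTOFF=$(date -d "2 days ago" +%s)

# is_stale DATE TIME -> succeeds when "DATE TIME" is older than the cutoff
is_stale() {
  local ts
  ts=$(date -d "$1 $2" +%s) || return 1
  [ "$ts" -lt "$CUTOFF" ]
}

hdfs dfs -ls /tmp/hive-* 2>/dev/null |
while read -r perms repl owner group size d t path; do
  [ -n "$path" ] || continue   # skips the "Found N items" header line
  if is_stale "$d" "$t"; then
    echo "stale: $path"
    # Uncomment only after sanity-checking the list printed above:
    # hdfs dfs -rm -r -skipTrash "$path"
  fi
done
```

The `hdfs dfs -rm` line is commented out on purpose; run the script once, check the paths it reports, and only then enable the delete.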