Hadoop /tmp files deletion confirmation: can I delete the Hadoop /tmp files below, which occupy 8 TB?
Labels:
- Apache Hadoop
- Apache Hive
- Cloudera Hue
Created ‎08-04-2016 01:25 PM
hdfs@ABCHADOOP1-15-2:/root> hadoop fs -du -h /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086
7.9 T /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001
hdfs@ABCHADOOP1-15-2:/root> hadoop fs -du -h /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001
687.8 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000000_0
687.4 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000001_0
686.9 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000002_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000003_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000004_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000005_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000006_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000007_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000008_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000009_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000010_0
653.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000011_0
73.0 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000012_0
34.2 G /tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086/-ext-10001/000013_0
Created ‎08-04-2016 02:02 PM
The files in /tmp are used as a temporary staging location while jobs are running. In my experience, if all of your jobs have completed and the files are dated older than a day or two from "now", then you can delete those files without issue.
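If all jobs have finished, a sketch of the cleanup might look like the following. This is an assumption on my part, not a tested procedure: the path is taken from the listing above, and `-skipTrash` is optional but avoids moving ~8 TB into the owner's .Trash directory. Review before running.

```shell
# Sketch, assuming the Hadoop 2.x `hadoop fs` CLI used elsewhere in this thread.
# Build the command in a variable so it can be reviewed before execution.
STALE_DIR=/tmp/hive-beeswax-abic720prod/hive_2016-04-12_10-09-43_383_5647515039912810955-1086

# 1. Confirm nothing under the directory is newer than your cutoff:
#      hadoop fs -ls -R "$STALE_DIR"
# 2. Then remove it. -skipTrash frees the space immediately instead of
#    parking the data in .Trash until trash expiry:
CMD="hadoop fs -rm -r -skipTrash $STALE_DIR"
echo "$CMD"   # review the exact command, then run it with: eval "$CMD"
```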
Created ‎08-04-2016 02:06 PM
Thank you so much. This is exactly the confirmation I have been waiting for. I have one small concern, though: my understanding is that temp files are deleted automatically when jobs complete, so why are these files still sitting in /tmp? Can you please explain in detail?
Created ‎08-04-2016 03:12 PM
The files stored in /tmp should be removed automatically when the job finishes. However, if the job does not finish properly (due to an error or some other problem), the files may not always be deleted.
See here: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
Hive uses temporary folders both on the machine running the Hive client and the default HDFS instance. These folders are used to store per-query temporary/intermediate data sets and are normally cleaned up by the hive client when the query is finished. However, in cases of abnormal hive client termination, some data may be left behind. The configuration details are as follows:
- On the HDFS cluster this is set to /tmp/hive-<username> by default and is controlled by the configuration variable hive.exec.scratchdir
- On the client machine, this is hardcoded to /tmp/<username>
Note that when writing data to a table/partition, Hive will first write to a temporary location on the target table's filesystem (using hive.exec.scratchdir as the temporary location) and then move the data to the target table. This applies in all cases - whether tables are stored in HDFS (normal case) or in file systems like S3 or even NFS.
Created ‎08-05-2016 09:59 AM
Thank you, that makes more sense now. For the moment I am deleting the old files manually. Can you suggest a script that automatically deletes old files from the HDFS /tmp directory? I know such scripts exist for the Linux /tmp; is there an equivalent for HDFS /tmp files?
One last thing from my side: I could not find the property below in hive-site.xml on HDP 2.1.2.
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/mydir</value>
  <description>Scratch space for Hive jobs</description>
</property>
Created ‎08-05-2016 07:11 PM
I'm not aware of an existing script in HDP that does this for you. However, I did run across this:
https://github.com/nmilford/clean-hadoop-tmp
Note that that script is written in Ruby. You could follow its logic and rewrite it in Python, Perl, or Bash.
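For reference, a minimal Bash sketch of the same idea (my own assumption of how such a script could work, not the linked Ruby script): parse `hadoop fs -ls` output and select entries whose modification date is older than a cutoff. Verify the column layout on your own cluster before wiring the output into a delete.

```shell
#!/bin/bash
# Minimal sketch of an HDFS /tmp cleanup. Assumes the standard 8-column
# `hadoop fs -ls` layout: perms, repl, owner, group, size, date, time, path.

select_old_paths() {
  # Reads `hadoop fs -ls` output on stdin and prints paths (column 8) whose
  # modification date (column 6, YYYY-MM-DD, so lexicographic comparison
  # works) is strictly before the cutoff date passed as $1.
  awk -v cutoff="$1" 'NF >= 8 && $6 < cutoff { print $8 }'
}

# Cron-style usage (dry run; pipe into `hadoop fs -rm -r` once verified):
#   CUTOFF=$(date -d '2 days ago' +%Y-%m-%d)
#   hadoop fs -ls /tmp/hive-* | select_old_paths "$CUTOFF"
```

The date comparison stays simple because HDFS prints dates as YYYY-MM-DD, which sorts correctly as plain strings.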
