Created 06-14-2016 11:58 AM
Hi All,
Hive is creating GB-sized files in /tmp, and we are facing a disk-space issue because of this.
15.3 G /tmp/hive/hive/e9f9943b-8b35-466a-9d61-17e8a86339f1/hive_2016-06-09_19-00-01_169_2126725244382661354-1/-mr-10001/.hive-staging_hive_2016-06-09_19-00-01_169_2126725244382661354-1/-ext-10002/000074_0
15.3 G /tmp/hive/hive/e9f9943b-8b35-466a-9d61-17e8a86339f1/hive_2016-06-09_19-00-01_169_2126725244382661354-1/-mr-10001/.hive-staging_hive_2016-06-09_19-00-01_169_2126725244382661354-1/-ext-10002/000075_0
15.2 G /tmp/hive/hive/e9f9943b-8b35-466a-9d61-17e8a86339f1/hive_2016-06-09_19-00-01_169_2126725244382661354-1/-mr-10001/.hive-staging_hive_2016-06-09_19-00-01_169_2126725244382661354-1/-ext-10002/000076_0
15.2 G /tmp/hive/hive/e9f9943b-8b35-466a-9d61-17e8a86339f1/hive_2016-06-09_19-00-01_169_2126725244382661354-1/-mr-10001/.hive-staging_hive_2016-06-09_19-00-01_169_2126725244382661354-1/-ext-10002/000077_0
15.4 G /tmp/hive/hive/e9f9943b-8b35-466a-9d61-17e8a86339f1/hive_2016-06-09_19-00-01_169_2126725244382661354-1/-mr-10001/.hive-staging_hive_2016-06-09_19-00-01_169_2126725244382661354-1/-ext-10002/000078_0
Any help is appreciated.
Thanks in advance.
Created 06-14-2016 12:07 PM
Temp tables are created during the application run to hold intermediate data. These intermediate files are not removed if the application fails and cleanup does not happen.
Please check whether any running applications are generating this data. Meanwhile, you can also try compressing the intermediate data by setting the property "hive.exec.compress.intermediate" to true in hive-site.xml.
The related compression codec and other options are determined by the Hadoop configuration variables mapred.output.compress*.
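As a sketch, the hive-site.xml fragment below enables intermediate compression. The codec property shown uses the legacy mapred.* naming mentioned above; the exact property name and codec class may differ on your Hadoop version, so treat them as an example rather than a definitive setting.

```xml
<!-- Enable compression of Hive's intermediate MR outputs -->
<property>
  <name>hive.exec.compress.intermediate</name>
  <value>true</value>
</property>
<!-- Example codec (legacy mapred.* name; verify against your Hadoop version) -->
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```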
Hope this helps.
Thanks and Regards,
Sindhu
Created 06-14-2016 12:22 PM
Thanks for the fast response.
Created 06-15-2016 01:11 PM
@Shihab Hive uses temporary directory structures both on the node where the Hive client is running and on the default HDFS instance.
These folders store temporary/intermediate data for each query (as separate files). The Hive client cleans them up a while (configurable) after the query executes successfully, but they can pile up when the client terminates abnormally.
One such configurable parameter for the HDFS location is hive.exec.scratchdir (generally set to /tmp/hive).
When writing data to a Hive table/partition, Hive first writes to a temporary location (i.e., hive.exec.scratchdir) and then moves the data to the target table. (The storage could be your underlying filesystem: HDFS in the normal case, or S3.)
A workaround is to clean these directory structures periodically through a cron job (e.g., whenever their size exceeds a threshold).
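A minimal sketch of such a cron-driven cleanup, assuming the scratch dir is /tmp/hive/hive and a 7-day retention; the retention window and path are assumptions, not values from this thread. The date-filtering is factored into a small function so it can be checked separately from the destructive hdfs commands.

```shell
#!/usr/bin/env sh
# Hypothetical scratch-dir cleanup. `hdfs dfs -ls` prints lines like:
#   perms repl owner group size YYYY-MM-DD HH:MM path
# so field 6 is the modification date and field 8 is the path.

# Cutoff date: anything modified before this is a deletion candidate.
CUTOFF=$(date -d '7 days ago' +%Y-%m-%d)

# filter_old: reads `hdfs dfs -ls` output on stdin and prints the paths
# whose date field sorts before $CUTOFF (lexical order == chronological
# order for YYYY-MM-DD).
filter_old() {
  awk -v cutoff="$CUTOFF" '$6 < cutoff { print $8 }'
}

# Intended cron usage (commented out so the filter can be exercised alone):
# hdfs dfs -ls /tmp/hive/hive | filter_old | while read -r dir; do
#   hdfs dfs -rm -r -skipTrash "$dir"
# done
```

Listing-then-deleting keeps the destructive step explicit; dropping -skipTrash would route removals through the HDFS trash instead, at the cost of not freeing space immediately.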