Support Questions


Hive creating huge Temp files in HDFS

Rising Star

Hi All

Hive is creating GB-sized files under /tmp in HDFS, and we are running into space issues because of it.

15.3 G /tmp/hive/hive/e9f9943b-8b35-466a-9d61-17e8a86339f1/hive_2016-06-09_19-00-01_169_2126725244382661354-1/-mr-10001/.hive-staging_hive_2016-06-09_19-00-01_169_2126725244382661354-1/-ext-10002/000074_0

15.3 G /tmp/hive/hive/e9f9943b-8b35-466a-9d61-17e8a86339f1/hive_2016-06-09_19-00-01_169_2126725244382661354-1/-mr-10001/.hive-staging_hive_2016-06-09_19-00-01_169_2126725244382661354-1/-ext-10002/000075_0

15.2 G /tmp/hive/hive/e9f9943b-8b35-466a-9d61-17e8a86339f1/hive_2016-06-09_19-00-01_169_2126725244382661354-1/-mr-10001/.hive-staging_hive_2016-06-09_19-00-01_169_2126725244382661354-1/-ext-10002/000076_0

15.2 G /tmp/hive/hive/e9f9943b-8b35-466a-9d61-17e8a86339f1/hive_2016-06-09_19-00-01_169_2126725244382661354-1/-mr-10001/.hive-staging_hive_2016-06-09_19-00-01_169_2126725244382661354-1/-ext-10002/000077_0

15.4 G /tmp/hive/hive/e9f9943b-8b35-466a-9d61-17e8a86339f1/hive_2016-06-09_19-00-01_169_2126725244382661354-1/-mr-10001/.hive-staging_hive_2016-06-09_19-00-01_169_2126725244382661354-1/-ext-10002/000078_0

Any help is appreciated.

Thanks in advance

1 ACCEPTED SOLUTION

@Shihab

These temp files are created during the application run as intermediate data. The intermediate data is not removed if the application fails and cleanup does not happen.

Please check whether any applications are currently running that generate this data. Meanwhile, you can also try compressing the intermediate data by setting the property "hive.exec.compress.intermediate" to true in hive-site.xml.

The compression codec and other options are determined by the Hadoop configuration variables mapred.output.compress*.
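For reference, a minimal hive-site.xml sketch of the property mentioned above. The codec property name and the choice of Snappy are assumptions here; check which mapred.output.compress* settings and codecs your Hadoop version supports:

```xml
<!-- hive-site.xml: compress Hive's intermediate map/reduce output. -->
<property>
  <name>hive.exec.compress.intermediate</name>
  <value>true</value>
</property>
<!-- Codec choice is an assumption; use any codec installed on your cluster. -->
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```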

Hope this helps.

Thanks and Regards,

Sindhu


3 REPLIES


Rising Star

Thanks for the fast response.

Expert Contributor

@Shihab Hive uses temporary directory structures both on the node where the Hive client runs and on the default HDFS instance.

These folders store temporary/intermediate data for each query (as separate files). They are cleaned up by the Hive client a while (configurable) after the query completes successfully, but can pile up when the client terminates abnormally.

One such configurable parameter for the HDFS location is hive.exec.scratchdir (generally set to /tmp/hive).

When writing data to a Hive table/partition, Hive first writes to a temporary location (i.e., hive.exec.scratchdir) and then moves the data to the target table. (The storage could be your underlying filesystem: HDFS in the normal case, or S3.)

A workaround is to clean up these directory structures periodically with a cron job (for example, when the size exceeds a threshold).
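A minimal sketch of such a cleanup script, assuming the default scratchdir /tmp/hive and a 7-day retention window (both are assumptions; adjust for your cluster, and make sure nothing listed belongs to a still-running query before deleting):

```shell
#!/usr/bin/env bash
# Sketch: remove leftover Hive scratch directories older than RETENTION_DAYS.
# SCRATCH_DIR and RETENTION_DAYS are assumptions; tune them for your cluster.
SCRATCH_DIR="/tmp/hive"
RETENTION_DAYS=7

# Filter `hdfs dfs -ls` output: keep entries whose modification date
# (column 6, YYYY-MM-DD, so lexical comparison works) is before the cutoff,
# and print only the path (column 8).
old_entries() {
  local cutoff="$1"
  awk -v cutoff="$cutoff" '$6 < cutoff { print $8 }'
}

# List stale entries and remove them, bypassing trash to actually free space.
# `date -d` is GNU date; on other systems compute the cutoff differently.
cleanup_old_scratch() {
  local cutoff
  cutoff=$(date -d "-${RETENTION_DAYS} days" +%Y-%m-%d)
  hdfs dfs -ls "$SCRATCH_DIR" | old_entries "$cutoff" | while read -r path; do
    hdfs dfs -rm -r -skipTrash "$path"
  done
}
```

A crontab entry could then run the script nightly, e.g. `0 2 * * * /usr/local/bin/clean_hive_scratch.sh` (path hypothetical).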