Is compression used for Hive temporary tables?

Super Collaborator

Assuming compression is enabled, of course.

1 ACCEPTED SOLUTION

Guru

@Terry Padgett These are stored as uncompressed text files.

REPLIES

Master Guru

Which temporary tables are we talking about?

Tables you create with CREATE TEMPORARY TABLE?

These can have any storage format you want. So if you create it as ORC, it definitely WILL be compressed.

Or what do you mean by "compression is enabled"?

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Tr...
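To illustrate the point about storage formats, here is a minimal sketch of a temporary table declared with an explicit format and codec. The table and column names are made up for illustration; `orc.compress` is the ORC table property that selects the codec.

```sql
-- Hypothetical table and columns, for illustration only.
CREATE TEMPORARY TABLE tmp_events (
  event_id BIGINT,
  payload  STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");  -- NONE, ZLIB, or SNAPPY

-- Shows the storage format and table parameters actually in effect.
DESCRIBE FORMATTED tmp_events;
```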

There are also some internal structures, for example the dataset that the Tez job generates before HiveServer2 returns it to the client. This can be a text file or a sequence file (configurable), but I heard there is a JIRA to use ORC for it instead.
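The configurable part mentioned above is most likely the `hive.query.result.fileformat` setting. A sketch, on the assumption that your Hive version accepts this value; check the configuration reference for the values and default that apply to your release.

```sql
-- Assumption: accepted values and the default vary by Hive version.
SET hive.query.result.fileformat=SequenceFile;

-- Print the value currently in effect for this session.
SET hive.query.result.fileformat;
```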

Super Collaborator

@Benjamin Leonhardi Yes, these are Hive temporary tables. The feature is newish and I wanted to know if there are any surprises not mentioned in the language manual. Memory is one of the options for temporary table storage, and I want to see if it is possible to fit the tables into memory. The tables are short-lived, so I don't think ORC is a realistic choice at the moment, but that could change.

Master Guru

What do you mean by memory? As far as I know, a temporary table is just like any other table, with the one exception that it is cleaned up when the session ends. So you can choose any storage format, but it will be on HDFS. So it depends. If you only need the table once, I would agree ORC is most likely not a good fit, but if you create a temp table once and then query it a couple of times, ORC definitely makes sense to me.

Edit: Interesting. You could use the HDFS storage policies here. Do you have a cluster that has been set up like this? You could still use any kind of storage format you want, compressed or not, and I still think that ORC will be good if you use your temporary table a couple of times.

Starting in Hive 1.1.0 the storage policy for temporary tables can be set to memory, ssd, or default with the hive.exec.temporary.table.storage configuration parameter (see HDFS Storage Types and Storage Policies).
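A minimal sketch of how the quoted parameter might be used in a session. The table name `tp_mem` is hypothetical, and `table_params` is borrowed from the example further down the thread. The `memory` policy only helps if the cluster's DataNodes have been configured with RAM-backed storage, so treat this as something to try rather than a guaranteed in-memory table.

```sql
-- Applies to temporary tables created later in this session.
SET hive.exec.temporary.table.storage=memory;  -- memory, ssd, or default

-- Hypothetical temporary table; the storage format is still your choice.
CREATE TEMPORARY TABLE tp_mem STORED AS ORC AS
SELECT * FROM table_params;
```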

Guru
@Terry Padgett

If you want to store these temporary tables as ORC, it is still possible. Here is an example.

create temporary table tp1 stored as orcfile as select count(*) from table_params;

My earlier answer was about whether the text format, which is the default, is compressed on HDFS.

Super Collaborator

Ya @Ravi Mutyala , the temporary tables are only in use for a few minutes. My concern is also about any additional time being spent when writing the table as ORC. Probably have to run a bake off to see how it works in this case.
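One way such a bake-off could look, as a rough sketch: build the same temporary table in both formats and compare the elapsed times Hive reports for the writes and for a representative read. The names `tp_text` and `tp_orc` are hypothetical, and the queries should be replaced with the ones the real workload runs.

```sql
-- Same source query written twice, once per storage format.
CREATE TEMPORARY TABLE tp_text STORED AS TEXTFILE AS
SELECT * FROM table_params;

CREATE TEMPORARY TABLE tp_orc STORED AS ORC AS
SELECT * FROM table_params;

-- Placeholder reads; substitute the queries you actually run.
SELECT COUNT(*) FROM tp_text;
SELECT COUNT(*) FROM tp_orc;
```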