Is compression used for Hive temporary tables?

Expert Contributor

Assuming compression is enabled, of course.

Guru

@Terry Padgett These are stored as uncompressed text files.

Which temporary tables are we talking about?

Tables you create with CREATE TEMPORARY TABLE?

These can have any storage format you want. So if you create it as ORC, it definitely WILL be compressed.
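A minimal sketch of such a table, assuming a hypothetical source table `sales` with columns `region` and `amount`. `orc.compress` is the standard ORC table property (NONE, ZLIB, SNAPPY); ZLIB is normally the default already, so setting it here is only to make the codec visible.

```sql
-- Hypothetical source table "sales"; substitute your own table and columns.
-- The table exists only for this session and is written as ORC,
-- which compresses its data with the codec named in orc.compress.
CREATE TEMPORARY TABLE tmp_sales_by_region
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "ZLIB")
AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;
```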

Or what do you mean by "compression is enabled"?

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Tr...

There are also some internal structures, for example the dataset that is generated by the Tez job before HiveServer2 returns it to the client. This can be text or sequence file (configurable), but I heard there is a JIRA to use ORC for it instead.
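If you want to experiment with that result format, here is a hedged sketch. It assumes your Hive version exposes `hive.query.result.fileformat` (TextFile or SequenceFile in the releases discussed here); the accepted values and the default have changed across releases, so check the configuration reference for your version. The query and table are hypothetical.

```sql
-- Assumption: hive.query.result.fileformat exists in your Hive release.
-- Stage the result set that HiveServer2 returns to the client
-- as a SequenceFile instead of plain text.
SET hive.query.result.fileformat=SequenceFile;

-- Hypothetical query; its result set is staged in the format chosen above.
SELECT region, COUNT(*) FROM sales GROUP BY region;
```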

Expert Contributor

@Benjamin Leonhardi Yes, these are Hive temporary tables. The feature is newish, and I wanted to know if there are any surprises not mentioned in the language manual. Memory is one of the options for temporary table storage, and I want to see if it is possible to fit the tables into memory. The tables are short-lived, so I don't think ORC is a realistic choice at the moment, but that could change.

What do you mean by memory? As far as I know, a temporary table is just like any other table, with the one exception that it is cleaned up when the session ends. So you can choose any storage format, but it will be on HDFS. So it depends: if you only need it once, I would agree ORC is most likely not a good fit, but if you create a temp table once and then query it a couple of times, ORC definitely makes sense to me.
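The create-once, query-many pattern might look like the following sketch (table and column names are hypothetical). The ORC write cost is paid once by the CREATE ... AS SELECT; each later query reads the compressed, columnar data, and Hive removes the table when the session ends.

```sql
-- Pay the ORC write cost once per session (hypothetical "orders" table).
CREATE TEMPORARY TABLE tmp_recent_orders
STORED AS ORC
AS
SELECT order_id, customer_id, amount
FROM orders
WHERE amount > 0;

-- Several queries against the same temporary table reuse the ORC data.
SELECT customer_id, SUM(amount) FROM tmp_recent_orders GROUP BY customer_id;
SELECT COUNT(DISTINCT customer_id) FROM tmp_recent_orders;

-- No DROP needed: the table is removed automatically when the session ends.
```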

Edit: Interesting. You could use the HDFS storage policies here. Do you have a cluster that has been set up like this? You could still use any storage format you want, compressed or not, and I still think that ORC will be good if you use your temporary table a couple of times.

Starting in Hive 1.1.0, the storage policy for temporary tables can be set to memory, ssd, or default with the hive.exec.temporary.table.storage configuration parameter (see HDFS Storage Types and Storage Policies).
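A sketch of switching a session's temporary tables to memory-backed storage, using one of the three values quoted above. It assumes the HDFS DataNodes actually have RAM disk (for memory) or SSD storage configured; if they do not, HDFS will typically place the blocks on ordinary disk instead. The table names are hypothetical.

```sql
-- Valid values: memory, ssd, default (Hive 1.1.0 and later).
-- Assumption: the cluster has RAM disk / SSD storage types configured in HDFS.
SET hive.exec.temporary.table.storage=memory;

-- Temporary tables created after the SET use the chosen storage policy.
CREATE TEMPORARY TABLE tmp_lookup AS
SELECT id, name FROM dim_customer;
```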

Guru

@Terry Padgett

If you want to store these temporary tables as ORC, it is still possible. Here is an example.

create temporary table tp1 stored as orcfile as select count(*) from table_params;

My earlier answer was about whether the default text format is compressed on HDFS.

Expert Contributor

Yeah @Ravi Mutyala, the temporary tables are only in use for a few minutes. My concern is also the additional time spent writing the table as ORC. I'll probably have to run a bake-off to see how it works in this case.
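A bake-off could be as simple as building the same temporary table in both formats within one session and comparing the elapsed time your client reports for each statement (the Hive CLI prints "Time taken"; Beeline prints the seconds per statement). `source_table` and its `id` column are hypothetical. The read queries use COUNT(DISTINCT ...) rather than COUNT(*) so they are less likely to be answered from table statistics without scanning the data.

```sql
-- Baseline: plain text temporary table.
CREATE TEMPORARY TABLE tmp_bake_text
STORED AS TEXTFILE
AS SELECT * FROM source_table;

-- Candidate: ORC, which adds encoding and compression work at write time.
CREATE TEMPORARY TABLE tmp_bake_orc
STORED AS ORC
AS SELECT * FROM source_table;

-- Time the reads too, since ORC's payoff comes from repeated queries.
SELECT COUNT(DISTINCT id) FROM tmp_bake_text;
SELECT COUNT(DISTINCT id) FROM tmp_bake_orc;
```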