/tmp space usage grows to 1 TB while running a beeline query

When we run a beeline query on a big table, /tmp space usage grows to 1 TB, and total usage reaches 3 TB because of the replication factor.

Can we minimize /tmp space usage?


Hi @Alpesh Virani!
Usually /tmp grows because of the intermediate data that a job writes between its stages. As you pointed out, a query on a big table produces a lot of it, and unfinished or failed jobs also leave their data behind in /tmp.
You can reduce this by compressing the intermediate data:
set hive.exec.compress.intermediate=true;
set tez.runtime.compress=true;
Choosing a fast compression codec such as Snappy will also help reduce /tmp usage.
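As a rough sketch of the codec choice, the lines below select Snappy for intermediate data in the same beeline session. They only take effect when the two compression flags above are enabled. This assumes the Snappy native libraries are installed on the cluster nodes and that your Hive and Tez versions honor these properties, so check them against your distribution's documentation first.

```sql
-- Assumption: Snappy native libraries are available on every node.
-- Codec for intermediate files that Hive writes between stages.
set hive.intermediate.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- Codec for Tez runtime (shuffle) data.
set tez.runtime.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
```

Running `set hive.intermediate.compression.codec;` without a value prints the current setting, which is a quick way to confirm the change applied to your session.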
If you have a moment, take a look at these links:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_hive-performance-tuning/content/ch_hive-...
https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_command-line-installation/content/ref-ff...

They cover useful tuning techniques such as vectorization, map joins, the cost-based optimizer (CBO), ORC storage, and bucketing.
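For illustration, a few of these techniques are switched on with session settings like the ones below. This is only a sketch: defaults differ between Hive versions, some of these may already be enabled on your cluster, and vectorization in older Hive releases generally needs the table to be stored as ORC.

```sql
-- Process rows in batches instead of one at a time.
set hive.vectorized.execution.enabled=true;
-- Let Hive convert joins against small tables into map joins.
set hive.auto.convert.join=true;
-- Enable the cost-based optimizer (works best with table statistics collected).
set hive.cbo.enable=true;
```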

Hope this helps! 🙂

Thanks, Vinicius Higa Murakami, this is very helpful.

Hi @Alpesh Virani!
Good to know!
If your issue has been solved, please accept this as the answer.
That will help other users find it as well!
