
Hive fills up /hadoop/yarn/local


I have a big Hive query: a MERGE statement with a 50 GB source table and a 0 GB (empty) destination table.

When it runs, it fails because the partition hosting /hadoop/yarn/local (this directory has its own partition) on one data node fills up past 90%, i.e. the yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage threshold.

I could extend the partition, raise the setting, or add another data node, but the issue is likely to crop up again when I need to merge 100 GB, then 250 GB, and so on.
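For reference, by "raise the setting" I mean something like this in yarn-site.xml (the 95.0 value is just an example, the default is 90.0):

```xml
<property>
  <!-- NodeManager marks a local dir unhealthy above this utilization -->
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>
```

But this only buys headroom; it does not address the underlying growth.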

What is the best way to sort this out? I could manually break the query up (adding a WHERE on dates, for instance), but I have the feeling there should be a cleaner solution.
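To illustrate the manual split I have in mind, something along these lines, run once per date range (table and column names here are made up, not my real schema):

```sql
-- Hypothetical batched MERGE: process one month of the source at a time
-- so intermediate shuffle data on /hadoop/yarn/local stays bounded.
MERGE INTO tgt
USING (
  SELECT id, val, event_date
  FROM src
  WHERE event_date >= '2018-01-01' AND event_date < '2018-02-01'
) s
ON tgt.id = s.id
WHEN MATCHED THEN UPDATE SET val = s.val
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val, s.event_date);
-- then repeat with the next date window
```

This works, but it is manual bookkeeping I would rather avoid.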

Context: a small (3 data nodes) HDP 2.6 cluster on AWS.