Guidance around setting right number for 'hive.exec.max.created.files'

Contributor

Hi, I'm looking for some guidance on setting hive.exec.max.created.files. Is there a formula or ratio I can follow ahead of time to find the right number for my queries?

1 ACCEPTED SOLUTION

New Contributor

The main purpose of this setting is to prevent a query from overloading HDFS. If a query generates more files than the default limit of 100000, it is better for the user to examine the query and find out why. The query may be generating too many small files.
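
Before raising the limit, it can help to check what the current session is actually using. A minimal sketch: the value 200000 below is only an illustrative assumption, not a recommended number, and the override applies to the current session only.

```sql
-- Print the current value of the limit (the default is 100000).
SET hive.exec.max.created.files;

-- Raise the limit for this session only.
-- 200000 is an arbitrary example value, not a recommendation.
SET hive.exec.max.created.files=200000;
```

If a query still hits the raised limit, that is a stronger hint that the query itself is producing too many small files.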


5 REPLIES 5

Master Mentor

@schauhan I could not find the answer. Really good question. @gopal @yzhang

Expert Contributor

Hitting this limit is usually a symptom of an underlying problem. The general recommendation is to turn on

hive.optimize.sort.dynamic.partition=true;

to prevent partitioning+bucketing from blowing up HDFS file counts.
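
As a rough sketch of how this might look in practice, the setting can be enabled in the session before a dynamic-partition insert. The table and column names (`sales_staging`, `sales_partitioned`, `order_id`, `amount`, `sale_date`) are hypothetical, and the two dynamic-partition settings are assumed to be needed for this kind of insert on your cluster. Without the sort optimization, each task may write a separate file for every partition it encounters, so file counts can grow roughly with tasks × partitions.

```sql
-- Sort rows by the dynamic partition column before writing, so each
-- reducer keeps only one partition writer open at a time.
SET hive.optimize.sort.dynamic.partition=true;

-- Assumed prerequisites for a fully dynamic partition insert.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical tables: sales_partitioned is partitioned by sale_date.
-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE sales_partitioned PARTITION (sale_date)
SELECT order_id, amount, sale_date
FROM sales_staging;
```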

Contributor

Are you facing this issue while trying to load data into a large table?

Master Mentor

@schauhan are you still having issues with this? Can you accept the best answer or provide your own solution?