Guidance on setting the right value for 'hive.exec.max.created.files'

Contributor

Hi, I'm looking for some guidance on setting hive.exec.max.created.files. Is there a formula or ratio I can use ahead of time to find the right value for my queries?

1 ACCEPTED SOLUTION

New Contributor

The main purpose of this setting is to prevent a query from overloading HDFS. If a query generates more than the default limit of 100000 files, it is better for the user to examine the query and find out why. The query may be generating too many small files.
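
I'm not aware of an official formula, but a rough worst-case estimate can be sketched as below. It assumes that every writing task (mapper or reducer) may create one file for each dynamic partition and bucket it receives rows for. The function names and the job numbers are hypothetical and only there for illustration.

```python
# Rough worst-case estimate of files created by one query.
# Assumption: each writing task may create one file per partition and bucket.

MAX_CREATED_FILES = 100000  # default limit mentioned in this thread


def estimate_created_files(num_tasks, num_partitions, num_buckets=1):
    """Upper bound on files: tasks x partitions x buckets."""
    return num_tasks * num_partitions * num_buckets


def exceeds_limit(num_tasks, num_partitions, num_buckets=1,
                  limit=MAX_CREATED_FILES):
    """True if the worst-case estimate is above the configured limit."""
    return estimate_created_files(num_tasks, num_partitions, num_buckets) > limit


# Hypothetical job: 500 writing tasks, 365 daily partitions, no bucketing.
worst_case = estimate_created_files(500, 365)
print(worst_case, exceeds_limit(500, 365))
```

If the estimate is far above the limit, that is usually a reason to look at the query rather than to raise the limit.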

5 REPLIES

Master Mentor

@schauhan I could not find the answer. Really good question. @gopal @yzhang

Expert Contributor

This is usually a symptom of a larger problem. The general recommendation is to turn on

hive.optimize.sort.dynamic.partition=true;

to prevent partitioning+bucketing from blowing up HDFS file counts.
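
To illustrate why this helps, here is a toy simulation, not Hive's actual implementation. It assumes that without the setting rows reach writing tasks regardless of their partition, so each task opens a file for every partition it sees, and that with the setting all rows of one partition are routed to a single task. The function name and the row layout are made up for the example.

```python
# Toy model: one output file per distinct (task, partition) pair.

def count_files(rows, num_tasks, route_by_partition):
    """Count the files written for a list of partition keys.

    route_by_partition=False: rows are spread over tasks round-robin.
    route_by_partition=True: every row of a partition goes to the same task.
    """
    files = set()
    for i, partition in enumerate(rows):
        if route_by_partition:
            task = partition % num_tasks
        else:
            task = i % num_tasks
        files.add((task, partition))
    return len(files)


NUM_TASKS = 20
NUM_PARTITIONS = 50

# Hypothetical input: 10000 rows laid out so every task sees every partition.
rows = [(i // NUM_TASKS) % NUM_PARTITIONS for i in range(10000)]

unsorted_files = count_files(rows, NUM_TASKS, route_by_partition=False)
routed_files = count_files(rows, NUM_TASKS, route_by_partition=True)
print(unsorted_files, routed_files)
```

In this model the file count drops from tasks x partitions to one file per partition, which is the effect the setting is meant to have on HDFS file counts.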

Contributor

Are you facing this issue while trying to load data into a large table?

Master Mentor

@schauhan are you still having issues with this? Can you accept the best answer or provide your own solution?