Guidance around setting right number for 'hive.exec.max.created.files'

Explorer

Hi, I'm looking for some guidance on setting hive.exec.max.created.files. Is there a formula or ratio I can follow ahead of time to find the right value for my queries?

1 ACCEPTED SOLUTION

Re: Guidance around setting right number for 'hive.exec.max.created.files'

New Contributor

The main purpose of this setting is to prevent a single query from overloading HDFS with files. If a query generates more files than the default limit of 100000, it is better for the user to examine the query and find out why. The query may be generating too many files that are too small.
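
As a rough illustration of the point above, the limit can be inspected and, as a stopgap only, raised for a single session. This is a minimal sketch: the value 200000 is an arbitrary example, not a recommendation, and raising the limit does not address why the query creates so many files.

```sql
-- Print the current value of the limit (the default is 100000).
SET hive.exec.max.created.files;

-- Stopgap only: raise the limit for this session.
-- 200000 is an arbitrary illustrative value, not a tuned recommendation.
SET hive.exec.max.created.files=200000;
```

If the query still hits the limit after a modest increase, that is a strong hint that the query itself needs to be examined.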

5 REPLIES

Re: Guidance around setting right number for 'hive.exec.max.created.files'

@schauhan I could not find the answer. Really good question. @gopal @yzhang

Re: Guidance around setting right number for 'hive.exec.max.created.files'

Rising Star

Hitting this limit is usually a symptom of a problem in the query rather than the problem itself. The general recommendation is to turn on

hive.optimize.sort.dynamic.partition=true;

to prevent partitioning combined with bucketing from blowing up HDFS file counts.
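
For anyone who wants to see the setting in context, here is a minimal sketch of a dynamic-partition insert with the optimization turned on. The tables sales_staging and sales_partitioned and their columns are hypothetical names used only for illustration, and whether this property is available or already enabled depends on your Hive version.

```sql
-- Hypothetical tables: sales_staging (source) and sales_partitioned
-- (target, partitioned by sale_date). Names are for illustration only.

-- Allow fully dynamic partition inserts.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Sort rows by the partition columns before writing, so each writer
-- handles one partition at a time instead of keeping a file open
-- for every partition it encounters.
SET hive.optimize.sort.dynamic.partition=true;

INSERT OVERWRITE TABLE sales_partitioned PARTITION (sale_date)
SELECT order_id, amount, sale_date
FROM sales_staging;
```

Without the sort, every task can write a separate file for each partition it sees, so the file count can grow roughly with tasks times partitions, which is how queries end up hitting hive.exec.max.created.files.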

Re: Guidance around setting right number for 'hive.exec.max.created.files'

Are you facing this issue while trying to load data into a large table?

Re: Guidance around setting right number for 'hive.exec.max.created.files'

Mentor

@schauhan are you still having issues with this? Can you accept the best answer or provide your own solution?