
Guidance on setting the right value for 'hive.exec.max.created.files'

Explorer

Hi, I'm looking for some guidance on setting hive.exec.max.created.files. Is there a formula or ratio I can use ahead of time to find the right value for my queries?

1 ACCEPTED SOLUTION

New Contributor

Its main purpose is to prevent a single query from overloading HDFS. If a query creates more files than the default limit of 100000, the user should examine the query and find out why. The query may be generating too many small files.
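As a minimal sketch, you can check the current limit and, only after confirming the file count is expected, raise it for a single session from the Hive CLI or Beeline. The value 200000 below is purely illustrative, not a recommendation from this thread, and this assumes your HiveServer2 configuration allows the property to be changed at runtime:

```sql
-- Print the current value of the limit (the default is 100000).
SET hive.exec.max.created.files;

-- Raise the limit for this session only; 200000 is an illustrative value.
SET hive.exec.max.created.files=200000;
```

Because SET is session-scoped, the cluster-wide value in hive-site.xml keeps protecting HDFS for every other query.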


5 REPLIES

@schauhan I could not find the answer. Really good question. @gopal @yzhang


Rising Star

Hitting this limit is usually a symptom of the real problem. The general recommendation is to turn on

set hive.optimize.sort.dynamic.partition=true;

to prevent partitioning combined with bucketing from blowing up HDFS file counts.
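For context, here is a sketch of where this setting fits in a dynamic-partition insert. The tables `sales_by_day` and `staging_sales` and their columns are hypothetical. With the setting enabled, Hive globally sorts rows on the dynamic partition column, which lets each reducer keep only one record writer open for each partition value:

```sql
-- Allow dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Sort on the dynamic partition column before the rows are written.
SET hive.optimize.sort.dynamic.partition=true;

-- Hypothetical tables: the dynamic partition column (sale_date) must be last in the SELECT list.
INSERT OVERWRITE TABLE sales_by_day PARTITION (sale_date)
SELECT order_id, amount, sale_date
FROM staging_sales;
```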

Are you facing this issue while trying to load data into a large table?

Mentor

@schauhan, are you still having issues with this? Can you accept the best answer or provide your own solution?