Guidance around setting right number for 'hive.exec.max.created.files'

Explorer

Hi, I'm looking for some guidance on setting hive.exec.max.created.files. Is there a formula or ratio I can use ahead of time to find the right number for my queries?

1 ACCEPTED SOLUTION

New Contributor

The main purpose of this setting is to prevent overloading HDFS. If a query generates more files than the default limit of 100000, it is better for the user to examine the query and find out why. The query may be generating too many small files.
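If examining the query shows that the file count is legitimate, the limit can be checked and raised for the current session. A minimal sketch, assuming a Hive CLI or Beeline session; the value 200000 is only an illustration, not a recommendation from this thread:

```sql
-- Print the current limit (the default is 100000).
SET hive.exec.max.created.files;

-- Raise the limit for this session only. 200000 is an illustrative value;
-- choose one based on how many files the query is expected to write.
SET hive.exec.max.created.files=200000;
```

Raising the limit only hides the symptom if the query really is writing many small files, so it is worth checking the output file count first.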

5 REPLIES

@schauhan I could not find the answer. Really good question. @gopal @yzhang

Rising Star

Hitting this limit is usually a symptom of another problem. The general recommendation is to turn on

hive.optimize.sort.dynamic.partition=true;

to prevent partitioning+bucketing from blowing up HDFS file counts.
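As a rough sketch of how this might look in a session with a dynamic-partition insert: the table names `sales_by_day` and `sales_staging` and their columns are hypothetical and not from this thread, and whether this property is available depends on the Hive version.

```sql
-- Enable dynamic partitioning for the session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Sort rows by the dynamic partition key before writing, so each writer
-- keeps one partition file open at a time instead of one per partition.
SET hive.optimize.sort.dynamic.partition=true;

-- Hypothetical tables: sales_by_day is partitioned by sale_date.
INSERT OVERWRITE TABLE sales_by_day PARTITION (sale_date)
SELECT order_id, amount, sale_date   -- the partition column goes last
FROM sales_staging;
```

Without the sort, every task can write a separate file for each partition it encounters, which is how the created-file count grows so quickly on large loads.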

Are you facing this issue while trying to load data into a large table?

Mentor

@schauhan are you still having issues with this? Can you accept the best answer or provide your own solution?
