Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Guidance on setting the right number for 'hive.exec.max.created.files'

New Member

Hi, I'm looking for some guidance on setting hive.exec.max.created.files. Is there a formula or ratio I can follow ahead of time to find the right number for my queries?

1 ACCEPTED SOLUTION

New Member

I believe its main purpose is to prevent overloading HDFS. If a query creates more files than the default limit of 100000, it is better for the user to examine the query and find out why. The query may be generating too many small files.
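A rough worst-case estimate shows how quickly a dynamic-partition insert can reach the default limit. This is a minimal sketch under a simplified model (an assumption, not Hive's exact accounting): each writer task (mapper or reducer) opens one HDFS file for every distinct partition it receives, so the file count is roughly tasks multiplied by partitions. The function names and job sizes are hypothetical.

```python
# Simplified worst-case model (an assumption, not Hive's exact accounting):
# each writer task opens one HDFS file per distinct dynamic partition it sees.

DEFAULT_MAX_CREATED_FILES = 100_000  # default value of hive.exec.max.created.files


def worst_case_files(writer_tasks: int, distinct_partitions: int) -> int:
    """Upper bound on files created when every task receives rows for every partition."""
    return writer_tasks * distinct_partitions


def exceeds_limit(files: int, limit: int = DEFAULT_MAX_CREATED_FILES) -> bool:
    """True if the estimate is above the configured limit."""
    return files > limit


if __name__ == "__main__":
    # Hypothetical job sizes: 90 daily partitions written by N mapper tasks.
    for tasks in (1000, 1200):
        files = worst_case_files(tasks, distinct_partitions=90)
        print(tasks, files, exceeds_limit(files))
    # 1000 90000 False
    # 1200 108000 True
```

Under this model, raising the limit only hides the many-small-files pattern; reducing how many partitions each task writes to keeps the count down.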


5 REPLIES

Master Mentor

@schauhan I could not find the answer. Really good question. @gopal @yzhang

Expert Contributor

Hitting this limit is usually a symptom of the real problem. The general recommendation is to turn on

set hive.optimize.sort.dynamic.partition=true;

to prevent partitioning plus bucketing from blowing up HDFS file counts.
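A toy simulation can illustrate why the sorted approach helps. This is a simplified model, assuming the optimization routes all rows for one partition value to a single reducer, which then writes that partition with one open writer. Without it, every task that sees a partition writes its own file for that partition. The function names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def files_without_sort(task_rows: List[List[str]]) -> int:
    # Each writer task holds one open file per distinct partition it sees.
    return sum(len(set(rows)) for rows in task_rows)


def files_with_sort(task_rows: List[List[str]], reducers: int) -> int:
    # Shuffle step: route every row to a reducer chosen by its partition key,
    # so all rows for one partition value land on the same reducer.
    per_reducer: Dict[int, Set[str]] = defaultdict(set)
    for rows in task_rows:
        for key in rows:
            per_reducer[hash(key) % reducers].add(key)
    # Each reducer writes one file per partition value it owns.
    return sum(len(keys) for keys in per_reducer.values())


if __name__ == "__main__":
    days = ["2016-01-01", "2016-01-02", "2016-01-03"]
    # Hypothetical input: 4 mapper tasks, each holding rows for all 3 days.
    mappers = [days * 10 for _ in range(4)]
    print(files_without_sort(mappers))           # 12
    print(files_with_sort(mappers, reducers=2))  # 3
```

In this model the unsorted count grows with tasks multiplied by partitions, while the sorted count stays at one file per partition value no matter how many mappers read the input.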

New Member

Are you facing this issue while trying to load data into a large table?

Master Mentor

@schauhan are you still having issues with this? Can you accept the best answer or provide your own solution?