
Number of ORC files effect on NameNode?


Hi,

For very large datasets in the PB range, does it help to create large ORC files?

I understand they should be larger than the block size.

So let's say I have a block size of 256 MB and am creating 1 GB ORC files for a Hive table with a total size of 3 TB.

Would it help to create bigger files, say 2 GB each?

Keep in mind that I will be using the ORC index to query only 1 file per partition, and the output data would be in the KB range.

Thanks

1 ACCEPTED SOLUTION


Re: Number of ORC files effect on NameNode?

New Contributor

As a general rule, you should be creating the largest files you can within a partition.

Check out @David Streever's excellent answer to this question for more details.
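To put rough numbers on the sizes in the question, here is a minimal sketch of the arithmetic. It rests on simplifying assumptions that are not from the thread: the 3 TB table is treated as one flat set of equally sized files (partition boundaries are ignored), replication is ignored, and the helper name `namenode_objects` is purely illustrative. The NameNode keeps an in-memory entry for every file and every block, so those are the two counts worth comparing.

```python
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large byte counts."""
    return -(-a // b)


def namenode_objects(table_bytes, file_bytes, block_bytes):
    """Hypothetical helper: return (files, blocks) tracked for one table.

    Assumes the table is split into files of `file_bytes` each, with one
    smaller trailing file if the sizes do not divide evenly.
    """
    full_files, remainder = divmod(table_bytes, file_bytes)
    files = full_files + (1 if remainder else 0)
    # Every file is stored as one or more HDFS blocks.
    blocks = full_files * ceil_div(file_bytes, block_bytes)
    if remainder:
        blocks += ceil_div(remainder, block_bytes)
    return files, blocks


# Sizes taken from the question: 3 TB table, 256 MB blocks, 1 GB vs 2 GB files.
for size_gb in (1, 2):
    files, blocks = namenode_objects(3 * TB, size_gb * GB, 256 * MB)
    print(f"{size_gb} GB files: {files} files, {blocks} blocks")
```

Under these assumptions, going from 1 GB to 2 GB files halves the file count (3072 to 1536) while the block count stays at 12288, because both sizes are already whole multiples of the 256 MB block size. The larger saving comes from avoiding files smaller than a block, where each file still costs its own file entry and its own block entry.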

