Effect of the number of ORC files on the NameNode?

Hi,

For very large datasets in the petabyte range, does it help to create large ORC files?

I understand they should be larger than the HDFS block size.

So let's say I have a block size of 256 MB and am creating 1 GB ORC files for a Hive table with a total size of 3 TB.

Would it help to create bigger files, say 2 GB each?
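To make the trade-off concrete, here is a minimal Python sketch that counts the namespace objects (files plus blocks) the NameNode would track for the 3 TB table described above. It assumes binary units, equal-sized files, and one in-memory NameNode object per file and per block; directories such as partition folders are ignored, and the helper name `namespace_objects` is hypothetical. The 128 MB case is added only for comparison with the block size.

```python
import math

# Assumption: binary units, as HDFS sizes are usually quoted (1 TB = 1024 GB = 1024 * 1024 MB).
GB_IN_MB = 1024
TB_IN_MB = 1024 * GB_IN_MB


def namespace_objects(table_mb, file_mb, block_mb):
    """Return (files, blocks, files + blocks) for a table stored as equal-sized files.

    Hypothetical helper: it assumes every file has the same size and counts only
    file and block objects, ignoring directories such as partition folders.
    """
    num_files = math.ceil(table_mb / file_mb)
    # A file smaller than the block size still occupies one (partial) block.
    blocks_per_file = math.ceil(file_mb / block_mb)
    num_blocks = num_files * blocks_per_file
    return num_files, num_blocks, num_files + num_blocks


TABLE_MB = 3 * TB_IN_MB  # 3 TB Hive table from the question
BLOCK_MB = 256           # 256 MB HDFS block size from the question

for file_mb in (128, 1 * GB_IN_MB, 2 * GB_IN_MB):
    files, blocks, total = namespace_objects(TABLE_MB, file_mb, BLOCK_MB)
    print(f"{file_mb} MB files: {files} files + {blocks} blocks = {total} objects")

# 128 MB files: 24576 files + 24576 blocks = 49152 objects
# 1024 MB files: 3072 files + 12288 blocks = 15360 objects
# 2048 MB files: 1536 files + 12288 blocks = 13824 objects
```

Under these assumptions, going from 1 GB to 2 GB files halves the file count but leaves the block count unchanged, so the total drops by only about 10%; the much larger saving comes from not writing files smaller than a block.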

Keep in mind that I will be using the ORC index to query only one file per partition, and that the query output would be only a few KB.

Thanks

Accepted solution:

As a general rule, you should create the largest files you can within a partition. The NameNode keeps every file and block as an object in memory, so fewer, larger files reduce its load.

Check out @David Streever's excellent answer to this question for more details.
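The phrase "within a partition" matters because a file cannot span partitions, so each partition needs at least one file. The hypothetical sketch below, using made-up partition names and sizes, counts the files each partition needs for a given target file size; it shows that raising the target only reduces the count for partitions larger than the target.

```python
import math


def files_per_partition(partition_mb, target_file_mb):
    """Hypothetical helper: files needed per partition if no file exceeds target_file_mb.

    Every partition holds at least one file, so a partition smaller than the
    target cannot be compacted any further by raising the target.
    """
    return {
        name: max(1, math.ceil(size_mb / target_file_mb))
        for name, size_mb in partition_mb.items()
    }


# Made-up partition sizes in MB, for illustration only.
partitions = {"p=1": 3000, "p=2": 700, "p=3": 50}

for target_mb in (1024, 2048):
    counts = files_per_partition(partitions, target_mb)
    print(target_mb, counts, sum(counts.values()))

# totals: 5 files with a 1024 MB target, 4 files with a 2048 MB target
```

Only the 3000 MB partition benefits from the larger target; the two small partitions stay at one file each, so the partitioning scheme sets a floor on the file count.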
