Effect of the number of ORC files on the NameNode?

Hi,

For very large datasets in the PB range, does it help to create large ORC files?

I understand they should be larger than the HDFS block size.

So let's say I have a block size of 256 MB and am creating 1 GB ORC files for a Hive table with a total size of 3 TB.

Would it help to create bigger files, say 2 GB each?

Keep in mind that I will be using the ORC index to query only 1 file per partition, and the output data would be in the KB range.

Thanks

1 ACCEPTED SOLUTION

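As a general rule, you should create the largest files you can within a partition. The NameNode keeps metadata for every file and every block in memory, so fewer, larger files reduce its load.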

Check out @David Streever's excellent answer to this question for more details.
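
To put rough numbers on the example in the question, here is a minimal sketch that counts how many file and block entries the NameNode would have to track for each file size. It assumes every file is exactly the target size and ignores directories and replication; the function name `namenode_objects` is made up for illustration and is not part of any Hadoop API.

```python
# Rough count of the file and block entries the NameNode tracks for a table.
# Assumption: all files are exactly `file_bytes` in size; directories and
# replication are ignored. This is an illustration, not a Hadoop API.

MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def namenode_objects(table_bytes: int, file_bytes: int, block_bytes: int):
    """Return (files, blocks, files + blocks) for the given sizes."""
    files = ceil_div(table_bytes, file_bytes)
    blocks_per_file = ceil_div(file_bytes, block_bytes)
    blocks = files * blocks_per_file
    return files, blocks, files + blocks


if __name__ == "__main__":
    for file_size in (1 * GB, 2 * GB):
        files, blocks, total = namenode_objects(3 * TB, file_size, 256 * MB)
        print(f"{file_size // GB} GB files: {files} files, "
              f"{blocks} blocks, {total} objects")
```

Under these assumptions, a 3 TB table with 256 MB blocks needs 3072 files and 12288 blocks at 1 GB per file, and 1536 files and 12288 blocks at 2 GB per file. The block count stays the same, so going from 1 GB to 2 GB only removes the per-file entries; the larger saving comes from avoiding files smaller than a block.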


2 REPLIES

Master Guru