Number of ORC files: effect on the NameNode?

Hi,

For very large datasets in the PB range, does it help to create large ORC files?

I understand the files should be larger than the HDFS block size.

So let's say I have a block size of 256 MB and am creating 1 GB ORC files for a Hive table with a total size of 3 TB.

Would it help to create bigger files, say 2 GB each?

Keep in mind that I will be using the ORC index to query only one file per partition, and that the query output will be in the KB range.

Thanks

Accepted Solutions

Re: Number of ORC files: effect on the NameNode?

Explorer

As a general rule, you should create the largest files you can within a partition.

Check out @David Streever's excellent answer to this question for more details.
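
To put rough numbers on the sizes from the question, here is a back-of-the-envelope sketch. It assumes every file is exactly the stated size, ignores directory and partition objects, and uses the commonly quoted rule of thumb of about 150 bytes of NameNode heap per file or block object. That figure, the helper names, and the 16 MB comparison case are illustrative assumptions, not something measured on a real cluster.

```python
MB, GB, TB = 1024**2, 1024**3, 1024**4

# Assumption: ~150 bytes of NameNode heap per namespace object (file or
# block). This is a common rule of thumb, not a measured value.
BYTES_PER_OBJECT = 150


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def namenode_objects(table_bytes: int, file_bytes: int, block_bytes: int):
    """Return (files, blocks) for a table split into equal-size files.

    Simplification: every file is assumed to be exactly file_bytes long,
    and directory/partition objects are ignored.
    """
    files = ceil_div(table_bytes, file_bytes)
    blocks = files * ceil_div(file_bytes, block_bytes)
    return files, blocks


def estimated_heap_bytes(files: int, blocks: int) -> int:
    """Rough NameNode heap estimate under the rule-of-thumb assumption."""
    return (files + blocks) * BYTES_PER_OBJECT


if __name__ == "__main__":
    # 16 MB is a hypothetical small-file case added only for contrast.
    for label, file_size in [("16 MB", 16 * MB), ("1 GB", GB), ("2 GB", 2 * GB)]:
        files, blocks = namenode_objects(3 * TB, file_size, 256 * MB)
        heap_mb = estimated_heap_bytes(files, blocks) / MB
        print(f"{label:>5} files: {files} files, {blocks} blocks, ~{heap_mb:.1f} MB heap")
```

Under these assumptions, 1 GB files give 3072 files and 12288 blocks, while 2 GB files give 1536 files and the same 12288 blocks. Once files are already several blocks long, doubling the file size only halves the file count and leaves the block count unchanged, so the NameNode footprint barely moves. The hypothetical 16 MB case (196608 files, 196608 blocks) shows where the real pressure comes from: files smaller than a block.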

Re: Number of ORC files: effect on the NameNode?

Super Guru