For very large datasets in the PB range, does it help to create large ORC files?
I understand that the files should be larger than the block size.
So let's say I have a block size of 256 MB and am creating 1 GB ORC files for a Hive table with a total size of 3 TB.
Would it help to create bigger files, say 2 GB each?
Keep in mind that I will be using the ORC index to query only 1 file per partition, and the query output would be in the KB range.
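
For reference, here is the arithmetic behind the two options as a quick sketch. It assumes binary units (1 TB = 1024 GB, 1 GB = 1024 MB) and that every file is exactly the target size; the `layout` helper is just a name made up for illustration.

```python
# Rough file and block counts for the two file sizes in question.
# Assumptions: binary units and files of exactly the target size.
MB = 1
GB = 1024 * MB
TB = 1024 * GB


def layout(table_size_mb, file_size_mb, block_size_mb):
    """Return (number of files, blocks per file) for a table split into equal files."""
    num_files = -(-table_size_mb // file_size_mb)        # ceiling division
    blocks_per_file = -(-file_size_mb // block_size_mb)  # ceiling division
    return num_files, blocks_per_file


for size in (1 * GB, 2 * GB):
    files, blocks = layout(3 * TB, size, 256 * MB)
    print(f"{size // GB} GB files: {files} files, {blocks} blocks per file")
```

Under these assumptions, going from 1 GB to 2 GB files halves the file count (3072 to 1536) and doubles the blocks per file (4 to 8), so the total number of blocks stays the same.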