I am using Hive 1.2 on a 1,200-node Hortonworks cluster. I have to create a day-level partitioned table, about 30 TB in size, from source. I opted for ORC, which ends up creating too many part files in each day-level partition. The data is spread evenly across the files, and the files are not at the 256 MB HDFS block size, even after setting these Hive parameters (
I need your input: is this a bug in Hive 1.2 on Hortonworks? Why are the ORC files not getting compressed?
@Vamsi Jonnadula I am not sure if it is just me, but I am unable to see part of your question; it comes up in a table structure. Is there any way you can edit your question and remove the table structure it is in? Or is it just me?
@Vamsi Jonnadula, Did this get resolved? We're seeing issues with small ORC files within a partition as well.
I am also seeing the same issue. After setting some config parameters the issue was solved, but when I rerun the same HQL on the same data twice, one run does not generate small files and the other does. It is too inconsistent to draw a conclusion; it would be better if Hortonworks looked into it.
Thank you for the reply, but I already have these in place. The weird part is that when I run it the first time it works fine and I don't see small files; when I execute the same job again (no changes, even to the data set), I see that small files were created.