Created 02-21-2018 06:14 AM
@Pavan Kumar Konda It depends on a lot of constraints, such as compression, serialization, and whether the storage format is splittable.
I think ORC is just for Hive.
Hey, I am quite confused about which storage format is suited to which type of data. You said "Parquet is well suited for data warehouse kind of solutions where aggregations are required on certain column over a huge set of data.", but I think that's true for ORC too. And as @owen said, ORC contains indexes at 3 levels (2 levels in Parquet), so shouldn't ORC be faster than Parquet for aggregations?
Only ORC and Parquet have the necessary features
ORC can use predicate pushdown based on either:
Parquet only has min/max. ORC can filter at the file level, stripe level, or 10k row level. Parquet can only filter at the file level or row group level (the row group is Parquet's equivalent of an ORC stripe).
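To make the min/max filtering idea concrete, here is a toy sketch in Python. It is not ORC or Parquet reader code: `ChunkStats` and `chunks_to_read` are hypothetical names made up for illustration, and a "chunk" stands in for whatever unit carries statistics (a file, a stripe or row group, or a 10k row group). The reader compares the query's range against each chunk's stored min and max and skips any chunk that cannot contain a match.

```python
from dataclasses import dataclass


@dataclass
class ChunkStats:
    """Min/max statistics for one column in one chunk (hypothetical structure)."""
    name: str
    min_value: int
    max_value: int


def chunks_to_read(stats, lower, upper):
    """Return the names of chunks whose [min, max] range overlaps [lower, upper].

    Chunks with no overlap cannot contain a matching row, so they are skipped
    without being read or decompressed.
    """
    return [
        s.name
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Illustrative statistics for three chunks of an integer column (made-up data).
stripes = [
    ChunkStats("stripe-0", 1, 100),
    ChunkStats("stripe-1", 101, 200),
    ChunkStats("stripe-2", 201, 300),
]

# Equivalent of: WHERE col BETWEEN 150 AND 180
selected = chunks_to_read(stripes, 150, 180)
print(selected)
```

The finer the unit that carries statistics, the less data has to be read for a selective predicate, which is why the number of index levels matters. This only pays off when the data is sorted or clustered on the filtered column; if every chunk spans the full value range, nothing gets skipped.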
The previous answer mentions some of Avro's properties that are shared by ORC and Parquet: