I have a table A with a column named age.
I created a table B, stored as ORC, with the data sorted on the age column.
select * from A where age=60;
select * from B where age=60;
Both queries read the same amount of data, and I observed no difference in run time.
Please help me with this.
ORC is a columnar format, so you will see the advantage only if you select a subset of the columns.
Your query does a SELECT *, so the complete row is read and then the WHERE clause is applied.
Please try selecting only specific columns to see the difference.
I didn't understand what you said. I think row groups should be skipped based on the min and max values stored for them. I also tried selecting just one column, and that didn't help either. Maybe my understanding is wrong.
Also, I mistakenly wrote that the data read is the same. What I meant to say is that the data read from B is equal to the full size of B, so no predicate pushdown is happening.
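To make the expectation in the comments above concrete, here is a minimal Python sketch of min/max row-group skipping. It is a toy model, not the ORC reader: the functions build_row_groups and scan_equal, the row-group size of 10,000, and the randomly generated ages are all assumptions made for illustration. It only shows why sorting on age should let a reader skip most row groups once the statistics are consulted. It says nothing about whether Hive is configured to use those statistics for the queries in question.

```python
import random


def build_row_groups(values, stride):
    """Split values into row groups and record the min and max of each one."""
    groups = []
    for start in range(0, len(values), stride):
        chunk = values[start:start + stride]
        groups.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return groups


def scan_equal(groups, target):
    """Evaluate `col = target`; return (matching rows, row groups actually read)."""
    matches = 0
    groups_read = 0
    for group in groups:
        if target < group["min"] or target > group["max"]:
            continue  # statistics prove no row can match, so the group is skipped
        groups_read += 1
        matches += sum(1 for value in group["rows"] if value == target)
    return matches, groups_read


random.seed(0)
ages = [random.randint(1, 100) for _ in range(100_000)]  # made-up data
STRIDE = 10_000  # hypothetical row-group size

unsorted_groups = build_row_groups(ages, STRIDE)          # stands in for unsorted data
sorted_groups = build_row_groups(sorted(ages), STRIDE)    # stands in for table B

unsorted_matches, unsorted_read = scan_equal(unsorted_groups, 60)
sorted_matches, sorted_read = scan_equal(sorted_groups, 60)

print("unsorted: read", unsorted_read, "of", len(unsorted_groups), "row groups")
print("sorted:   read", sorted_read, "of", len(sorted_groups), "row groups")
```

Both scans return the same rows, but on the sorted data only the row groups whose min/max range contains 60 are read. On unsorted data nearly every row group spans the whole age range, so nothing can be skipped. If B is sorted and the full table is still read, the statistics are apparently not being used for that query.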