We have a Hive Table stored in ORC format. The query:
select * from myTable where field = 'value'
executes in less than a second, which is great.
However, when we try to load the data through a JDBC ResultSet, it takes ~50 secs to respond with data when the result has a small number of rows (<1000). The smaller the result set, the worse the performance is...
while (rs.next()) {   // <-- this takes ~50 secs on the first call
    // ...load the data
}
This behavior is limited to ORC tables and doesn't seem to be influenced by compression (we've tried both SNAPPY and ZLIB).
Any ideas what might be happening here? Are there any tuning options which we could employ here?
I've resolved this myself. The application was actually submitting:
select * from myTable where field LIKE '%value%'
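This was of course causing a full table scan: a LIKE pattern with a leading wildcard can't be answered from ORC's min/max column statistics, so every stripe has to be read.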
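
For anyone who hits the same thing, here is a minimal sketch of a check that could sit in front of the JDBC call and flag this kind of query before it is submitted. The class and method names are hypothetical (nothing like this exists in our application or in the Hive driver), and it only inspects literal SQL text, so it would not see a pattern bound through a PreparedStatement parameter.

```java
import java.util.regex.Pattern;

// Hypothetical helper: flags SQL text containing a LIKE pattern that
// starts with a wildcard, e.g. LIKE '%value%'.
public class LeadingWildcardCheck {

    // LIKE, whitespace, an opening quote, then % or _ as the first pattern character.
    private static final Pattern LEADING_WILDCARD_LIKE =
            Pattern.compile("\\blike\\s+'[%_]", Pattern.CASE_INSENSITIVE);

    public static boolean hasLeadingWildcardLike(String sql) {
        return LEADING_WILDCARD_LIKE.matcher(sql).find();
    }

    public static void main(String[] args) {
        String intended = "select * from myTable where field = 'value'";
        String submitted = "select * from myTable where field LIKE '%value%'";

        System.out.println(hasLeadingWildcardLike(intended));  // false
        System.out.println(hasLeadingWildcardLike(submitted)); // true
    }
}
```

Logging the exact SQL string right before executing it would have shown the difference between the query we tested by hand and the one the application actually sent.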