I am querying a large table that is partitioned on a field called day. If I run a query:
where day in ('2016-04-01', '2016-03-01')
I get many mappers and reducers and the query takes a long time to run.
If, however, I write a query:
where day = '2016-04-01'
or day = '2016-03-01'
I get far less mappers and reducers and the query runs quickly. To me this suggests that indoes not take advantage of partitions in a table. Can anyone confirm this and explain why?
Hive Version: 1.2.1
Hadoop Version: 126.96.36.199-4
I believe the relevant part of the execution plans are...
Using Where or
No filter operator at all
Using Where inFilter Operator
predicate: (day) IN ('2016-04-01', '2016-03-01') (type: boolean)
Statistics: Num rows: 100000000 Data size: 9999999999
The hive docs just say:
'What partitions to use in a query is determined automatically by the system on the basis of where clause conditions on partition columns.'