Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

hive index not being used

hive index not being used

Master Collaborator

i have created an hive index and I am using the indexed column in the select query but the index is not used .

1) created the index 
create index drqPlzaidx on table drq_base(plaza_id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild;
alter index drqPlzaidx on drq_base rebuild;

2) explain the query that should use index but its not 
    > explain select count(*),plaza_id from drq_base group by plaza_id;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: drq_base
            Statistics: Num rows: 58959872 Data size: 353759232 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: plaza_id (type: varchar(6))
              outputColumnNames: plaza_id
              Statistics: Num rows: 58959872 Data size: 353759232 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: count()
                keys: plaza_id (type: varchar(6))
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 58959872 Data size: 353759232 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: varchar(6))
                  sort order: +
                  Map-reduce partition columns: _col0 (type: varchar(6))
                  Statistics: Num rows: 58959872 Data size: 353759232 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          keys: KEY._col0 (type: varchar(6))
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 29479936 Data size: 176879616 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col1 (type: bigint), _col0 (type: varchar(6))
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 29479936 Data size: 176879616 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 29479936 Data size: 176879616 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
Time taken: 0.049 seconds, Fetched: 52 row(s)
hive>

2 REPLIES 2

Re: hive index not being used

Master Collaborator

also I see absolutely no improvement in the query . The table is a textfile table and not an ORC so I was expecting improvement after creating an index

Highlighted

Re: hive index not being used

Master Collaborator

I am still unable to use indexes . .can someone help?

Don't have an account?
Coming from Hortonworks? Activate your account here