Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Phoenix query response time slow with secondary index

Highlighted

Phoenix query response time slow with secondary index

New Contributor

Hi I'm using apache phoenix to query sql over hbase.

 

Table schema

CREATE TABLE TABLE_1 (
      SF_ID VARCHAR NOT NULL,
      ENTITY_ID VARCHAR NOT NULL,
      PRODUCT_SKU VARCHAR,
      CITY_NAME VARCHAR,
      SCREEN_NAME VARCHAR, 
      PRODUCT_LIST_VIEWS BIGINT,
      PRODUCT_LIST_CLICKS BIGINT,
      PRODUCT_LIST_CTR FLOAT,
      TIMESTAMP BIGINT NOT NULL,
      POS INTEGER NOT NULL,
      CONSTRAINT pk PRIMARY KEY (SF_ID, ENTITY_ID, TIMESTAMP, POS));

 

I have created secondary index as follows:-

CREATE INDEX GA_2 ON TABLE_1 (ENTITY_ID) INCLUDE (PRODUCT_LIST_VIEWS, PRODUCT_LIST_CLICKS, PRODUCT_LIST_CTR);

 

But the following query is taking around 1.5s to 2s when running on 500k rows.

select ENTITY_ID as "entityId", sum(PRODUCT_LIST_VIEWS) as "productViewSum", sum(PRODUCT_LIST_CLICKS) as "productClickSum", sum(PRODUCT_LIST_CTR) as "productCTRSum" from "TABLE_1" group by ENTITY_ID;

 

Explain plan is as following:-

CLIENT 1-CHUNK 0 ROWS 0 BYTES PARALLEL 1-WAY FULL SCAN OVER GA_2 SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ENTITY_ID"]

 

Is there any way to optimize the response time of query?

1 REPLY 1

Re: Phoenix query response time slow with secondary index

New Contributor

Update :-

Going by explain plan I created a salted table with 12 buckets.

Now the explain plan is following :-

 

 

 

+-------------------------------------------------------------------+----------+
|                               PLAN                                | EST_BYTE |
+-------------------------------------------------------------------+----------+
| CLIENT 12-CHUNK PARALLEL 12-WAY FULL SCAN OVER GA_3               | null     |
|     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ENTITY_ID"]  | null     |
| CLIENT MERGE SORT                                                 | null     |
+-------------------------------------------------------------------+----------+

 

 

 

But response time is still same.

 

Observed one more thing :-

If I don't use sum in my query response is fairly fast. 

For example 

 

select ENTITY_ID, SUM(PRODUCT_LIST_VIEWS) from GA_TABLE_2 where SF_ID = '1' group by ENTITY_ID;

 

this query took 631ms

 

but 

 

select ENTITY_ID from GA_TABLE_2 where SF_ID = '1' group by ENTITY_ID;

 

this took only 30 ms.

 

I'm new to apache phoenix. It would be really helpful if someone can help out.

 

 

Don't have an account?
Coming from Hortonworks? Activate your account here