Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Phoenix query response time slow with secondary index

avatar
New Contributor

Hi I'm using apache phoenix to query sql over hbase.

 

Table schema

CREATE TABLE TABLE_1 (
      SF_ID VARCHAR NOT NULL,
      ENTITY_ID VARCHAR NOT NULL,
      PRODUCT_SKU VARCHAR,
      CITY_NAME VARCHAR,
      SCREEN_NAME VARCHAR, 
      PRODUCT_LIST_VIEWS BIGINT,
      PRODUCT_LIST_CLICKS BIGINT,
      PRODUCT_LIST_CTR FLOAT,
      TIMESTAMP BIGINT NOT NULL,
      POS INTEGER NOT NULL,
      CONSTRAINT pk PRIMARY KEY (SF_ID, ENTITY_ID, TIMESTAMP, POS));

 

I have created secondary index as follows:-

CREATE INDEX GA_2 ON TABLE_1 (ENTITY_ID) INCLUDE (PRODUCT_LIST_VIEWS, PRODUCT_LIST_CLICKS, PRODUCT_LIST_CTR);

 

But the following query is taking around 1.5s to 2s when running on 500k rows.

select ENTITY_ID as "entityId", sum(PRODUCT_LIST_VIEWS) as "productViewSum", sum(PRODUCT_LIST_CLICKS) as "productClickSum", sum(PRODUCT_LIST_CTR) as "productCTRSum" from "TABLE_1" group by ENTITY_ID;

 

Explain plan is as following:-

CLIENT 1-CHUNK 0 ROWS 0 BYTES PARALLEL 1-WAY FULL SCAN OVER GA_2 SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ENTITY_ID"]

 

Is there any way to optimize the response time of query?

1 REPLY 1

avatar
New Contributor

Update :-

Going by explain plan I created a salted table with 12 buckets.

Now the explain plan is following :-

 

 

 

+-------------------------------------------------------------------+----------+
|                               PLAN                                | EST_BYTE |
+-------------------------------------------------------------------+----------+
| CLIENT 12-CHUNK PARALLEL 12-WAY FULL SCAN OVER GA_3               | null     |
|     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ENTITY_ID"]  | null     |
| CLIENT MERGE SORT                                                 | null     |
+-------------------------------------------------------------------+----------+

 

 

 

But response time is still same.

 

Observed one more thing :-

If I don't use sum in my query response is fairly fast. 

For example 

 

select ENTITY_ID, SUM(PRODUCT_LIST_VIEWS) from GA_TABLE_2 where SF_ID = '1' group by ENTITY_ID;

 

this query took 631ms

 

but 

 

select ENTITY_ID from GA_TABLE_2 where SF_ID = '1' group by ENTITY_ID;

 

this took only 30 ms.

 

I'm new to apache phoenix. It would be really helpful if someone can help out.