- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Phoenix query response time slow with secondary index
- Labels:
-
Apache HBase
-
Apache Phoenix
Created on ‎10-28-2019 11:24 AM - edited ‎10-28-2019 11:48 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi I'm using apache phoenix to query sql over hbase.
Table schema
CREATE TABLE TABLE_1 (
SF_ID VARCHAR NOT NULL,
ENTITY_ID VARCHAR NOT NULL,
PRODUCT_SKU VARCHAR,
CITY_NAME VARCHAR,
SCREEN_NAME VARCHAR,
PRODUCT_LIST_VIEWS BIGINT,
PRODUCT_LIST_CLICKS BIGINT,
PRODUCT_LIST_CTR FLOAT,
TIMESTAMP BIGINT NOT NULL,
POS INTEGER NOT NULL,
CONSTRAINT pk PRIMARY KEY (SF_ID, ENTITY_ID, TIMESTAMP, POS));
I have created secondary index as follows:-
CREATE INDEX GA_2 ON TABLE_1 (ENTITY_ID) INCLUDE (PRODUCT_LIST_VIEWS, PRODUCT_LIST_CLICKS, PRODUCT_LIST_CTR);
But the following query is taking around 1.5s to 2s when running on 500k rows.
select ENTITY_ID as "entityId", sum(PRODUCT_LIST_VIEWS) as "productViewSum", sum(PRODUCT_LIST_CLICKS) as "productClickSum", sum(PRODUCT_LIST_CTR) as "productCTRSum" from "TABLE_1" group by ENTITY_ID;
Explain plan is as following:-
CLIENT 1-CHUNK 0 ROWS 0 BYTES PARALLEL 1-WAY FULL SCAN OVER GA_2 SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ENTITY_ID"]
Is there any way to optimize the response time of query?
Created on ‎11-01-2019 08:58 AM - edited ‎11-01-2019 09:45 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Update :-
Going by explain plan I created a salted table with 12 buckets.
Now the explain plan is following :-
+-------------------------------------------------------------------+----------+
| PLAN | EST_BYTE |
+-------------------------------------------------------------------+----------+
| CLIENT 12-CHUNK PARALLEL 12-WAY FULL SCAN OVER GA_3 | null |
| SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ENTITY_ID"] | null |
| CLIENT MERGE SORT | null |
+-------------------------------------------------------------------+----------+
But response time is still same.
Observed one more thing :-
If I don't use sum in my query response is fairly fast.
For example
select ENTITY_ID, SUM(PRODUCT_LIST_VIEWS) from GA_TABLE_2 where SF_ID = '1' group by ENTITY_ID;
this query took 631ms
but
select ENTITY_ID from GA_TABLE_2 where SF_ID = '1' group by ENTITY_ID;
this took only 30 ms.
I'm new to apache phoenix. It would be really helpful if someone can help out.
