Created 04-07-2016 05:01 PM
I am using a query like the following to try to populate it. Note the “limit” statement I have included in the inner query. Without this, I get the familiar “Size of hash cache…exceeds the maximum” error that the USE_SORT_MERGE_JOIN hint is supposed to fix. (Note also that I am using that hint.) However, I need to join to all of the 2,393,244 records in the RPT_LAB_RSLT_PRE2 table to get my full data set. So the limit condition simply demonstrates that my query is good, as with the limit to 500 records, this statement completes successfully and quickly. Without it, it fails as noted above.
Query with Limit condition: (This query works)
UPSERT INTO MED_DOC_DM.RPT_LAB_RSLT_PRE3 SELECT /*+ USE_SORT_MERGE_JOIN */ rslt.ORD_PROC_ID, rslt.ORD_DT_REAL_NBR, rslt.ORD_RSLT_LN_NBR, rslt.PTNT_ENCNTR_CSN_ID, rslt.PTNT_ID, op2.ORD_PROC_DESCR, rslt.SRVC_AREA_ID, rslt.CMPNT_ID, NULL AS COMPONENT_NM, rslt.RSLT_TSP, rslt.ORD_VAL_MEAS_TXT, rslt.REF_UNIT_TXT, NULL AS CBC_IND, NULL AS HGB_IND, NULL AS PLT_IND, NULL AS WBC_IND, TO_CHAR(CURRENT_TIME()) FROM MED_DOC_DM.RPT_LAB_RSLT_PRE1 rslt JOIN ( SELECT op.ORD_PROC_ID, op.ORD_PROC_DESCR FROM MED_DOC_DM.RPT_LAB_RSLT_PRE2 op LIMIT 500 ) op2 ON rslt.ORD_PROC_ID = op2.ORD_PROC_ID; The below query doesn't have the LIMIT condition: 0: jdbc:phoenix:> UPSERT INTO MED_DOC_DM.RPT_LAB_RSLT_PRE3 . . . . . . . . . . . . . . . . . . . . . . .> SELECT /*+ USE_SORT_MERGE_JOIN */ . . . . . . . . . . . . . . . . . . . . . . .> rslt.ORD_PROC_ID, . . . . . . . . . . . . . . . . . . . . . . .> rslt.ORD_DT_REAL_NBR, . . . . . . . . . . . . . . . . . . . . . . .> rslt.ORD_RSLT_LN_NBR, . . . . . . . . . . . . . . . . . . . . . . .> rslt.PTNT_ENCNTR_CSN_ID, . . . . . . . . . . . . . . . . . . . . . . .> rslt.PTNT_ID, . . . . . . . . . . . . . . . . . . . . . . .> op.ORD_PROC_DESCR, . . . . . . . . . . . . . . . . . . . . . . .> rslt.SRVC_AREA_ID, . . . . . . . . . . . . . . . . . . . . . . .> rslt.CMPNT_ID, . . . . . . . . . . . . . . . . . . . . . . .> NULL AS COMPONENT_NM, . . . . . . . . . . . . . . . . . . . . . . .> rslt.RSLT_TSP, . . . . . . . . . . . . . . . . . . . . . . .> rslt.ORD_VAL_MEAS_TXT, . . . . . . . . . . . . . . . . . . . . . . .> rslt.REF_UNIT_TXT, . . . . . . . . . . . . . . . . . . . . . . .> NULL AS CBC_IND, . . . . . . . . . . . . . . . . . . . . . . .> NULL AS HGB_IND, . . . . . . . . . . . . . . . . . . . . . . .> NULL AS PLT_IND, . . . . . . . . . . . . . . . . . . . . . . .> NULL AS WBC_IND, . . . . . . . . . . . . . . . . . . . . . . .> TO_CHAR(CURRENT_TIME()) . . . . . . . . . . . . . . . . . . . . . . .> FROM MED_DOC_DM.RPT_LAB_RSLT_PRE1 rslt . . . . . . . . . . . . . . . . . . . . . . .> JOIN MED_DOC_DM.RPT_LAB_RSLT_PRE2 op ON rslt.ORD_PROC_ID = op.ORD_PROC_ID; Error: Encountered exception in sub plan [0] execution. (state=,code=0) java.sql.SQLException: Encountered exception in sub plan [0] execution. at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:164) at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:119) at org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:670) at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:304) at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:296) at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:294) at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1254) at sqlline.Commands.execute(Commands.java:822) at sqlline.Commands.sql(Commands.java:732) at sqlline.SqlLine.dispatch(SqlLine.java:808) at sqlline.SqlLine.begin(SqlLine.java:681) at sqlline.SqlLine.start(SqlLine.java:398) at sqlline.SqlLine.main(SqlLine.java:292) Caused by: org.apache.phoenix.join.MaxServerCacheSizeExceededException: Size of hash cache (104857641 bytes) exceeds the maximum allowed size (104857600 bytes) at org.apache.phoenix.join.HashCacheClient.serialize(HashCacheClient.java:109) at org.apache.phoenix.join.HashCacheClient.addHashCache(HashCacheClient.java:82) at org.apache.phoenix.execute.HashJoinPlan$HashSubPlan.execute(HashJoinPlan.java:346) at org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:143) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)
Created 04-07-2016 09:04 PM
I assume if you run explain that he shows both times that he ignores the hint. I found this link where SquirrelJDBC was removing hints but it looks like you use sqlline and the link said it works there and the usage looks identical to what you do. ( Could you try it once without the upsert to see if that helps? )
The second possibility would obviously be to increase the hash cache? 100MB is not that much in todays day and age and 3m rows is not the world.
phoenix.query.maxServerCacheBytes
I hope he is smart enough to only build a cache from the two columns of the right side he actually needs. But I assume since its a description column, that is actually bigger. 200 bytes * 3m rows would be 600m of data.
Created 04-07-2016 07:15 PM
Here is some additional information about the issue Maharaj and I (Tom Forsyth) are sharing.
The table structures involved in the this failing query are provided below. Here are the counts of records in each table.
RPT_LAB_RSLT_PRE1 = 19,729,760
RPT_LAB_RSLT_PRE2 = 2,393,244
RPT_LAB_RSLT_PRE3 = 0 (count prior to upsert attempt)
-- This table is the receiver of the upsert record set.
CREATE TABLE MED_DOC_DM.RPT_LAB_RSLT_PRE3
(
ORD_PROC_ID VARCHAR,
ORD_DT_REAL_NBR VARCHAR,
ORD_RSLT_LN_NBR VARCHAR,
PTNT_ENCNTR_CSN_ID VARCHAR,
PTNT_ID VARCHAR,
ORD_PROC_DESCR VARCHAR,
SRVC_AREA_ID VARCHAR,
CMPNT_ID VARCHAR,
CMPNT_NM VARCHAR,
RSLT_TSP VARCHAR,
ORD_VAL_MEAS_TXT VARCHAR,
REF_UNIT_TXT VARCHAR,
CBC_IND VARCHAR,
HGB_IND VARCHAR,
PLT_IND VARCHAR,
WBC_IND VARCHAR,
ROW_INSERT_TSP VARCHAR
CONSTRAINT PK_RPT_LAB_RSLT_PRE3 PRIMARY KEY( ORD_PROC_ID, ORD_DT_REAL_NBR, ORD_RSLT_LN_NBR ) )
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS = 12;
-- This is the first source table being queried, providing the bulk of the data being populated to the receiver table.
CREATE TABLE MED_DOC_DM.RPT_LAB_RSLT_PRE1
(
ORD_PROC_ID VARCHAR,
ORD_DT_REAL_NBR VARCHAR,
ORD_RSLT_LN_NBR VARCHAR,
PTNT_ENCNTR_CSN_ID VARCHAR,
PTNT_ID VARCHAR,
SRVC_AREA_ID VARCHAR,
CMPNT_ID VARCHAR,
RSLT_TSP VARCHAR,
ORD_VAL_MEAS_TXT VARCHAR,
REF_UNIT_TXT VARCHAR,
ROW_INSERT_TSP VARCHAR
CONSTRAINT PK_RPT_LAB_RSLT_PRE1 PRIMARY KEY( ORD_PROC_ID, ORD_DT_REAL_NBR, ORD_RSLT_LN_NBR ) )
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS = 12;
-- This table is being joined in as the 2nd of the source tables to effectively attach the procedure description to each
-- record from the 1st source table.
CREATE TABLE MED_DOC_DM.RPT_LAB_RSLT_PRE2
(
ORD_PROC_ID VARCHAR,
ORD_PROC_DESCR VARCHAR,
ROW_INSERT_TSP VARCHAR
CONSTRAINT PK_RPT_LAB_RSLT_PRE2 PRIMARY KEY( ORD_PROC_ID ) )
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS = 12;
Created 04-07-2016 09:04 PM
I assume if you run explain that he shows both times that he ignores the hint. I found this link where SquirrelJDBC was removing hints but it looks like you use sqlline and the link said it works there and the usage looks identical to what you do. ( Could you try it once without the upsert to see if that helps? )
The second possibility would obviously be to increase the hash cache? 100MB is not that much in todays day and age and 3m rows is not the world.
phoenix.query.maxServerCacheBytes
I hope he is smart enough to only build a cache from the two columns of the right side he actually needs. But I assume since its a description column, that is actually bigger. 200 bytes * 3m rows would be 600m of data.
Created 04-08-2016 12:29 PM
You need to set the hint after the first statement. I.e. the upsert statement not the select one so it works. Just tried it
This works and results in a sort merge join:
explain upsert /*+ USE_SORT_MERGE_JOIN */ into productsales select productsales.product, productsales.date, productsales.amount from sales,productsales where sales.product = productsales.product;
This doesn't:
explain upsert into productsales select /*+ USE_SORT_MERGE_JOIN */ productsales.product, productsales.date, productsales.amount from sales,productsales where sales.product = productsales.product;
Created 04-11-2016 04:25 PM
Thank you Ben
Created 04-08-2016 04:40 PM
Thank you so much, Leonhardi! You rock!
I moved the hint to follow the word "UPSERT" as you recommended, and I can now see the SORT-MERGE reflected in the EXPLAIN PLAN, and the query runs successfully and quickly. Thank you for taking the time to reply and share your wisdom with us!
(I would give you points, but I apparently have no reputation yet and none to give. I will try to remember to come back in and take care of that once I am somebody on this community forum. :>)
Created 04-08-2016 06:36 PM
haha no worries. Thanks for the flowers. Yeah its a bit weird apparently Phoenix takes hints only globally for the whole query ( not for a sub select ) and looks for them after the first main action word.
Created 04-10-2016 07:29 AM
Hi @Thomas Forsyth, welcome to HCC! You can give Ben points by accepting his answer. There is an "Accept" link below each answer. But in this case only @Maharaj Muthusamy who asked the question can do it.
Created 04-11-2016 01:41 PM
Okay. Thank you, Predrag. I will work that out with Maharaj.
Created 04-11-2016 03:55 PM
Thank you very much, Benjamin