Phoenix - Query - Error Size of hash cache (104857641 bytes) exceeds the maximum allowed size (104857600 bytes)

Explorer

I am using a query like the following to try to populate the RPT_LAB_RSLT_PRE3 table. Note the LIMIT clause I have included in the inner query. Without it, I get the familiar "Size of hash cache…exceeds the maximum" error that the USE_SORT_MERGE_JOIN hint is supposed to fix (note also that I am using that hint). However, I need to join to all 2,393,244 records in the RPT_LAB_RSLT_PRE2 table to get my full data set. The LIMIT clause simply demonstrates that the query itself is sound: limited to 500 records, the statement completes successfully and quickly; without the limit, it fails as noted above.

Query with LIMIT clause (this query works):

UPSERT INTO MED_DOC_DM.RPT_LAB_RSLT_PRE3
SELECT /*+ USE_SORT_MERGE_JOIN */
       rslt.ORD_PROC_ID,
       rslt.ORD_DT_REAL_NBR,
       rslt.ORD_RSLT_LN_NBR,
       rslt.PTNT_ENCNTR_CSN_ID,
       rslt.PTNT_ID,
       op2.ORD_PROC_DESCR,
       rslt.SRVC_AREA_ID,
       rslt.CMPNT_ID,
       NULL AS COMPONENT_NM,
       rslt.RSLT_TSP,
       rslt.ORD_VAL_MEAS_TXT,
       rslt.REF_UNIT_TXT,
       NULL AS CBC_IND,
       NULL AS HGB_IND,
       NULL AS PLT_IND,
       NULL AS WBC_IND,
       TO_CHAR(CURRENT_TIME())
FROM MED_DOC_DM.RPT_LAB_RSLT_PRE1 rslt
JOIN ( SELECT op.ORD_PROC_ID, op.ORD_PROC_DESCR
       FROM MED_DOC_DM.RPT_LAB_RSLT_PRE2 op
       LIMIT 500 ) op2
  ON rslt.ORD_PROC_ID = op2.ORD_PROC_ID;
The same query without the LIMIT clause (run in sqlline) fails:

UPSERT INTO MED_DOC_DM.RPT_LAB_RSLT_PRE3
SELECT /*+ USE_SORT_MERGE_JOIN */
       rslt.ORD_PROC_ID,
       rslt.ORD_DT_REAL_NBR,
       rslt.ORD_RSLT_LN_NBR,
       rslt.PTNT_ENCNTR_CSN_ID,
       rslt.PTNT_ID,
       op.ORD_PROC_DESCR,
       rslt.SRVC_AREA_ID,
       rslt.CMPNT_ID,
       NULL AS COMPONENT_NM,
       rslt.RSLT_TSP,
       rslt.ORD_VAL_MEAS_TXT,
       rslt.REF_UNIT_TXT,
       NULL AS CBC_IND,
       NULL AS HGB_IND,
       NULL AS PLT_IND,
       NULL AS WBC_IND,
       TO_CHAR(CURRENT_TIME())
FROM MED_DOC_DM.RPT_LAB_RSLT_PRE1 rslt
JOIN MED_DOC_DM.RPT_LAB_RSLT_PRE2 op
  ON rslt.ORD_PROC_ID = op.ORD_PROC_ID;
Error: Encountered exception in sub plan [0] execution. (state=,code=0)
java.sql.SQLException: Encountered exception in sub plan [0] execution.
    at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:164)
    at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:119)
    at org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:670)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:304)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:296)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:294)
    at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1254)
    at sqlline.Commands.execute(Commands.java:822)
    at sqlline.Commands.sql(Commands.java:732)
    at sqlline.SqlLine.dispatch(SqlLine.java:808)
    at sqlline.SqlLine.begin(SqlLine.java:681)
    at sqlline.SqlLine.start(SqlLine.java:398)
    at sqlline.SqlLine.main(SqlLine.java:292)
Caused by: org.apache.phoenix.join.MaxServerCacheSizeExceededException: Size of hash cache (104857641 bytes) exceeds the maximum allowed size (104857600 bytes)
    at org.apache.phoenix.join.HashCacheClient.serialize(HashCacheClient.java:109)
    at org.apache.phoenix.join.HashCacheClient.addHashCache(HashCacheClient.java:82)
    at org.apache.phoenix.execute.HashJoinPlan$HashSubPlan.execute(HashJoinPlan.java:346)
    at org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:143)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
1 ACCEPTED SOLUTION

Master Guru

I assume that if you run EXPLAIN, it shows in both cases that the hint is being ignored. I found a link where the Squirrel JDBC client was stripping out hints, but it looks like you are using sqlline, and that thread says hints work there; the usage also looks identical to yours. (Could you try it once without the UPSERT to see if that helps?)

https://mail-archives.apache.org/mod_mbox/phoenix-user/201503.mbox/%3cfc15b78a902a1ad5c0d66efc4a5b23...
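To check whether the hint is honored, you can prefix the statement with EXPLAIN and look at the join strategy in the plan. Below is a minimal sketch against the tables in this thread (the column list is trimmed to the primary key plus the description for brevity); SORT-MERGE-JOIN appearing in the output means the hint took effect, while a hash-join/cache-build step means it was ignored:

-- Sketch: inspect the plan to see which join strategy Phoenix chose
EXPLAIN
UPSERT INTO MED_DOC_DM.RPT_LAB_RSLT_PRE3
    ( ORD_PROC_ID, ORD_DT_REAL_NBR, ORD_RSLT_LN_NBR, ORD_PROC_DESCR )
SELECT /*+ USE_SORT_MERGE_JOIN */
       rslt.ORD_PROC_ID, rslt.ORD_DT_REAL_NBR, rslt.ORD_RSLT_LN_NBR,
       op.ORD_PROC_DESCR
FROM MED_DOC_DM.RPT_LAB_RSLT_PRE1 rslt
JOIN MED_DOC_DM.RPT_LAB_RSLT_PRE2 op
  ON rslt.ORD_PROC_ID = op.ORD_PROC_ID;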

The second possibility would obviously be to increase the hash cache size. 100 MB is not that much these days, and 3 million rows is not the end of the world. The relevant parameter is:

phoenix.query.maxServerCacheBytes

I would hope Phoenix is smart enough to build the cache only from the two columns of the right side that it actually needs. But since one of them is a description column, the rows are probably bigger than usual: 200 bytes x 3 million rows would be about 600 MB of data.
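If raising the limit is the route you take, phoenix.query.maxServerCacheBytes is read on the client side when the hash cache is built, so it goes into hbase-site.xml on the machine issuing the query. A minimal sketch, assuming a 1 GB cap fits your heap (the value here is only an example):

<!-- hbase-site.xml on the Phoenix client; the 1 GB value is an example -->
<property>
    <name>phoenix.query.maxServerCacheBytes</name>
    <!-- default is 104857600 (100 MB); raise with care, the cache lives in memory -->
    <value>1073741824</value>
</property>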


11 REPLIES


Here is some additional information about the issue Maharaj and I (Tom Forsyth) are sharing.

The table structures involved in this failing query are provided below. Here are the record counts for each table.

RPT_LAB_RSLT_PRE1 = 19,729,760

RPT_LAB_RSLT_PRE2 = 2,393,244

RPT_LAB_RSLT_PRE3 = 0 (count prior to upsert attempt)

-- This table is the receiver of the upsert record set.
CREATE TABLE MED_DOC_DM.RPT_LAB_RSLT_PRE3
(
    ORD_PROC_ID VARCHAR,
    ORD_DT_REAL_NBR VARCHAR,
    ORD_RSLT_LN_NBR VARCHAR,
    PTNT_ENCNTR_CSN_ID VARCHAR,
    PTNT_ID VARCHAR,
    ORD_PROC_DESCR VARCHAR,
    SRVC_AREA_ID VARCHAR,
    CMPNT_ID VARCHAR,
    CMPNT_NM VARCHAR,
    RSLT_TSP VARCHAR,
    ORD_VAL_MEAS_TXT VARCHAR,
    REF_UNIT_TXT VARCHAR,
    CBC_IND VARCHAR,
    HGB_IND VARCHAR,
    PLT_IND VARCHAR,
    WBC_IND VARCHAR,
    ROW_INSERT_TSP VARCHAR,
    CONSTRAINT PK_RPT_LAB_RSLT_PRE3 PRIMARY KEY( ORD_PROC_ID, ORD_DT_REAL_NBR, ORD_RSLT_LN_NBR )
)
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS = 12;

-- This is the first source table being queried, providing the bulk of the data
-- being populated into the receiver table.
CREATE TABLE MED_DOC_DM.RPT_LAB_RSLT_PRE1
(
    ORD_PROC_ID VARCHAR,
    ORD_DT_REAL_NBR VARCHAR,
    ORD_RSLT_LN_NBR VARCHAR,
    PTNT_ENCNTR_CSN_ID VARCHAR,
    PTNT_ID VARCHAR,
    SRVC_AREA_ID VARCHAR,
    CMPNT_ID VARCHAR,
    RSLT_TSP VARCHAR,
    ORD_VAL_MEAS_TXT VARCHAR,
    REF_UNIT_TXT VARCHAR,
    ROW_INSERT_TSP VARCHAR,
    CONSTRAINT PK_RPT_LAB_RSLT_PRE1 PRIMARY KEY( ORD_PROC_ID, ORD_DT_REAL_NBR, ORD_RSLT_LN_NBR )
)
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS = 12;

-- This table is joined in as the second source table, effectively attaching the
-- procedure description to each record from the first source table.
CREATE TABLE MED_DOC_DM.RPT_LAB_RSLT_PRE2
(
    ORD_PROC_ID VARCHAR,
    ORD_PROC_DESCR VARCHAR,
    ROW_INSERT_TSP VARCHAR,
    CONSTRAINT PK_RPT_LAB_RSLT_PRE2 PRIMARY KEY( ORD_PROC_ID )
)
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS = 12;
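Given these counts, a quick way to gauge how large the hash cache for the join's right-hand side would need to be is to measure the average payload per row. This is only a rough sketch: LENGTH counts characters (close to bytes for ASCII data), and Phoenix adds per-row serialization overhead on top of the raw column values:

-- Rough sizing check: row count times average size of the join key plus the
-- projected description column approximates the hash-cache payload.
SELECT COUNT(*) AS ROW_CNT,
       AVG( LENGTH(ORD_PROC_ID) + LENGTH(ORD_PROC_DESCR) ) AS AVG_CHARS_PER_ROW
FROM MED_DOC_DM.RPT_LAB_RSLT_PRE2;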


Master Guru

@Maharaj Muthusamy

You need to put the hint right after the first keyword of the statement, i.e. after the UPSERT rather than in the SELECT, for it to work. I just tried it.

This works and results in a sort merge join:

explain upsert /*+ USE_SORT_MERGE_JOIN */ into productsales select productsales.product, productsales.date, productsales.amount  from sales,productsales where sales.product = productsales.product;

This doesn't:

explain upsert  into productsales select /*+ USE_SORT_MERGE_JOIN */ productsales.product, productsales.date, productsales.amount  from sales,productsales where sales.product = productsales.product;
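Applied to the statement from this thread, the hint would move directly after UPSERT. A sketch of the corrected statement (same tables and columns as above):

UPSERT /*+ USE_SORT_MERGE_JOIN */ INTO MED_DOC_DM.RPT_LAB_RSLT_PRE3
SELECT rslt.ORD_PROC_ID, rslt.ORD_DT_REAL_NBR, rslt.ORD_RSLT_LN_NBR,
       rslt.PTNT_ENCNTR_CSN_ID, rslt.PTNT_ID, op.ORD_PROC_DESCR,
       rslt.SRVC_AREA_ID, rslt.CMPNT_ID, NULL AS COMPONENT_NM,
       rslt.RSLT_TSP, rslt.ORD_VAL_MEAS_TXT, rslt.REF_UNIT_TXT,
       NULL AS CBC_IND, NULL AS HGB_IND, NULL AS PLT_IND, NULL AS WBC_IND,
       TO_CHAR(CURRENT_TIME())
FROM MED_DOC_DM.RPT_LAB_RSLT_PRE1 rslt
JOIN MED_DOC_DM.RPT_LAB_RSLT_PRE2 op
  ON rslt.ORD_PROC_ID = op.ORD_PROC_ID;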

Explorer

Thank you, Ben.


Thank you so much, Leonhardi! You rock!

I moved the hint to follow the word "UPSERT" as you recommended, and I can now see the SORT-MERGE reflected in the EXPLAIN PLAN, and the query runs successfully and quickly. Thank you for taking the time to reply and share your wisdom with us!

(I would give you points, but I apparently have no reputation yet and none to give. I will try to remember to come back in and take care of that once I am somebody on this community forum. :>)

Master Guru

Haha, no worries. Thanks for the flowers. Yeah, it's a bit weird: apparently Phoenix only applies hints globally, for the whole query (not per sub-select), and it looks for them after the first main action word.

Master Guru

Hi @Thomas Forsyth, welcome to HCC! You can give Ben points by accepting his answer; there is an "Accept" link below each answer. But in this case only @Maharaj Muthusamy, who asked the question, can do it.


Okay. Thank you, Predrag. I will work that out with Maharaj.

Explorer

Thank you very much, Benjamin