I have a views table joining with a temp table with the below parameters intentionally enabled.
hive.auto.convert.join=true;
hive.execution.engine=tez;
The Code Snippet is,
CREATE TABLE STG_CONVERSION AS
SELECT CONV.CONVERSION_ID,
CONV.USER_ID,
TP.TIME,
CONV.TIME AS ACTIVITY_TIME,
TP.MULTI_DIM_ID,
CONV.CONV_TYPE_ID,
TP.SV1
FROM VIEWS TP
JOIN SCU_TMP CONV ON TP.USER_ID = CONV.USER_ID
WHERE TP.TIME <= CONV.TIME;
In the normal scenario, both the tables can have any number of records.
However,in the SCU_TMP table, only 10-50 records are expected with the same User Id.
But in some cases, couple of User IDs come with around 10k-20k records in SCU Temp table, which creates a cross product effect.
In such cases, it'll run for ever with just 1 mapper to complete.
Is there any way to optimise this and run this gracefully?