Support Questions
Find answers, ask questions, and share your expertise
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to cache the left most table in a left outer join in hive

How to cache the left most table in a left outer join in hive

New Contributor
I have a large table (1Tb of data) that needs to be joined with a smaller table (100k records).

In the above scenario, I am not able to control which table has to be cached into memory.I have tried using MAPJOIN , STREAMTABLE hints and also tried with parameters like conditional task size , small tbale size etc. Since the small table is on the left most side of the join, its not being cached into memory

Is there way to control the table that needs to be cached

Note: I cannot alter the table positions or the code:

SELECT FROM small_table st LEFT JOIN large_table lt        ON =
set hive.execution.engine=tez; set hive.tez.container.size=4096; set hive.merge.mapredfiles=true; set tez.shuffle-vertex-manager.min-src-fraction=0.25; set tez.shuffle-vertex-manager.max-src-fraction=0.75; set hive.exec.dynamic.partition.mode=nonstrict; set ; set -Xmx3200m -XX:+UseNUMA -XX:+UseParallelGC -XX:+UseConcMarkSweepGC ; SET; set;
Don't have an account?
Coming from Hortonworks? Activate your account here