
Tune Hive with Tez


I was reading about Hive performance tuning with Tez. The Hortonworks docs contain the passage below, and I have some questions about it.

Using map joins is very efficient because one table (usually a dimension table) is held in memory as a hash map on every node and the larger fact table is streamed. This minimizes data movement, resulting in very fast joins.

  • What does it mean that the larger fact table is "streamed", and how does the join work in this case?
  • What is a map join?
  • How does it minimize data movement?
  • What is the difference between map joins and shuffle joins?
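
To make the questions concrete, here is how I currently picture the two strategies, written as a small Python sketch rather than anything Hive actually runs. The function names (`map_join`, `shuffle_join`), the `store_id` key, and the sample rows are all made up for illustration, and the "partitions" only stand in for reducers on different nodes. Please correct me if this mental model is wrong.

```python
from collections import defaultdict


def map_join(fact_rows, dim_rows, key):
    """Illustrative map-side join: hash the small table, stream the large one."""
    # Build phase: the small (dimension) table is loaded fully into memory
    # as a hash map keyed on the join column.
    dim_by_key = defaultdict(list)
    for d in dim_rows:
        dim_by_key[d[key]].append(d)

    # Probe phase: fact rows are read one at a time ("streamed") and looked
    # up in the hash map, so the large table is never held in memory or
    # moved between nodes.
    for f in fact_rows:
        for d in dim_by_key.get(f[key], []):
            yield {**d, **f}


def shuffle_join(fact_rows, dim_rows, key, num_partitions=2):
    """Illustrative shuffle join: both tables are repartitioned by join key."""
    # Shuffle phase: every row of BOTH tables is sent to the partition that
    # owns its key. This is the data movement a map join avoids.
    partitions = [([], []) for _ in range(num_partitions)]
    for f in fact_rows:
        partitions[hash(f[key]) % num_partitions][0].append(f)
    for d in dim_rows:
        partitions[hash(d[key]) % num_partitions][1].append(d)

    # Join phase: each partition joins only the rows it received.
    for facts, dims in partitions:
        yield from map_join(facts, dims, key)


# Hypothetical sample data: a tiny dimension table and a "large" fact table.
stores = [
    {"store_id": 1, "city": "Pune"},
    {"store_id": 2, "city": "Delhi"},
]
sales = [
    {"sale_id": 10, "store_id": 1, "amount": 5},
    {"sale_id": 11, "store_id": 2, "amount": 7},
    {"sale_id": 12, "store_id": 1, "amount": 3},
    {"sale_id": 13, "store_id": 9, "amount": 1},  # no matching store
]

# The fact table is passed as an iterator to mimic streaming.
map_result = list(map_join(iter(sales), stores, "store_id"))
shuffle_result = list(shuffle_join(sales, stores, "store_id"))
```

In this sketch both functions return the same joined rows (possibly in a different order); the difference is only in which rows have to be moved before the join can happen.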
