Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Join performance

Highlighted

Join performance

New Contributor

There are 3 tables.

client table only having 300 records. I need this table to get client name. person_detail table is having 180 million records and partition on client_id field. person_detail table is parquet. emp_cat table has 1000 records. Need to put distinct as this table has dups.

Insert overwrite directory ‘/apps/dev/report_output.txt’ select distinct report_query1.Client, report_query1.Name, report_query1.Age, report_query1.Address, Query1.category from (select distinct B.clientid Client, A.client_name Name, B.age, B.address, B.cat_id from client A JOIN (select distinct client_id client_id, age age, address address, cat_id cat_id from person_detail where client_id = ‘5454’ and entry_date > ‘2015-01-01’ and entry_date < ‘2015-03-31’) B on (A.clientid = B.clientid) ) report_query1 LEFT OUTER JOIN (select distinct client_id, cat_id, category from emp_cat) Query1 on report_query1.cat_id = Query1.cat_id and report_query1.client_id = Query1.client_id

Question: I am not able to read explain plan for this table. Can someone help. This query output is coming in 30 mins. How can i improve performance. Can I make all 3 tables parquet. I have to store data in some file. Is it right way or do we have some better way. I have fetched data for person_detail table in subquery as this table is partitioned on client id and it will reduce the data for join. Is it right? What else I can do to improve performance. I have kept big table in right side for mapside join.

3 REPLIES 3

Re: Join performance

@sandeep agarwal

This is good starting point on hive tuning http://hortonworks.com/blog/5-ways-make-hive-queri...

Re: Join performance

New Contributor

I am aware of this already. Need to learn performance improvement in hive in depth.

Re: Join performance

Mentor

@sandeep agarwal are you still having issues with this? Can you accept best answer or provide your own solution?