There are 3 tables.
client table only having 300 records. I need this table to get client name. person_detail table is having 180 million records and partition on client_id field. person_detail table is parquet. emp_cat table has 1000 records. Need to put distinct as this table has dups.
Insert overwrite directory ‘/apps/dev/report_output.txt’ select distinct report_query1.Client, report_query1.Name, report_query1.Age, report_query1.Address, Query1.category from (select distinct B.clientid Client, A.client_name Name, B.age, B.address, B.cat_id from client A JOIN (select distinct client_id client_id, age age, address address, cat_id cat_id from person_detail where client_id = ‘5454’ and entry_date > ‘2015-01-01’ and entry_date < ‘2015-03-31’) B on (A.clientid = B.clientid) ) report_query1 LEFT OUTER JOIN (select distinct client_id, cat_id, category from emp_cat) Query1 on report_query1.cat_id = Query1.cat_id and report_query1.client_id = Query1.client_id
Question: I am not able to read explain plan for this table. Can someone help. This query output is coming in 30 mins. How can i improve performance. Can I make all 3 tables parquet. I have to store data in some file. Is it right way or do we have some better way. I have fetched data for person_detail table in subquery as this table is partitioned on client id and it will reduce the data for join. Is it right? What else I can do to improve performance. I have kept big table in right side for mapside join.