Hi All,
I am running on a POC environment where there are only one name node and one data node running. Impala daemon is running on data node. Both of the nodes have 128GB memory each. I had set the mem_limit to 60GB.
I had two big tables in Impala. First table (sales) has around 635 million records while second table (product) is around 250000 records. I inner join this 2 tables using a common parameter. The example SQL statement is as the following:
select a.t_date, b.p_date from sales a inner join product b on a.product_id=b.product_id order by a.t_date desc
The above query ran for a long time and after few hours, it still didn't return any result. When i use EXPLAIN, it showed Estimated Per-Host Requirements: Memory=992.03MB VCores=2. I am wondering why it took so long. Is this related to mem_limit settings? How can i tune such query?