10-04-2017 03:22 AM
I have a env called ENV-a with 50 GB RAM allocated to impala and i have another env called ENV-b with 92 GB RAM allocated to impala.both have 8 core cpus. impala is installed on 3 nodes on each env.
i am running same query on both ,ENV-a is taking 2 mins to run and consuming 3 GB on each nodes, whereas ENV-b is taking 22 mins to run and taking 70GB memory on 2 machines and 2 GB on 3rd Machine.
after seeing the profile i figured out that ENV-b is using most of the times either 1 nodes or 2 nodes but ENV-a is using all the three nodes. so i guess that may be the problem.
I have done all compute stats and invalidation of the tables but no impact.
what could be the solution ?
10-04-2017 04:10 AM
10-04-2017 05:29 AM
48:HASH JOIN 2 31.973ms 38.336ms 83.47K 66.69K 24.24 MB 3.51 MB INNER JOIN, BROADCAST
This is from Env-a, why there is so much diffrence,see the memory consumption,records and time.
10-04-2017 06:50 AM
Can there be diffrence in planning of query if impala version are diffrent ? becuase in Env-a i am using impala version 2.1.1 which is faster and in Env-b i am using 2.7.0 which is taking 25 mins to complete and taking 70 GB memory on each machines where as estimates says 500 mb memory required.
10-04-2017 02:46 PM
It seems possible that the query plan was different in both versions, but hard to know without seeing the plans.
It's hard to speculate what the difference could be since there's a big different between those versions. We've seen cases in the past where a new join inversion rule in the planner (https://issues.apache.org/jira/browse/IMPALA-3828) caused some queries to run slower - e.g. https://issues.apache.org/jira/browse/IMPALA-4040 https://issues.apache.org/jira/browse/IMPALA-5612 https://issues.apache.org/jira/browse/IMPALA-4174 . It did improve a lot of queries but there were cases where the previous heuristic worked better for specific queries.