Support Questions
Find answers, ask questions, and share your expertise

impala not using all the nodes

I have a env called ENV-a with 50 GB RAM allocated to impala and i have another env called ENV-b with 92 GB RAM allocated to impala.both have 8 core cpus. impala is installed on 3 nodes on each env.

i am running same query on both ,ENV-a is taking 2 mins to run and consuming 3 GB on each nodes, whereas ENV-b is taking 22 mins to run and taking 70GB memory on 2 machines and 2 GB on 3rd Machine.


after seeing the profile i figured out that ENV-b is using most of the times either 1 nodes or 2 nodes but ENV-a is using all the three nodes. so i guess that may be the problem.

I have done all compute stats and invalidation of the tables but no impact.


what could be the solution ?




Please share the PROFILES from both clusters so that we can have a look to compare.


profile is too big to post here.


I see the following in the SUMMARY:

| | | 48:HASH JOIN 2 18m43s 19m12s 83.47K 5 36.27 GB 893.16 KB INNER JOIN, BROADCAST
| | | |--98:EXCHANGE 2 14s390ms 14s801ms 316.11M 4.25K 0 0 BROADCAST
| | | | 47:HASH JOIN 2 1s492ms 2s660ms 316.11M 4.25K 21.81 MB 3.52 MB INNER JOIN, BROADCAST

This tells me that the JOIN took 19 min max to complete, impala estimated 83.47K to return, but only 5 rows returned. This means either impala has wrong stats info, or impala's estimation was wrong.

I can't see how much data for those tables in the JOIN, but I can see it uses BROADCAST JOIN. I am not sure if SUFFLE JOIN can help here, maybe you can give it a try.

Details can be found here:

 48:HASH JOIN           2   31.973ms   38.336ms   83.47K      66.69K   24.24 MB        3.51 MB  INNER JOIN, BROADCAST    

 This is from Env-a, why there is so much diffrence,see the memory consumption,records and time.

Can there be diffrence in planning of query if impala version are diffrent ? becuase in Env-a i am using impala version 2.1.1 which is faster and in Env-b i am using 2.7.0 which is taking 25 mins to complete and taking 70 GB memory on each machines where as estimates says 500 mb memory required. 

It seems possible that the query plan was different in both versions, but hard to know without seeing the plans.


It's hard to speculate what the difference could be since there's a big different between those versions. We've seen cases in the past where a new join inversion rule in the planner ( caused some queries to run slower - e.g. . It did improve a lot of queries but there were cases where the previous heuristic worked better for specific queries.

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.