Created 02-16-2017 01:16 PM
Hi,
I tried to update to CDH 5.10 yesterday, which I believe includes Impala 2.8 (although the logs/shell still report Impala 2.7 for CDH 5.10).
I was surprised by a big drop in performance for most of my queries.
For queries with no join, using "set mt_dop=10" improved performance by a lot.
But all the queries with "mt_dop=0" got much worse.
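For reference, this is roughly how I toggle the multi-threading option in impala-shell (the query and table name below are just placeholders, not my actual workload):

```sql
-- inside impala-shell
set mt_dop=10;  -- allow up to 10 plan fragment instances per node
select some_col, count(*) from some_table group by some_col;
set mt_dop=0;   -- back to the default (non-multithreaded) execution
```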
In the shell I ran an aggregation query with no join, and it "Fetched 360 row(s) in 100.28s". The summary is the following:
+---------------------+--------+----------+----------+---------+------------+-----------+---------------+---------------------------------------------------+
| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                                            |
+---------------------+--------+----------+----------+---------+------------+-----------+---------------+---------------------------------------------------+
| 10:MERGING-EXCHANGE | 1      | 150.62us | 150.62us | 360     | 500        | 0 B       | -1 B          | UNPARTITIONED                                     |
| 05:TOP-N            | 9      | 137.12us | 218.93us | 360     | 500        | 16.00 KB  | 11.72 KB      |                                                   |
| 09:AGGREGATE        | 9      | 10.72ms  | 14.33ms  | 360     | 6.15M      | 10.89 MB  | 17.20 MB      | FINALIZE                                          |
| 08:EXCHANGE         | 9      | 86.09us  | 95.65us  | 3.24K   | 6.15M      | 0 B       | 0 B           | HASH(`date (hourly)`)                             |
| 04:AGGREGATE        | 9      | 364.87ms | 384.52ms | 3.24K   | 6.15M      | 2.03 MB   | 17.20 MB      | STREAMING                                         |
| 07:AGGREGATE        | 9      | 5.27s    | 5.52s    | 38.07M  | 226.95M    | 714.15 MB | 18.60 GB      | FINALIZE                                          |
| 06:EXCHANGE         | 9      | 608.64ms | 628.24ms | 115.78M | 226.95M    | 0 B       | 0 B           | HASH(udid,`date (hourly)`)                        |
| 03:AGGREGATE        | 9      | 17.56s   | 24.24s   | 115.78M | 226.95M    | 2.63 GB   | 18.60 GB      | STREAMING                                         |
| 00:UNION            | 9      | 1.67s    | 2.47s    | 226.95M | 226.95M    | 1.93 MB   | 0 B           |                                                   |
| |--02:SCAN HDFS     | 9      | 543.42us | 602.49us | 0       | 0          | 0 B       | 0 B           | pocketgems_prod.customevent_chapterview_streaming |
| 01:SCAN HDFS        | 9      | 291.51ms | 1.09s    | 226.95M | 226.95M    | 42.01 MB  | 1.29 GB       | pocketgems_prod.customevent_chapterview_batch     |
+---------------------+--------+----------+----------+---------+------------+-----------+---------------+---------------------------------------------------+
It used to take around 35 seconds.
I also downloaded the profile : https://gist.github.com/momohuri/c03683cd4263f48c1de5afd314d2662f
The thing that surprised me in the profile is RowsReturnedRate: 3.
Any clue why this is happening? For now I went back to CDH 5.9.1.
Thanks
Created 02-16-2017 02:37 PM
Thanks for your report! Definitely want to look into this. Could you also provide a profile of that same query on 5.9.1?
Created 02-16-2017 03:41 PM
Unfortunately we have inserted a lot of data into those partitions since yesterday... And I didn't download the profile when I did my test on CDH 5.9.1, I only noted the time taken.
Created 03-01-2017 07:31 PM
Hi,
I have set up a new cluster with pretty much the same configuration as prod, and a similar number of machines.
The new cluster is running CDH 5.10.0; the old one is running CDH 5.9.1.
This example is not as bad as what I saw before, but still:
query 1:
impala 2.7: 79.49s
https://gist.github.com/momohuri/38e5cce6d4f4dc1c45ac6db18fbc1a82
impala 2.8: 129.59s
https://gist.github.com/momohuri/9544c5a97e9ec40ea1ec71caf1f5a030
query 2:
impala 2.7: 62s
https://gist.github.com/momohuri/c11f5cc7dc336af5ad1b1b605c523a1a
impala 2.8: 111s
https://gist.github.com/momohuri/81586f032e24c3c530e49da75816acd3
The main difference that I see is the number of hosts: in 2.8 it is only using 3 hosts, but there are 8 available. Is there a reason for that?
Is there something else that I am missing?
Created 03-02-2017 02:27 PM
Looks like using fewer nodes is the root cause.
Are you sure the data is spread among all datanodes?
Are you sure all impalads are up and registered with the statestore? You can look at the statestore web ui to see all the subscribers.
You can also try running "explain select count(*)" on all your tables and see how many nodes the query is estimated to run on. You'll need to "set explain_level=2" before running the explain to get the num_hosts field.
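In impala-shell that check looks roughly like this (the table name below is just a placeholder; substitute your own tables):

```sql
-- inside impala-shell
set explain_level=2;
explain select count(*) from my_table;
-- in the plan output, check the "hosts=N" annotation on the scan node:
-- it should match the number of datanodes holding that table's data
```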
Created 03-07-2017 12:27 PM
Hi,
Actually, after investigation, the problem was completely unrelated to Impala... One of the machines in our cluster had a 100 Mbps Ethernet link instead of 10 Gbps...
Thanks for your help.
Created 03-07-2017 02:14 PM
Glad you found the problem! Thanks for following up with your findings.