Created on 03-23-2015 02:16 AM - edited 09-16-2022 02:25 AM
Hi,
We are trying to download a large volume of data from a CDH cluster to a Windows server via the Windows ODBC Driver for Impala, version 2.5.22. The ODBC driver works well, but the row transfer performance is really bad: roughly 3M rows/minute. We checked the possible bottlenecks for this kind of download, but neither the cluster nor the receiving Windows server was under load at all: CPU was around 5%, the network cards run at 10 Gbit, there is plenty of RAM, and the target disk where the data is written is a RAID-0 SSD with 1 GB/s max throughput. So we don't know which component in the transfer path slows down the records.
We tried running the download in multiple parallel threads, which helped a little (about a 50% performance increase), but the overall performance is still low.
We also tried tweaking the transfer batch size in the ODBC driver, but it does not seem to affect the performance at all.
The setup is CDH 5.3 and Microsoft SQL Server 2014; Impala is accessed through a linked server in MS SQL.
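One way to narrow this down might be to time the raw fetch outside of SQL Server, to see whether the ODBC driver alone or the linked server layer is the slow part. Below is a minimal sketch of such a measurement, assuming a Python client with a DB-API cursor (for example from pyodbc). The function name `measure_fetch_rate`, the DSN name and the table name in the comments are made up for illustration and are not part of our setup.

```python
import time


def measure_fetch_rate(cursor, query, batch_size=10_000):
    """Execute `query` on a DB-API cursor and time how fast rows arrive.

    Returns (rows_fetched, elapsed_seconds, rows_per_second).
    """
    start = time.perf_counter()
    cursor.execute(query)
    rows = 0
    while True:
        # fetchmany returns an empty sequence once the result set is exhausted
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        rows += len(batch)
    elapsed = time.perf_counter() - start
    rate = rows / elapsed if elapsed > 0 else float("inf")
    return rows, elapsed, rate


# Hypothetical usage against the Impala ODBC driver (requires pyodbc and a
# configured DSN; "ImpalaDSN" and "some_table" are placeholder names):
#
#   import pyodbc
#   conn = pyodbc.connect("DSN=ImpalaDSN", autocommit=True)
#   rows, secs, rate = measure_fetch_rate(conn.cursor(), "SELECT * FROM some_table")
#   print(f"{rows} rows in {secs:.1f} s = {rate * 60:,.0f} rows/minute")
```

If the rate measured this way is close to the 3M rows/minute seen through the linked server, the limit is probably in the driver or on the Impala side; if it is much higher, the linked server path would be the more likely suspect.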
Any ideas how to increase the transfer speed?
Thanks
Tomas