
Impala ODBC performance


Hi,

We are trying to download a large volume of data from a CDH cluster to a Windows server via the Windows ODBC Driver for Impala, version 2.5.22. The ODBC driver works well, but the row transfer rate is really bad: roughly 3M rows/minute. We checked the possible bottlenecks for this kind of download, but neither the cluster nor the receiving Windows server was under load: CPU is around 5%, the network cards run at 10 Gbit, there is plenty of RAM, and the target disk the data is written to is a RAID-0 SSD with 1 GB/s max throughput. So we don't know which component in the transfer path is slowing down the records.

 

We tried running the download in multiple parallel threads, which helped a little (about a 50% performance increase), but the overall performance is still low.

We also tried tweaking the transfer batch size in the ODBC driver, but it does not seem to affect performance at all.
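To separate the driver's raw fetch speed from the linked-server overhead, the fetch rate could be measured directly at different batch sizes. The following is only a minimal sketch, not what we actually ran: it assumes a Python DB-API 2.0 style cursor (one that exposes `fetchmany`), and the names `measure_fetch_rate` and `FakeCursor` are made up for illustration. The fake cursor is a stand-in so the sketch runs without a cluster; a real test would pass a cursor obtained from an ODBC connection after executing the query.

```python
import time


def measure_fetch_rate(cursor, batch_size, clock=time.perf_counter):
    """Drain an already-executed DB-API cursor in batches.

    Returns (total_rows, rows_per_second). `cursor` only needs a
    DB-API 2.0 style fetchmany(size) method.
    """
    start = clock()
    rows = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty batch means the result set is exhausted
            break
        rows += len(batch)
    elapsed = clock() - start
    rate = rows / elapsed if elapsed > 0 else float("inf")
    return rows, rate


class FakeCursor:
    """Hypothetical stand-in for a real cursor (assumption, for illustration)."""

    def __init__(self, total_rows):
        self._remaining = total_rows

    def fetchmany(self, size):
        n = min(size, self._remaining)
        self._remaining -= n
        return [(i,) for i in range(n)]


if __name__ == "__main__":
    # Compare a few batch sizes against the same (fake) result set size.
    for batch_size in (1_000, 10_000, 100_000):
        total, rate = measure_fetch_rate(FakeCursor(1_000_000), batch_size)
        print(f"batch={batch_size}: {total} rows, {rate:,.0f} rows/s")
```

If the measured rate with a real cursor stays flat across batch sizes, the limit is probably somewhere other than the fetch block size, for example in per-row conversion inside the driver.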

 

The setup is CDH 5.3 and Microsoft SQL Server 2014; Impala is connected as a linked server in SQL Server.

 

Any ideas how to increase the transfer speed?

Thanks

 

Tomas

 

 
