Impala ODBC/JDBC poor performance - fetching rows is very slow from a remote server compared with the NameNode (NN)
Labels: Apache Impala, Cloudera Manager
Created on ‎10-21-2017 10:27 AM - edited ‎09-16-2022 05:25 AM
Hi,
On the NameNode (NN), when I run the query through an ODBC script (PHP, Perl, or Python), I can fetch all results (9.2M) into a variable in about 30 seconds. When I ran the same script and query on two other remote servers, execution took 28 minutes on the first server and 17 minutes on the second.
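For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of timing a full fetch against any Python DB-API cursor. The helper name `time_fetch` and the table are hypothetical, and sqlite3 is used only as a stand-in so the sketch runs anywhere. With a real Impala ODBC DSN you would pass a cursor from a DB-API driver such as pyodbc instead (an assumption, since the thread does not say which Python library was used).

```python
import sqlite3
import time


def time_fetch(cursor, query, chunk_rows=None):
    """Run `query` and return (row_count, seconds spent executing and fetching).

    With chunk_rows=None the whole result is pulled with fetchall();
    otherwise rows are pulled in chunks with fetchmany(chunk_rows).
    """
    start = time.perf_counter()
    cursor.execute(query)
    if chunk_rows is None:
        count = len(cursor.fetchall())
    else:
        count = 0
        while True:
            rows = cursor.fetchmany(chunk_rows)
            if not rows:
                break
            count += len(rows)
    return count, time.perf_counter() - start


# Stand-in data source (hypothetical table `t`), not an Impala connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(50_000)))
cur = conn.cursor()

rows_all, secs_all = time_fetch(cur, "SELECT id FROM t")
rows_chunked, secs_chunked = time_fetch(cur, "SELECT id FROM t", chunk_rows=1024)
print(f"fetchall: {rows_all} rows in {secs_all:.3f}s")
print(f"fetchmany(1024): {rows_chunked} rows in {secs_chunked:.3f}s")
```

Running the same helper on the NN and on a remote server, with the same query, gives directly comparable numbers.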
To rule out network speed as the cause, I wrote the result to a file and copied it with scp to the first remote server; the copy finished in ~40 seconds.
What I observe in the query info in CM is a big difference in the *Threads: Network Send Wait Time* values across the 3 queries:
in NN query: 9.40s
in 1st remote server: 16.7m
in 2nd remote server: 26.8m
I also tried a Java program with Impala JDBC, but the results on the NN were already not encouraging enough to continue.
So far I can't find where the problem is or how to resolve it.
NB: I'm working on CDH 5.12.0/Impala 2.9.0, and I installed Impala ODBC 2.5.37.1014.
I hope you can respond soon, because this issue is a real obstacle to using a cluster that took us several months to build.
Thanks in advance.
Created ‎04-18-2018 02:27 AM
Hi all,
Finally, after almost 6 months, I have found the solution!
It was about my 1024 limit remark all along: the row batch limit came from the BATCH_SIZE maximum value (1024). In the latest versions (CDH 5.14/Impala 2.11) the effective range is now 1-65536.
1-1024: https://www.cloudera.com/documentation/enterprise/5-12-x/topics/impala_batch_size.html
1-65536: https://www.cloudera.com/documentation/enterprise/5-14-x/topics/impala_batch_size.html
So when I increase it through odbc.ini with SSP_BATCH_SIZE, I can benefit from increasing the other ODBC parameters (RowsFetchedPerBlock / TSaslTransportBufSize), and the rows can be fetched in seconds (~45 secs) instead of tens of minutes.
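As a rough illustration of what such an odbc.ini entry could look like, here is a small sketch that generates one. Only the three key names SSP_BATCH_SIZE, RowsFetchedPerBlock, and TSaslTransportBufSize come from this thread. The DSN name, host, port, driver path, and all values are assumptions for illustration, and a batch size above 1024 only takes effect on CDH 5.14/Impala 2.11 or later.

```python
import configparser
import io

# Hypothetical DSN settings. Only the last three key names come from the
# thread; the driver path, host, port, and values are illustrative assumptions.
settings = {
    "Driver": "/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so",
    "Host": "impalad.example.com",
    "Port": "21050",
    "SSP_BATCH_SIZE": "65536",         # server-side BATCH_SIZE query option
    "RowsFetchedPerBlock": "65536",    # rows requested per fetch call
    "TSaslTransportBufSize": "65536",  # illustrative value only
}

parser = configparser.ConfigParser()
parser.optionxform = str  # keep key case; configparser lowercases keys by default
parser["ImpalaRemote"] = settings  # "ImpalaRemote" is a hypothetical DSN name

buffer = io.StringIO()
parser.write(buffer, space_around_delimiters=False)
odbc_ini_text = buffer.getvalue()
print(odbc_ini_text)
```

The printed section can be pasted into odbc.ini; check the driver's own documentation for the value ranges it accepts.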
Remark: I recreated the cluster with 3 different server providers and tested the connections from about 5 others with different ODBC/JDBC releases, etc., and I always had the same slowness until this update came.
I cannot understand why I'm the only one who reported this big issue and why no one could answer me, knowing that it's really depressing to have a good query engine but veeeery slow row fetching!
Anyway, thank you all for your replies.
Created ‎10-25-2017 03:55 AM
This is to confirm whether there is a speed difference between running within the cluster and running from outside the cluster and, if so, how large it is.
Created ‎10-25-2017 05:06 AM
Thanks for your reply @EricL
I wrote the same query result to a file (about 500 MB), then copied it with scp to the first remote server, and it finished in ~40 seconds.
To rule out network speed as the cause of this issue, I copied the same results (9.2M) into a Postgres table and tested fetchAll with the Postgres driver via unixODBC; it takes about 50 secs.
Created ‎10-25-2017 06:41 PM
See Doc below:
http://www.cloudera.com/documentation/other/connectors/impala-jdbc/latest/Cloudera-JDBC-Driver-for-I...
and search for RowsFetchedPerBlock.
Created ‎10-26-2017 05:06 AM
Hi @EricL
I already increased RowsFetchedPerBlock and also TSaslTransportBufSize, with no improvement :'(
Additionally, the ODBC test with the PostgreSQL driver was run with the same values for these properties and gave good results (50 secs).
Created on ‎10-31-2017 03:22 AM - edited ‎10-31-2017 03:42 AM
Remark: I observe in the ClouderaODBCDriverforImpala_connection_0.log file that PrepareTResultForDataRetrieval repeats ~150 times in both cases, even though RowsFetchedPerBlock is 10000 (it looks like it is fixed at 1024)!
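To see why a cap of 1024 rows per batch matters over a remote link, here is a back-of-the-envelope sketch. It assumes that "9.2M" in the original question means 9.2 million rows and that each fetch call returns at most one batch; both are assumptions, and the function name is hypothetical.

```python
import math


def fetch_round_trips(total_rows, rows_per_batch):
    """Number of fetch calls needed if each call returns at most one batch."""
    return math.ceil(total_rows / rows_per_batch)


TOTAL_ROWS = 9_200_000  # assumed reading of "9.2M" from the original question

trips_by_cap = {cap: fetch_round_trips(TOTAL_ROWS, cap) for cap in (1024, 10_000, 65_536)}
for cap, trips in trips_by_cap.items():
    print(f"{cap:>6} rows per batch -> {trips} fetch calls")
```

Under these assumptions, a 1024-row cap means 8985 fetch calls, against 920 if the requested 10000 were honored and 141 at 65536. Any fixed per-call latency between client and server is multiplied by that count, which would fit a fetch that is fast on the NN but slow from a remote server.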
Created ‎10-31-2017 08:24 PM
Created ‎10-31-2017 09:29 PM
If the PROFILE and log files are too big, are you able to share them through Dropbox or another online tool? They should help us understand more about the issue.
Created ‎11-01-2017 10:54 AM
Hi @EricL,
Here are the query profile and the ODBC log files:
NN - ODBC logs - query 200k - 10s - 1.73s without log
Remote Server - ODBC logs - query 200k - 48s - 41s without log
Thanks in advance.
