Impala ODBC/JDBC bad performance - rows fetch is very slow from a remote server compared with NN

Re: Impala ODBC/JDBC bad performance - rows fetch is very slow from a remote server compared with NN

Master Collaborator

In Impala 2.11 we actually capped the max batch_size setting. Before that you could set it to an arbitrarily high value, which could have strange consequences. It's still a bit of a use-at-your-own-risk setting since it can have consequences for memory consumption and performance.

 

The real fix for this would be https://issues.apache.org/jira/browse/IMPALA-1618. Setting batch_size is just a workaround that may or may not work for you.

Re: Impala ODBC/JDBC bad performance - rows fetch is very slow from a remote server compared with NN

Expert Contributor

Hi @Tim Armstrong

While IMPALA-1618 is still open and unresolved, I can confirm that this "workaround" is safe and efficient (I have been using it at large scale for more than 9 months). It is the only solution I have found to solve, or rather get around, this big problem.

I hope the underlying problem will be fixed ASAP.
Thanks for the remark.

Re: Impala ODBC/JDBC bad performance - rows fetch is very slow from a remote server compared with NN

Explorer

Can you tell me how to set BATCH_SIZE for an Impala JDBC connection? I tried, but it is not working for me.

Re: Impala ODBC/JDBC bad performance - rows fetch is very slow from a remote server compared with NN

Expert Contributor

Hi @Bishnup

Configuring Server-Side Properties
When connecting to a server that is running Impala 2.0 or later, you can use the driver to apply configuration properties to the server by setting the properties in the connection URL.
https://www.cloudera.com/documentation/other/connectors/impala-jdbc/latest/Cloudera-JDBC-Driver-for-...
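As a rough sketch of what that looks like in practice: the snippet below only assembles a connection URL string in the `jdbc:impala://host:port;key=value` form, it does not open a connection. The host name, the value 4096, and the helper `buildUrl` are placeholders made up for illustration. Using the bare option name `BATCH_SIZE` is also an assumption; a later post in this thread shows the `SSP_`-prefixed form being rejected with driver 1.0.0, so check the guide linked above for your driver version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ImpalaUrlSketch {

    // Hypothetical helper: host and port first, then each property appended
    // as ";key=value" in insertion order.
    public static String buildUrl(String host, int port, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:impala://" + host + ":" + port);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Assumption: the driver forwards this key to Impala as a query option.
        props.put("BATCH_SIZE", "4096");
        // Placeholder host; 21050 is Impala's default HiveServer2 port.
        System.out.println(buildUrl("impalad.example.com", 21050, props));
        // prints jdbc:impala://impalad.example.com:21050;BATCH_SIZE=4096
    }
}
```

The resulting string is what you would pass to `DriverManager.getConnection` once the driver jar is on the classpath.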

Good luck.

Re: Impala ODBC/JDBC bad performance - rows fetch is very slow from a remote server compared with NN

Explorer

Hi @AcharkiMed

I tried setting the batch size in the connection URL, but I didn't get any improvement in query fetch time. I have also posted my use case on the Cloudera forum; kindly answer my questions there.

Re: Impala ODBC/JDBC bad performance - rows fetch is very slow from a remote server compared with NN

Expert Contributor

Hi,

Please try changing all three of these parameters:

TSaslTransportBufSize=4000;
RowsFetchedPerBlock=60536;
SSP_BATCH_SIZE=60536;

Re: Impala ODBC/JDBC bad performance - rows fetch is very slow from a remote server compared with NN

Explorer

Hi @AcharkiMed

As you suggested, I set

TSaslTransportBufSize=4000;
RowsFetchedPerBlock=60536;
SSP_BATCH_SIZE=60536;

in the connection URL, but I am now getting these errors:

java.sql.SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:Invalid query option: SSP_BATCH_SIZE
), Query: SET SSP_BATCH_SIZE=60536.
        at com.cloudera.hivecommon.api.HS2Client.executeStatementInternal(Unknown Source) ~[Impala-JDBC-41-1.0.0.jar!/:na]

 and 

java.sql.SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:Invalid query option: TSaslTransportBufSize
), Query: SET TSaslTransportBufSize=4000.

Please help me set up these properties.

Thank you,

Bishnu
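A note on reading the two errors above: in both cases the driver sent the URL key to Impala verbatim as a `SET` statement (`SET SSP_BATCH_SIZE=60536`, `SET TSaslTransportBufSize=4000`), and Impala rejected it because it is not a query option. That suggests, though this has not been verified against driver 1.0.0, that this JDBC driver recognizes neither the `SSP_` prefix nor `TSaslTransportBufSize`. The small sketch below pulls the rejected name out of such a message so it can be logged or checked; the class and method names are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RejectedOptionSketch {

    // Matches the "Invalid query option: NAME" fragment seen in the errors above.
    private static final Pattern INVALID_OPTION =
            Pattern.compile("Invalid query option:\\s*(\\w+)");

    // Returns the option name Impala rejected, if the message contains one.
    public static Optional<String> rejectedOption(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = INVALID_OPTION.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened copy of the first error message from this thread.
        String msg = "errorMessage:Invalid query option: SSP_BATCH_SIZE\n"
                + "), Query: SET SSP_BATCH_SIZE=60536.";
        System.out.println(rejectedOption(msg).orElse("none"));
        // prints SSP_BATCH_SIZE
    }
}
```

If that reading is right, the next thing to try would be the plain Impala option name `BATCH_SIZE` in the URL, without the prefix, and dropping `TSaslTransportBufSize`.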

Re: Impala ODBC/JDBC bad performance - rows fetch is very slow from a remote server compared with NN

Expert Contributor

Hi @Bishnup

If you still have the same problem, please share your URL string with us.