I have a Java application that executes queries on Impala through JDBC. I am currently using version ImpalaJDBC_126.96.36.1992.
I read in the documentation that it is possible to control the number of fetched rows through stmt.setFetchSize(). This is, however, only a hint to the driver; the driver may not support it or may ignore it entirely.
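For reference, this is roughly how I apply the hint, together with how I estimate the number of FetchResults round trips I would expect if the hint were honored (a simplified sketch; the query, table, and connection details are placeholders):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class FetchSizeExample {

    // Expected number of fetch round trips for a result of totalRows,
    // assuming the driver honors the fetch-size hint (ceiling division).
    static int expectedRoundTrips(long totalRows, int fetchSize) {
        return (int) ((totalRows + fetchSize - 1) / fetchSize);
    }

    // Sketch: set the fetch-size hint on the Statement before executing.
    // The SQL string and Connection are placeholders; per the docs, the
    // driver may still ignore the hint.
    static void readWithFetchSize(Connection conn, String sql, int fetchSize)
            throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(fetchSize); // hint only, not a guarantee
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    // process rs.getObject(1), rs.getObject(2), ...
                }
            }
        }
    }
}
```

So for example, with 100,000 result rows and a fetch size of 10,000, I would expect around 10 FetchResults calls in the trace.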
I have enabled the Impala driver logging on my side and I can see that the property is set, but it does not affect the round trips to the client, or at least I cannot tell that it does. From my understanding the JVM also plays a significant role in the fetch process, so I changed the heap size limits as well, but there is no difference.
I would expect to see something similar to this in the logging:
TRACE 13 com.cloudera.hivecommon.api.HS2ClientWrapper.FetchResults(....
repeated once per expected round trip to Impala, but I get the background thread logging twice every time, no matter what I do.
The max row size is 10,000 as far as I can see from the log file. So my first question is whether the driver actually supports this option.
If it does, any hint regarding the tuning would be really helpful.
If you are using JDBC you can pass the parameters through the connection URL.
I think you will find the solution to your question in this post:
Thank you for the answer. The problem is that I cannot find the property for the connection string. I already looked at the post you suggested, but as I am not very experienced with the driver itself, I only tried the suggested parameters. With RowsFetchedPerBlock I could indeed set the maxRows property, but it made no difference at all.
I am also not quite sure what to expect from the logs. For example, I get:
...TRACE 1 com.cloudera.jdbc.common.SForwardResultSet.getObject(2): +++++ enter +++++
and right after it:
TRACE 13 com.cloudera.hivecommon.api.HS2ClientWrapper.CloseOperation(TCloseOperationReq(o...)
1 com.cloudera.jdbc.common.SForwardResultSet.next(): +++++ enter +++++
Apr 25 16:37:08.197 TRACE 1 com.cloudera.hivecommon.dataengine.HiveJDBCResultSet.moveToNextRow(): +++++ enter +++++
Apr 25 16:37:08.197 TRACE 1 com.cloudera.hivecommon.dataengine.HiveJDBCResultSet.hasMoreRows(): +++++ enter +++++
Apr 25 16:37:08.197 TRACE 13 com.cloudera.hivecommon.api.HS2ClientWrapper.send_CloseOperation(): +++++ enter +++++
Apr 25 16:37:08.197 TRACE 1 com.cloudera.jdbc.common.SForwardResultSet.getObject(1): +++++ enter +++++
Apr 25 16:37:08.197 TRACE 1 com.cloudera.jdbc.common.SForwardResultSet.getObject(2): +++++ enter +++++
Apr 25 16:37:08.197 TRACE 1 com.cloudera.jdbc.common.SForwardResultSet.next(): +++++ enter +++++
From the log I can definitely tell that the maxRows, and the fetch size from the previous try, were not honored, because these messages appear only when the cursor is getting close to the end.
But I don't know what else in the log would indicate that the fetch is being done according to my settings.
So try adding these parameters to your connection URL:
static final String DB_URL = "jdbc:impala://ip:port;UseSasl=0;AuthMech=3;UID=user;PWD=pwd;UseNativeQuery=1;RowsFetchedPerBlock=50000;BATCH_SIZE=50000;LogLevel=0";
Also try deactivating the log files (LogLevel=0), because the fetch process is much slower with logging enabled.
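Putting it together, here is a minimal sketch of building such a URL from its parts (host, port, and credentials are placeholders, and I have not verified that your driver version honors every one of these parameters):

```java
class ImpalaUrlExample {

    // Assemble a connection URL with the fetch-tuning parameters from
    // the answer above. Host, port, user, and password are placeholders.
    static String buildUrl(String host, int port, int rowsPerBlock) {
        return "jdbc:impala://" + host + ":" + port
                + ";UseSasl=0;AuthMech=3;UID=user;PWD=pwd"
                + ";UseNativeQuery=1"
                + ";RowsFetchedPerBlock=" + rowsPerBlock
                + ";BATCH_SIZE=" + rowsPerBlock
                + ";LogLevel=0"; // logging off: tracing slows the fetch
    }
}
```

You would then pass the result to DriverManager.getConnection(...) and set the fetch size on the statement as before.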