We have a strange issue going on in our 5 node cluser (256 gb RAM, with 130 gb assigned to Imapala in each node).
We have been unable to run a particular query that is joining quite a few tables. It also creates 5 temporary tables that is ultimately joins to create a final query output that is supposed to be passed back to Informatica BDM.
But as soon as the query is sent from Informatica, it finishes running and returns no output. This behaviour is not observed in our pre-prod environment, but that again does not have that much data. What our Cloudera admin can see is that the query is coming from Informatica, but he sees a LIMIT 0 added to the end, which we do not have in Informatica mapping. That then likely causes no output to be returned.
He sees similar thing for a different query from a different mapping, but in that case, as second query is seen without LIMIT 0 and we do get data returned for that Informatica mapping.
So we are are just wondering if we have a cluster configuration issue or does Impala issue a limit 0 query first to get the EXPLAIN plan and then issues the real query, but in this case, it is failing to do so for some reason.
When I ran the EXPLAIN command around the complex query from our query tool (TOAD Data point connected via Impala ODBC Driver), I get this
Per-Host Resource Reservation: Memory=3.85GB
Per-Host Resource Estimates: Memory=9.83GB
I do not have expertise to understand rest of the 1000 line EXPLAIN commands output to understand what is the total Impala RAM that this query is demanding.
Any pointers from the community experts is much appreciated
That query is generated by the Impala JDBC driver, to figure out column types and names. You can disable the feature that generates it with the PreparedMetadataLimitZero flag - see https://www.cloudera.com/documentation/other/connectors/impala-jdbc/latest/Cloudera-JDBC-Driver-for-...
It sounds like the limit 0 query isn't the problem though and is probably a red herring - what are the symptoms you're seeing otherwise? An error?
Do you really only have an 130MB Impala daemon memory limit? Or did you mean 130GB? You're not going to be able to do much useful with that little memory. We recommend 32GB as a minimum: https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.h...
My applogies, it is infact 130GB on each node allocated to Impala(I updated the original post), so we have 650gb over all when you add up memory available in each node.
We have infact being asked by Informatica to switch from JDBC to ODBC driver when connecting BDM to CDH due to timeout issues we were facing where tasks sent over to CDH were taking long time.
We are not seeing any other errors when we issue this particularly long query to Impala. It executes unusually fast and returns no data when we know it should return several rows back.
What are the performance implications of turning this PreparedMetaLimitZero flag off in the JDBC Driver setup ?
Guide says....LIMIT 0 will increase performance, so if we turn if off, then is it a big hit ?
This property specifies whether the PreparedStatement.getMetadata() call will request
metadata from the server with LIMIT 0, increasing performance.
l 1: The PreparedStatement.getMetadata() call uses LIMIT 0.
l 0: The PreparedStatement.getMetadata() call does not use LIMIT 0.
Also, what are the situations when one would use ODBC driver over JDBC driver ?
ODBC and JDBC drivers are both implemented by Simba and have broadly the same functionality. The main difference is just support - most clients support one or the other (JVM-based clients use JDBC generally, and other languages use ODBC).
I'm not sure about the exact performance implications. It doesn't make a difference on the Impala server-side but there may be some JDBC/ODBC client implementation details. I'd recommend just trying it out and measuring if there's a difference.
I don't think we have enough information to really understand why the query isn't returning a result. Usually we'd look at a query profile to see what the query actually did and if there were any errors.
Thanks for the clarification regarding driver differences.
We made some modifications to our query and split it into 4 different queries and it seems to be working now when testing the query directly. We will now implement the same in Informatica BDM and will also be switching from JDBC to ODBC Driver.
I hope we will not encounter the LIMIT 0 only query issue. I just read the ODBC Driver install guide and looks like this parameter is not available when using ODBC driver and is a JDBC driver only feature. We have installed this driver on a Linux box.
We're having a similar issue. I don't see a way to disable the LIMIT 0 being appended when using the ODBC driver. Is there any way of disabling this?