Member since: 07-29-2015
Posts: 535
Kudos Received: 140
Solutions: 103
My Accepted Solutions
Views | Posted |
---|---|
5811 | 12-18-2020 01:46 PM |
3740 | 12-16-2020 12:11 PM |
2653 | 12-07-2020 01:47 PM |
1896 | 12-07-2020 09:21 AM |
1229 | 10-14-2020 11:15 AM |
10-29-2020
05:18 PM
I lost this in my inbox, but I'm coming back to it. GET_COLUMNS does use some of the same machinery as other queries, but it's a metadata-only operation on metadata that can be entirely cached. Are you saying it consistently takes 500ms even if you run queries back-to-back? The only thing I can think of is that you might have a large number of databases or tables in your catalog. There is a step in the GET_COLUMNS processing where it searches through all the metadata to find something matching the tableName pattern in the request.
10-23-2020
10:28 AM
1 Kudo
https://issues.apache.org/jira/browse/IMPALA-8454 is the Apache Impala JIRA.
10-21-2020
09:49 PM
1 Kudo
I don't have insight into how to solve your particular problem, but for what it's worth, in later versions of Impala (those included in CDP), Impala will read recursively from directories within the table location.
10-14-2020
11:15 AM
1 Kudo
On-demand metadata does not exist in C5.14.4. There was a technical preview version in C5.16+ and C6.1+ that had all the core functionality but did not perform optimally for all workloads and had some other limitations. After we got feedback and experience with the feature, we made various tweaks and fixes, and in C6.3 we removed the technical preview caveat - https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_metadata.html - and there are some important tweaks in patch releases after that (e.g. 6.3.3). It is enabled by default in the latest versions of CDP. So basically, if you want to experiment and see if it meets your needs, CDH5.16+ works, but CDH6.3.3+ or CDP has the latest and greatest.
10-13-2020
03:44 PM
@parthk yeah, I'd expect so. Sometimes this C++ inter-version compatibility is a bear.
10-13-2020
09:25 AM
The client (i.e. the ODBC driver being used by your pyodbc program) is closing the insert operation before it finishes: it starts the insert query, then closes it before it's done. I don't know pyodbc well, but I wonder if it's something to do with how it's being used here. The examples I see either commit or fetch rows after execute(). I'd suggest trying either of those things (calling commit() or fetching from the cursor) to see if it forces your program to wait for the insert to succeed. https://github.com/mkleehammer/pyodbc/wiki/Getting-started
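To make the commit() suggestion concrete, here is a minimal sketch of the pattern, written against the generic Python DB-API so it can be tried with any connection object. The helper name `run_insert_and_wait` and the DSN and table names in the comments are hypothetical, and whether the explicit commit actually changes the behaviour with the Impala ODBC driver is exactly the thing to test.

```python
def run_insert_and_wait(conn, sql, params=()):
    """Execute an INSERT, then commit before closing the cursor.

    `conn` is any DB-API 2.0 connection (for example one returned by
    pyodbc.connect). The assumption being tested is that the explicit
    commit() keeps the statement open until the insert has completed,
    instead of the cursor being closed right after execute().
    """
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        conn.commit()            # do not close the statement before this returns
        return cursor.rowcount   # rows affected, as reported by the driver
    finally:
        cursor.close()


# Possible usage with pyodbc (not run here; DSN and table are hypothetical):
#   import pyodbc
#   conn = pyodbc.connect("DSN=my_impala_dsn")
#   run_insert_and_wait(conn, "INSERT INTO my_table VALUES (?, ?)", (1, "a"))
#   conn.close()
```

The other option from the post, fetching from the cursor, applies to statements that return rows; a plain INSERT usually has no result set to fetch, so commit() is the simpler thing to try first.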
10-12-2020
09:46 AM
As I understand it, the ODBC driver uses the column metadata to help implement some parts of the ODBC spec. The metadata used by the GET_COLUMNS operation should be cached in Impala's metadata cache, at least in most standard configurations that I can think of. The first GET_COLUMNS on a table could be quite slow, since it'll trigger loading all the table metadata, but after that it should be very fast - 500ms seems very slow for a table with cached metadata, unless there was something like an "INVALIDATE METADATA" in between. Can you get a query profile for one of the GET_COLUMNS queries? That would have a timeline of how long the various steps took, like loading table metadata. What version of Impala are you running? Have you got any non-standard configurations (like different catalog modes)?
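One rough way to check the cold-versus-cached behaviour from the client side is to time the same metadata call several times back-to-back and compare the first run with the rest. This is only a sketch: the helper names are made up for illustration, and it assumes that pyodbc's `cursor.columns()` is what ends up issuing GET_COLUMNS through your driver. It is not a substitute for the query profile.

```python
import time
from statistics import median


def time_repeated(call, runs=5):
    """Invoke `call()` `runs` times back-to-back; return latencies in ms."""
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Separate the first (possibly cold) run from the later (warm) runs."""
    first, rest = latencies_ms[0], latencies_ms[1:]
    return {
        "first_ms": first,
        "warm_median_ms": median(rest) if rest else None,
    }


# Possible usage with pyodbc (not run here; DSN and table are hypothetical):
#   import pyodbc
#   cursor = pyodbc.connect("DSN=my_impala_dsn", autocommit=True).cursor()
#   stats = summarize(
#       time_repeated(lambda: cursor.columns(table="my_table").fetchall())
#   )
#   print(stats)
```

If the warm median stays near the first-run figure, the time is probably not going into loading table metadata, and the profile timeline is the next place to look.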
10-09-2020
10:28 AM
What OS and compiler version are you using to build the UDF? This looks like it is probably a consequence of it being built with a newer gcc version than the one used to build Impala (gcc 4.9.2).
10-07-2020
02:51 PM
I should also say - if you have a chance to upgrade your cluster, I think your experience with Impala would be improved quite a lot. The last CDH5 release, 5.16.2, is a big jump in scalability, performance and reliability from 5.10. CDH6.3.3 is a big jump beyond that in terms of features, then CDP is another huge step, particularly for metadata performance.
10-07-2020
01:44 PM
There's no dependency on any of the Cloudera management services. Inserts are also going to depend on the HDFS service being healthy (i.e. namenodes, data nodes, etc). There are various other underlying services that could be in play - Kerberos infrastructure like the KDC, the KMS if you're using certain encryption features, etc. Those logs look like the client didn't actually close the query, so I'd question whether something disrupted the client's connection to the Impala daemon (e.g. a load balancer was paused, or something happened to the client process).