Member since 07-29-2015
535 Posts
140 Kudos Received
103 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5964 | 12-18-2020 01:46 PM |
| | 3885 | 12-16-2020 12:11 PM |
| | 2737 | 12-07-2020 01:47 PM |
| | 1958 | 12-07-2020 09:21 AM |
| | 1256 | 10-14-2020 11:15 AM |
10-23-2020
10:28 AM
1 Kudo
https://issues.apache.org/jira/browse/IMPALA-8454 is the Apache Impala JIRA.
10-15-2020
04:13 AM
@Tim Armstrong it worked like a charm after changing the gcc version. Thanks!
10-15-2020
03:30 AM
I tried calling commit() and setting the timeout, but neither had any effect:
import pypyodbc
connection = pypyodbc.connect(DSN="", Schema="dbname", autocommit=True)
cursor = connection.cursor()
query = """INSERT INTO schema.table VALUES ('val1', 'val2')"""
cursor.execute(query)
cursor.commit()
connection.close()
10-14-2020
11:15 AM
1 Kudo
On-demand metadata does not exist in C5.14.4. There was a technical preview version in C5.16+ and C6.1+ that had all the core functionality but did not perform optimally for all workloads and had some other limitations. After we got feedback and experience with the feature, we made various tweaks and fixes, and in C6.3 we removed the technical preview caveat (see https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_metadata.html). There are also some important tweaks in later patch releases (e.g. 6.3.3). It is enabled by default in the latest versions of CDP. So basically, if you want to experiment and see if it meets your needs, CDH5.16+ works, but CDH6.3.3+ or CDP has the latest and greatest.
10-14-2020
07:10 AM
Hi Tim, your suggestion was very helpful and I have a good understanding now, so I am accepting it as the solution. I have one more question: to fix the issue of the query utilizing the resources, would it be better to increase the Impala Daemon Memory Limit (mem_limit)? What do you suggest?
10-14-2020
02:26 AM
Thanks for your input. We are running our stage cluster in "non-production mode" using the embedded postgres. This postgres is also used by the two hive metastore servers, and it is hosted on the same node as the other cloudera services. When this host freezes, the impala insert queries freeze as well. We were surprised to see that there seems to be no timeout between the hive metastore servers and their backing db (postgres), and no error either. This probably also happens with an external postgres or mysql database, although we have not tested that. I wonder if this might be solved by a newer CDH version. We are currently looking into upgrading, which we would very much like to do for other reasons as well.
09-28-2020
07:58 AM
Hi @PauloRC @Tim Armstrong, this might be a performance regression, but it is also a more general performance inefficiency with a specific planner data structure. A correctness fix for IMPALA-8386 may have introduced this perf regression in 3.2.1. IMPALA-9358 may resolve the issue, but I don't think it's available in any CDH 6.3 release yet. @PauloRC, one thing to try that might mitigate the issue is to run your view query with SET ENABLE_EXPR_REWRITES=false and see if that helps.
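As a rough sketch of that suggestion, the query option can be set on the same session before the view query runs. The helper name `run_with_rewrites_disabled` is hypothetical, and the sketch assumes `cursor` is a Python DB-API style cursor whose driver passes SET statements through to the Impala session; only the `ENABLE_EXPR_REWRITES` option itself comes from the post above.

```python
def run_with_rewrites_disabled(cursor, query):
    """Run `query` with expression rewrites turned off for this session.

    Assumption: `cursor` is a DB-API style cursor connected to Impala and
    the driver forwards SET statements to the server unchanged.
    """
    # Query options are per session, so the SET must be issued on the same
    # session (here, the same cursor) as the query it should affect.
    cursor.execute("SET ENABLE_EXPR_REWRITES=false")
    cursor.execute(query)
    return cursor.fetchall()
```

Comparing the planning time of the same query with and without the option should show whether the rewrites are the source of the slowdown.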
09-23-2020
01:31 AM
Thank you for your reply Tim. Just to clarify, security-wise, are we better off with our current configuration (default), with sentry service disabled, or with sentry enabled in testing mode? You mentioned that sentry in testing mode does not authenticate the clients, but in the documentation it is mentioned that testing mode uses weaker authentication mechanisms. We need this in order to prevent our analysts from doing accidental writes, drops, etc. on the data. Our cluster is in a secure isolated environment.
09-21-2020
09:57 AM
1 Kudo
This is definitely a bug. Thanks for the clear report and reproduction. It's not IMPALA-7957 but is somewhat related. This is new to us so I filed https://issues.apache.org/jira/browse/IMPALA-10182 to track it. It looks like it can only happen when you have a UNION ALL, plus subqueries where the same column appears twice in the select list, plus NULL values in those columns. You can work around the issue by removing the duplicated entries in the subquery select list. E.g. the following query is equivalent and returns the expected results.
SELECT
  MIN(t_53.c_41) c_41,
  CAST(NULL AS DOUBLE) c_43,
  CAST(NULL AS BIGINT) c_44,
  t_53.c2 c2,
  t_53.c2 c3s0,
  t_53.c4 c4,
  t_53.c4 c5s0
FROM
  ( SELECT
      t.productsubcategorykey c_41,
      t.productline c2,
      t.productsubcategorykey c4
    FROM
      as_adventure.t1 t
    WHERE
      true
    GROUP BY
      2,
      3 ) t_53
GROUP BY
  4,
  5,
  6,
  7
UNION ALL
SELECT
  MIN(t_53.c_41) c_41,
  CAST(NULL AS DOUBLE) c_43,
  CAST(NULL AS BIGINT) c_44,
  t_53.c2 c2,
  t_53.c2 c3s0,
  t_53.c5s0 c4,
  t_53.c5s0 c5s0
FROM
  ( SELECT
      t.productsubcategorykey c_41,
      t.productline c2,
      t.productsubcategorykey c5s0
    FROM
      as_adventure.t1 t
    WHERE
      true
    GROUP BY
      2,
      3 ) t_53
GROUP BY
  4,
  5,
  6,
  7;
08-23-2020
02:19 PM
You need to cast one of the branches of the else to be a compatible type with the other one. The problem is that both decimal types have the max precision (38) and different scale and neither can be converted automatically to the other without potentially losing precision. A lot of the decimal behaviour such as result types of expressions was changed in CDH6 (and upstream Apache Impala 3.0). https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_decimal.html has a lot of related information.
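To make the precision argument concrete, here is a minimal sketch of the arithmetic. It is a simplified model for illustration, not Impala's actual type-resolution code; the names `DecimalType` and `common_type` and the example scales (10 and 4) are hypothetical. Only the maximum precision of 38 comes from the explanation above.

```python
from typing import NamedTuple, Optional

MAX_PRECISION = 38  # maximum DECIMAL precision mentioned above


class DecimalType(NamedTuple):
    precision: int  # total number of digits
    scale: int      # digits to the right of the decimal point


def common_type(a: DecimalType, b: DecimalType) -> Optional[DecimalType]:
    """Return a decimal type that holds every value of both inputs
    losslessly, or None if that would need more than MAX_PRECISION digits."""
    scale = max(a.scale, b.scale)
    # Digits to the left of the decimal point that each input may need.
    int_digits = max(a.precision - a.scale, b.precision - b.scale)
    precision = int_digits + scale
    if precision > MAX_PRECISION:
        return None
    return DecimalType(precision, scale)


# Two max-precision types with different scales: a lossless common type
# would need 34 integer digits + 10 fractional digits = 44 > 38.
print(common_type(DecimalType(38, 10), DecimalType(38, 4)))  # None

# Smaller types fit: 8 integer digits + 4 fractional digits = 12.
print(common_type(DecimalType(10, 2), DecimalType(12, 4)))
```

When the result is `None` in this model, an explicit CAST on one branch is what tells the engine which digits you are willing to give up.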