About Tim Armstrong

Tim Armstrong · ‎10-23-2020

https://issues.apache.org/jira/browse/IMPALA-8454 is the Apache Impala JIRA.

parthk · ‎10-15-2020

@Tim Armstrong it worked like a charm after changing the gcc version. Thanks.

khan_parvez · ‎10-15-2020

Tried executing commit() or setting the timeout, but no effect:

    import pypyodbc
    connection = pypyodbc.connect(DSN="", Schema="dbname", autocommit=True)
    cursor = connection.cursor()
    query = """INSERT INTO schema.table VALUES ('val1', 'val2')"""
    cursor.execute(query)
    cursor.commit()
    connection.close()

Tim Armstrong · ‎10-14-2020

On-demand metadata does not exist in C5.14.4. There was a technical preview version in C5.16+ and C6.1+ that had all the core functionality but did not perform optimally for all workloads and had some other limitations. After we got feedback and experience with the feature, we made various tweaks and fixes, and in C6.3 we removed the technical preview caveat (https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_metadata.html). There are also some important tweaks in later patch releases (e.g. 6.3.3). It is enabled by default in the latest versions of CDP. So basically, if you want to experiment and see if it meets your needs, CDH5.16+ works, but CDH6.3.3+ or CDP has the latest and greatest.
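
The version ranges above can be summarized as a small lookup. This is only an illustrative sketch: the function name and the status strings are made up for this example, and treating 6.0 as "not available" is an inference from the "C6.1+" wording, not something stated explicitly.

```python
def on_demand_metadata_status(cdh_version: str) -> str:
    """Map a CDH version string such as "6.3.3" to the on-demand metadata
    status described in the post. Hypothetical helper, not a Cloudera API."""
    parts = [int(p) for p in cdh_version.split(".")]
    if len(parts) < 2:
        raise ValueError("expected at least major.minor, e.g. '6.3'")
    major, minor = parts[0], parts[1]
    patch = parts[2] if len(parts) > 2 else 0

    if major == 5:
        # Technical preview from 5.16 onwards; absent before (e.g. 5.14.4).
        return "technical preview" if minor >= 16 else "not available"
    if major == 6:
        if minor == 0:
            # Assumption: the post only mentions 6.1+ for the preview.
            return "not available"
        if minor < 3:
            return "technical preview"
        # 6.3 dropped the preview caveat; 6.3.3 adds later fixes.
        if (minor, patch) >= (3, 3):
            return "supported, with later fixes"
        return "supported"
    raise ValueError("only CDH 5.x and 6.x are covered by this sketch")


for version in ("5.14.4", "5.16.2", "6.2.0", "6.3.0", "6.3.3"):
    print(version, "->", on_demand_metadata_status(version))
```

CDP is left out of the lookup because the post only says the feature is enabled by default in its latest versions, without naming version numbers.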

HanzalaShaikh · ‎10-14-2020

Hi Tim, your suggestion was very helpful, and I have a good understanding now. I am accepting it as the solution. I have one more question: to fix the issue of the query's resource utilization, is it better to increase the Impala Daemon Memory Limit (mem_limit)? What do you suggest?

Fenton · ‎10-14-2020

Thanks for your input. We are running our stage cluster in "non-production mode" using the embedded postgres. This postgres is also used by the two hive metastore servers. The postgres db is hosted on the same node as the other cloudera services. When this host freezes, the impala insert queries freeze as well. We were surprised to see that there seems to be no timeout between the hive metastore servers and their backing db (postgres), and no error either. This probably also happens when backed by an external postgres or mysql database, although we have not tested that. I wonder if this might be solved by a newer CDH version. We are currently looking into upgrading, which we would very much like to do for other reasons as well.

superdupershant · ‎09-28-2020

Hi @PauloRC @Tim Armstrong, this might be a performance regression, but also, more generally, a performance inefficiency with a specific planner data structure. A correctness fix for IMPALA-8386 may have introduced this perf regression in 3.2.1. IMPALA-9358 may resolve this issue, but I don't think it's available in any CDH 6.3 release yet. @PauloRC one thing to try that might mitigate the issue is to run your view query with SET ENABLE_EXPR_REWRITES=false to see if that helps.

pphot · ‎09-23-2020

Thank you for your reply Tim. Just to clarify, security-wise, are we better off with our current configuration (default), with sentry service disabled, or with sentry enabled in testing mode? You mentioned that sentry in testing mode does not authenticate the clients, but in the documentation it is mentioned that testing mode uses weaker authentication mechanisms. We need this in order to prevent our analysts from doing accidental writes, drops, etc. on the data. Our cluster is in a secure isolated environment.

Tim Armstrong · ‎09-21-2020

This is definitely a bug. Thanks for the clear report and reproduction. It's not IMPALA-7957 but is somewhat related. This is new to us so I filed https://issues.apache.org/jira/browse/IMPALA-10182 to track it. It looks like it can only happen when you have a UNION ALL, plus subqueries where the same column appears twice in the select list, plus NULL values in those columns. You can work around the issue by removing the duplicated entries in the subquery select list. E.g. the following query is equivalent and returns the expected results.

    SELECT MIN(t_53.c_41) c_41, CAST(NULL AS DOUBLE) c_43, CAST(NULL AS BIGINT) c_44,
           t_53.c2 c2, t_53.c2 c3s0, t_53.c4 c4, t_53.c4 c5s0
    FROM (
      SELECT t.productsubcategorykey c_41, t.productline c2, t.productsubcategorykey c4
      FROM as_adventure.t1 t
      WHERE true
      GROUP BY 2, 3
    ) t_53
    GROUP BY 4, 5, 6, 7
    UNION ALL
    SELECT MIN(t_53.c_41) c_41, CAST(NULL AS DOUBLE) c_43, CAST(NULL AS BIGINT) c_44,
           t_53.c2 c2, t_53.c2 c3s0, t_53.c5s0 c4, t_53.c5s0 c5s0
    FROM (
      SELECT t.productsubcategorykey c_41, t.productline c2, t.productsubcategorykey c5s0
      FROM as_adventure.t1 t
      WHERE true
      GROUP BY 2, 3
    ) t_53
    GROUP BY 4, 5, 6, 7;

Tim Armstrong · ‎08-23-2020

You need to cast one of the branches of the else to be a compatible type with the other one. The problem is that both decimal types have the max precision (38) and different scale and neither can be converted automatically to the other without potentially losing precision. A lot of the decimal behaviour such as result types of expressions was changed in CDH6 (and upstream Apache Impala 3.0). https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_decimal.html has a lot of related information.
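
To see why two max-precision decimals with different scales cannot be reconciled automatically, here is a minimal sketch of the arithmetic. It is generic decimal reasoning, not Impala's actual type-resolution code; the function name and the `None` return convention are assumptions made for this illustration.

```python
MAX_PRECISION = 38  # maximum DECIMAL precision mentioned in the post


def common_decimal_type(p1, s1, p2, s2):
    """Return the smallest (precision, scale) that holds every value of
    DECIMAL(p1, s1) and DECIMAL(p2, s2) without loss, or None if that
    would need more than MAX_PRECISION digits."""
    scale = max(s1, s2)                  # keep all fractional digits
    int_digits = max(p1 - s1, p2 - s2)   # keep all integer digits
    precision = int_digits + scale
    if precision > MAX_PRECISION:
        return None                      # no lossless common type exists
    return precision, scale


# Two max-precision types with different scales: 34 integer digits plus
# 10 fractional digits would need 44 digits in total.
print(common_decimal_type(38, 10, 38, 4))   # None
# Smaller types unify without trouble.
print(common_decimal_type(10, 2, 12, 4))    # (12, 4)
```

An explicit cast on one branch sidesteps the problem because both branches then have the same type, at the cost of accepting whatever rounding or overflow risk that cast carries.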

Last Visited	‎02-11-2021 06:07 PM

Member Since	‎07-29-2015 04:07 PM
Posts	535
Kudos received	140

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: Automate Impala views on Hive external table

Re: Impala UDF unable to load the .so file from HD...

Re: Impala query over odbc canceled (don't know th...

Re: Impala - On-demand metadata

Re: Impala query failed

Re: Impala DML frozen on CDH manager frozen - hidd...

Re: Impala 3.2.0 performance degradation while que...

Re: Create Select Only user in HUE / Impala withou...

Re: "union all" dropping records with all null/emp...

Re: an error is reported when impala executes a ca...