About Tim Armstrong

Tim Armstrong · ‎04-21-2020

We did a wholesale revamp of decimal behaviour going from CDH5 to CDH6, and the default behaviour changed in CDH6.0: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_new_features.html#decimal_v2 There's a whole epic JIRA capturing the changes: https://issues.apache.org/jira/browse/IMPALA-4072 . I think https://issues.apache.org/jira/browse/IMPALA-4370 might be the specific fix that you're seeing, based on your analysis. The fix version for that change is Impala 2.9.0, so the code change is in CDH5.15.2, but it was done behind the DECIMAL_V2 query option, which wasn't a supported option until CDH6. In CDH6 you can toggle the behaviour with the DECIMAL_V2 query option (it was kept for backward compatibility but will eventually be removed).

Tim Armstrong · ‎04-15-2020

Can you provide more information about your version and how the table was created? Ideally, share the "show create table <table>" output. The answer depends a lot on those things because transactional table support has evolved a lot in recent versions and there are several variants of transactional tables.

Tim Armstrong · ‎03-27-2020

I'm not aware of any plans. We've only been doing maintenance releases on CDH5.

Tim Armstrong · ‎03-26-2020

We added Ubuntu 18.04 support in CDH6.2: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_os_requirements.html#c63_supported_os

Tim Armstrong · ‎02-19-2020

We made this stricter because it was easy to create tables with the wrong primary key order, which has perf consequences. It was really a bug that we allowed creating tables with unclear primary key order.

Tim Armstrong · ‎02-13-2020

You need to change either the order of the columns in your table definition or the order in the PRIMARY KEY definition so that they match. In your statement, MANAGEDOBJECTNAME and SPECIFICPROBLEMSID are reversed between the two places.

PRIMARY KEY clause:
MANAGEDOBJECTNAME, SPECIFICPROBLEMSID, YEARMONTH

Column definitions:
SPECIFICPROBLEMSID BIGINT,
MANAGEDOBJECTNAME STRING NOT NULL,
YEARMONTH INT NOT NULL,

We made this stricter because Impala previously silently ignored the order of columns in the PRIMARY KEY clause, which can have really bad performance implications - https://issues.apache.org/jira/browse/IMPALA-8283
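The ordering mismatch described in this post can be checked mechanically before submitting a CREATE TABLE statement. The following is only a sketch: the helper name `primary_key_order_matches` is hypothetical and is not part of Impala or Kudu, and it checks nothing beyond the rule stated in the post (the key columns must appear in the same relative order in both places).

```python
def primary_key_order_matches(column_order, primary_key):
    """Return True if the primary key columns appear in the same relative
    order in the column definitions as in the PRIMARY KEY clause.

    Hypothetical helper for illustration; not an Impala or Kudu API.
    """
    key_columns = set(primary_key)
    # Keep only the key columns, in the order the table definition lists them.
    key_columns_in_table_order = [c for c in column_order if c in key_columns]
    return key_columns_in_table_order == list(primary_key)


# Column order and PRIMARY KEY order as quoted in the post.
columns = ["SPECIFICPROBLEMSID", "MANAGEDOBJECTNAME", "YEARMONTH"]
primary_key = ["MANAGEDOBJECTNAME", "SPECIFICPROBLEMSID", "YEARMONTH"]

print(primary_key_order_matches(columns, primary_key))  # False: first two are reversed

# Reordering the column definitions to follow the PRIMARY KEY clause fixes it.
fixed_columns = ["MANAGEDOBJECTNAME", "SPECIFICPROBLEMSID", "YEARMONTH"]
print(primary_key_order_matches(fixed_columns, primary_key))  # True
```

Either list can be changed to make the check pass; which one to change depends on the key order you actually want.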

Tim Armstrong · ‎01-21-2020

For what it's worth, from an engineering/R&D point of view, more of us are going to be contributing to open source projects than before. All of the core development that was done in the context of open source projects (e.g. the many Apache projects we contribute to) will continue as before. Previously closed source projects are going to be open sourced under the AGPL. The binary distribution does require a subscription (beyond the trial period), similar to how Red Hat does things.

Tim Armstrong · ‎01-21-2020

The estimated stats size is calculated as 400 bytes * # columns * # partitions. The option prevents you from computing incremental stats on tables with too many columns and partitions (it guards against the scenario where memory usage from incremental stats creeps up and up as tables get larger, eventually causing an outage). So you probably want to set it based on the expected size of the largest table that you will be using incremental stats on (that would help prevent someone accidentally computing incremental stats on an even larger table).

A few other comments. Generally non-incremental stats will be more robust, but we understand that it's sometimes challenging or less practical to do a full compute stats on all tables. So if the calculation above spits out a huge number, you might want to reconsider using incremental stats for that table.

You need to be careful with bumping *only* the catalog heap size. On versions prior to CDH5.16, you need all coordinator impala daemons to have a heap size as large as the catalogd, since the catalog cache is replicated. That was addressed for incremental stats specifically in CDH5.16 by *not* replicating the incremental stats (all other state is still replicated).

In CDH5.16 the memory consumption was improved substantially as well (the incremental stats use ~5x less memory). The estimated stats size is actually reduced to 200 bytes * # columns * # partitions in that release.
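As a rough illustration of the arithmetic in this post, the estimate can be computed for a given table shape and compared against a chosen limit. This is a sketch, not Impala's implementation: the function names are hypothetical, the table shape is made up for the example, and treating the limit as an inclusive bound is an assumption.

```python
# Per-column, per-partition estimates quoted in the post.
BYTES_PER_ENTRY_BEFORE_CDH516 = 400
BYTES_PER_ENTRY_CDH516 = 200


def estimated_inc_stats_bytes(num_columns, num_partitions,
                              bytes_per_entry=BYTES_PER_ENTRY_BEFORE_CDH516):
    """Estimated incremental stats size: bytes * # columns * # partitions."""
    return bytes_per_entry * num_columns * num_partitions


def fits_under_limit(num_columns, num_partitions, limit_bytes,
                     bytes_per_entry=BYTES_PER_ENTRY_BEFORE_CDH516):
    """Hypothetical check; whether the real comparison is inclusive is an assumption."""
    estimate = estimated_inc_stats_bytes(num_columns, num_partitions, bytes_per_entry)
    return estimate <= limit_bytes


# Made-up example table: 100 columns, 10,000 partitions.
before = estimated_inc_stats_bytes(100, 10_000)
after = estimated_inc_stats_bytes(100, 10_000, BYTES_PER_ENTRY_CDH516)
print(before)  # 400000000 bytes with the 400-byte estimate
print(after)   # 200000000 bytes with the 200-byte estimate
```

Running this for the largest table you expect to use incremental stats on gives a starting point for the limit, in line with the sizing advice in the post.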

Tim Armstrong · ‎11-27-2019

I assume you're seeing something like what I attached here. I ran that query from impala-shell and it's going to sit in the FINISHED state for several minutes while the shell fetches all of the ~6 million rows. The reason is that it just takes a bunch of time for the client to fetch the results, even if it's a client like impala-shell or JDBC that actively fetches the results. You'll see this for queries with large result sets, particularly if the connection from client to server is slow or has higher latency. I'd expect that the client will eventually get to the end of the result set and close the query on its own. Hue is a bit different because it only fetches results as needed, so can hold queries open even if they have small result sets. There will be some significant perf and resource management improvements for use cases like this in the versions of Impala that come with CDP - e.g. https://issues.apache.org/jira/browse/IMPALA-8656 helps with this in various ways.

Tim Armstrong · ‎11-06-2019

Also, if you have a support contract with Cloudera, this is something they can help you with in more detail through that channel. We've successfully resolved this for customers before.

Member Since	‎07-29-2015 04:07 PM
Last Visited	‎02-11-2021 06:07 PM
Posts	535
Kudos received	141

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: Wrong results dividing decimal by integer

Re: Impala Query on Hive transactional table retur...

Re: CDH 5.16 on Ubuntu 18

Re: CDH 5.16 on Ubuntu 18

Re: ImpalaRuntimeException: Kudu PRIMARY KEY colum...

Re: ImpalaRuntimeException: Kudu PRIMARY KEY colum...

Re: Is CDP open source?

Re: Impala inc_stats_size_limit_bytes - what does ...

Re: Impala query troubles

Re: impala has invalid file metadata