Member since: 07-29-2015
Posts: 535
Kudos Received: 141
Solutions: 103
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 8901 | 12-18-2020 01:46 PM |
| | 5898 | 12-16-2020 12:11 PM |
| | 4639 | 12-07-2020 01:47 PM |
| | 2797 | 12-07-2020 09:21 AM |
| | 1927 | 10-14-2020 11:15 AM |
04-21-2020
01:03 PM
We did a wholesale revamp of decimal behaviour going from CDH5 to CDH6. The default behaviour changed in CDH6.0: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_new_features.html#decimal_v2

There's a whole epic JIRA capturing the changes: https://issues.apache.org/jira/browse/IMPALA-4072. Based on your analysis, I think https://issues.apache.org/jira/browse/IMPALA-4370 might be the specific fix that you're seeing. The fix version for that change is Impala 2.9.0, so the code change is in CDH5.15.2, but it was done behind the DECIMAL_V2 query option, which wasn't a supported option until CDH6.

In CDH6 you can toggle the behaviour with the DECIMAL_V2 query option (it will eventually be removed, but was kept for backward compatibility).
04-15-2020
10:03 AM
Can you provide more information about your version and how the table was created? Ideally the output of "show create table <table>". The answer depends a lot on those things because transactional table support has evolved a lot in recent versions and there are several variants of transactional tables.
03-27-2020
09:39 AM
I'm not aware of any plans. We've only been doing maintenance releases on CDH5.
03-26-2020
09:59 AM
We added Ubuntu 18.04 support in CDH6.2: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_os_requirements.html#c63_supported_os
02-19-2020
09:26 AM
We made this stricter because it was easy to create tables with the wrong primary key order, which has perf consequences. It was really a bug that we allowed creating tables with unclear primary key order.
02-13-2020
09:11 AM
1 Kudo
You need to change either the order of the columns in your table definition or the order in the PRIMARY KEY definition so that they match. In your statement, MANAGEDOBJECTNAME and SPECIFICPROBLEMSID are reversed between the two places.

PRIMARY KEY clause: MANAGEDOBJECTNAME, SPECIFICPROBLEMSID, YEARMONTH

Column definitions: SPECIFICPROBLEMSID BIGINT, MANAGEDOBJECTNAME STRING NOT NULL, YEARMONTH INT NOT NULL,

We made this stricter because Impala previously silently ignored the order of columns in the PRIMARY KEY clause, which can have really bad performance implications: https://issues.apache.org/jira/browse/IMPALA-8283
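To make the mismatch concrete, here is a minimal Python sketch of the kind of check involved. It is an illustration only, not Impala's actual validation code: the function name is hypothetical, and it assumes the rule is "the primary key columns must be the leading columns of the table, in the same order". Only the three columns quoted in the post are used.

```python
def primary_key_matches_column_order(columns, primary_key):
    """Return True if the key columns are the leading table columns, in the same order.

    Hypothetical helper for illustration; column names are compared case-insensitively.
    """
    leading = [name.upper() for name in columns[:len(primary_key)]]
    return leading == [name.upper() for name in primary_key]


# Column order as written in the table definition from the post.
columns = ["SPECIFICPROBLEMSID", "MANAGEDOBJECTNAME", "YEARMONTH"]
# Column order as written in the PRIMARY KEY clause from the post.
primary_key = ["MANAGEDOBJECTNAME", "SPECIFICPROBLEMSID", "YEARMONTH"]

# The two orders disagree, so this prints False.
print(primary_key_matches_column_order(columns, primary_key))

# Reordering the column definitions to follow the PRIMARY KEY clause makes them match.
fixed_columns = ["MANAGEDOBJECTNAME", "SPECIFICPROBLEMSID", "YEARMONTH"]
print(primary_key_matches_column_order(fixed_columns, primary_key))
```

Either side can be reordered to fix the statement; the sketch just shows that the two lists have to agree.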
01-21-2020
02:16 PM
For what it's worth, from an engineering/R&D point of view, more of us are going to be contributing to open source projects than before. All of the core development that was done in the context of open source projects (e.g. the many Apache projects we contribute to) will continue as before. Previously closed source projects are going to be open sourced under the AGPL. The binary distribution does require a subscription (beyond the trial period), similar to how Red Hat does things.
01-21-2020
01:45 PM
The estimated stats size is calculated as 400 bytes * # columns * # partitions. The option prevents you from computing incremental stats on tables with too many columns and partitions (it guards against the scenario where memory usage from incremental stats creeps up and up as tables get larger, eventually causing an outage). So you probably want to set it based on the expected size of the largest table that you will be using incremental stats on (that would help prevent someone accidentally computing incremental stats on an even larger table).

A few other comments:

Generally non-incremental stats will be more robust, but we understand that it's sometimes challenging or less practical to do a full compute stats on all tables. So if the calculation above spits out a huge number, you might want to reconsider using incremental stats for that table.

You need to be careful with bumping *only* the catalog heap size. On versions prior to CDH5.16, you need all coordinator impala daemons to have a heap size as large as the catalogd, since the catalog cache is replicated. That was addressed for incremental stats specifically in CDH5.16 by *not* replicating the incremental stats (all other state is still replicated).

In CDH5.16 the memory consumption was improved substantially as well (the incremental stats use ~5x less memory). The estimated stats size is actually reduced to 200 bytes * # columns * # partitions.
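As a rough illustration of the estimate, here is a minimal Python sketch. The function name and the example table dimensions (100 columns, 10,000 partitions) are made up for illustration; the per-entry sizes are the 400 bytes quoted above and the 200 bytes quoted for CDH5.16.

```python
def estimated_incremental_stats_bytes(num_columns, num_partitions, bytes_per_entry=400):
    """Estimated incremental stats size: bytes_per_entry * # columns * # partitions."""
    return bytes_per_entry * num_columns * num_partitions


# Hypothetical largest table: 100 columns, 10,000 partitions.
before = estimated_incremental_stats_bytes(100, 10_000)       # 400 bytes per entry
after = estimated_incremental_stats_bytes(100, 10_000, 200)   # 200 bytes per entry

print(f"400-byte estimate: {before} bytes")
print(f"200-byte estimate: {after} bytes")
```

Running the same calculation for the largest table you expect to use incremental stats on gives a starting point for the limit.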
11-27-2019
11:18 AM
I assume you're seeing something like what I attached here. I ran that query from impala-shell and it's going to sit in the FINISHED state for several minutes while the shell fetches all of the ~6 million rows. The reason is that it just takes a bunch of time for the client to fetch the results, even if it's a client like impala-shell or JDBC that actively fetches the results. You'll see this for queries with large result sets, particularly if the connection from client to server is slow or has higher latency. I'd expect that the client will eventually get to the end of the result set and close the query on its own.

Hue is a bit different because it only fetches results as needed, so it can hold queries open even if they have small result sets.

There will be some significant perf and resource management improvements for use cases like this in the versions of Impala that come with CDP, e.g. https://issues.apache.org/jira/browse/IMPALA-8656 helps with this in various ways.
11-06-2019
10:09 AM
Also, if you have a support contract with Cloudera, this is something they can help you with in more detail through that channel. We've successfully resolved this for customers before.