Member since: 07-29-2015
Posts: 535
Kudos Received: 140
Solutions: 103
My Accepted Solutions
Views | Posted |
---|---|
6055 | 12-18-2020 01:46 PM |
3932 | 12-16-2020 12:11 PM |
2787 | 12-07-2020 01:47 PM |
1989 | 12-07-2020 09:21 AM |
1277 | 10-14-2020 11:15 AM |
08-15-2020
01:03 PM
I think the reality now is that both are great technologies and the overlap in use cases is pretty big - there are a lot of SQL workloads where either can work. I just wanted to clarify a few points. Impala does support querying complex types from Parquet: https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_complex_types.html. We are also working on a transparent query retry feature in Impala that should be released soon.
07-29-2020
10:19 AM
Yes, we should be able to prune based on range partitions. https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_kudu.html#kudu_partitioning has some examples of how to set up a table with both range and hash partitions. You can specify arbitrary timestamp ranges for the partitions. You can see in the Impala explain plan whether your WHERE predicates were converted into Kudu pushdown predicates (they're labelled kudu predicates).
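As a rough illustration of the setup described above, here is a small Python sketch that generates the DDL for a Kudu table with hash partitioning on an id column and monthly range partitions on a timestamp column, plus an EXPLAIN statement to check for kudu predicates. The table name `events`, the columns, and the helper functions are all hypothetical, and the exact form Impala accepts for timestamp range bounds should be checked against the linked docs for your version.

```python
from datetime import date


def month_ranges(start, months):
    """Return (lower, upper) date pairs covering `months` consecutive months."""
    ranges = []
    year, month = start.year, start.month
    for _ in range(months):
        lower = date(year, month, 1)
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        ranges.append((lower, date(year, month, 1)))
    return ranges


def kudu_create_table_sql(table, hash_buckets, ranges):
    """Build a CREATE TABLE statement with HASH (id) and RANGE (ts) partitions.

    The schema here is an assumption made for illustration only.
    """
    range_clauses = ",\n".join(
        f"  PARTITION '{lo.isoformat()}' <= VALUES < '{hi.isoformat()}'"
        for lo, hi in ranges
    )
    return (
        f"CREATE TABLE {table} (\n"
        "  id BIGINT,\n"
        "  ts TIMESTAMP,\n"
        "  val DOUBLE,\n"
        "  PRIMARY KEY (id, ts)\n"
        ")\n"
        f"PARTITION BY HASH (id) PARTITIONS {hash_buckets},\n"
        f"RANGE (ts) (\n{range_clauses}\n)\n"
        "STORED AS KUDU;"
    )


def explain_sql(table, lo, hi):
    """Build an EXPLAIN for a query that filters on the range-partitioned column."""
    return (
        f"EXPLAIN SELECT COUNT(*) FROM {table} "
        f"WHERE ts >= '{lo.isoformat()}' AND ts < '{hi.isoformat()}';"
    )


ranges = month_ranges(date(2020, 11, 1), 3)
print(kudu_create_table_sql("events", 4, ranges))
print(explain_sql("events", *ranges[0]))
```

Running the generated EXPLAIN in impala-shell and looking for a "kudu predicates" line on the scan node is the check described above.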
07-28-2020
10:48 AM
Ahh 5.11, there have been so many Impala improvements since then! This happens when the Impala daemon can't load the initial catalog (i.e. database and table metadata). The catalog and statestore roles are both involved in catalog loading, so if the Impala daemon can't communicate with those roles, or they are not started or healthy, that could lead to these symptoms. You should be able to see in Cloudera Manager whether they're started and whether any warnings or errors are being flagged. It might also just be that the catalog is slow to load (maybe there's a lot of metadata or something else is unhealthy). You would need to look at the logs of the Impala daemon you're connecting to, and maybe the catalog's logs, to see what it's doing and why it's slow. I know this doesn't address your immediate problem, but we've seen a lot of these metadata/catalog problems go away with later versions - CDH5.16 or CDH6+ - and particularly by moving to a dedicated coordinator/executor topology: https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/impala_dedicated_coordinator.html.
07-24-2020
01:21 PM
The row counts reflect the state of the partition or table the last time its stats were updated, either by "compute stats" in Impala (or analyze in Hive) or manually via an alter table. (There are other cases where stats are updated, e.g. they can be automatically gathered by Hive, but those are a few examples.) One scenario where this could happen is if a partition was dropped since the last compute stats was run. The stats can generally be out of sync with the number of rows in the underlying table - we don't use them for answering queries, just for query optimization, so it's fine if they're a little inaccurate. If you want to know the accurate counts, you can run queries like select count(*) from table; select count(*) from table where business_date = "13/05/2020" and tec_execution_date = "13/05/2020 20:08";
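To make the distinction concrete, here is a minimal Python sketch that generates the two kinds of statements involved: the ones that refresh and display the stored stats, and the count queries that return the real row counts. The table name `my_table` and both helper functions are hypothetical; the column names and values are taken from the queries above. The string interpolation is for illustration only and is not safe for untrusted input.

```python
def stats_statements(table):
    """Statements that refresh the stored stats and then display them."""
    return [
        f"COMPUTE STATS {table};",
        f"SHOW TABLE STATS {table};",
    ]


def count_query(table, filters=None):
    """Build a SELECT COUNT(*) that scans the data instead of trusting stats.

    `filters` maps a column name to the string literal it must equal.
    """
    sql = f"SELECT COUNT(*) FROM {table}"
    if filters:
        conditions = " AND ".join(
            f"{column} = '{value}'" for column, value in filters.items()
        )
        sql += f" WHERE {conditions}"
    return sql + ";"


for statement in stats_statements("my_table"):
    print(statement)
print(count_query("my_table"))
print(count_query(
    "my_table",
    {"business_date": "13/05/2020", "tec_execution_date": "13/05/2020 20:08"},
))
```

If the count from the query and the #Rows shown by the stats disagree, rerunning compute stats should bring them back in line.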
07-21-2020
09:36 AM
@hsri it seems like this would merit some more investigation - this was added as a nicety a little while back but it may not be working as expected. If you can reproduce this with a simple query, could you file a bug on Apache Impala? https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala
07-20-2020
05:49 PM
I really would suggest checking whether the particular features you want are in CDH6.3.3. We do backport a lot of features. E.g. the GPU scheduling features for YARN from Hadoop 3.1 were included in CDH 6.2: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_620_new_features.html#hadoop_new_620. If the question is whether you can run a non-CDH version of Hadoop and still be running CDH, then the answer is no. If the question is whether non-CDH releases of Hadoop are supported by Cloudera - also no. We only release and support CDH versions that have been fully integrated and tested against the other CDH components. If the question is whether there is a way to take an Apache Hadoop release and deploy it in a Cloudera Manager cluster, then no - it's not packaged in the right way.
05-14-2020
09:44 AM
Hi @parthk ,
I'm happy to see you have found the resolution to your issue. Can you kindly mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future?
Thanks,
Vidya
05-11-2020
02:50 AM
Thanks a lot, Tim. It's OK now.
04-27-2020
08:54 AM
The good news is that we shipped date support in Impala in CDP. https://docs.cloudera.com/runtime/7.0.3/impala-sql-reference/topics/impala-date.html
04-21-2020
01:03 PM
We did a wholesale revamp of decimal behaviour going from CDH5 to CDH6. The default behaviour changed in CDH6.0: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_new_features.html#decimal_v2. There's a whole epic JIRA capturing the changes: https://issues.apache.org/jira/browse/IMPALA-4072. I think https://issues.apache.org/jira/browse/IMPALA-4370 might be the specific fix that you're seeing, based on your analysis. The fix version for that change is Impala 2.9.0, so the code change is in CDH5.15.2, but it was done behind the DECIMAL_V2 query option, which wasn't a supported option until CDH6. In CDH6 you can toggle the behaviour with the DECIMAL_V2 query option (it will eventually be removed, but was kept for backward compatibility).
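One way to see which behaviour a cluster applies is to run the same decimal expression with the option off and then on and compare the results. The Python sketch below only generates the statements to paste into impala-shell; the probe query and the helper name are hypothetical and are not taken from the JIRAs above.

```python
def decimal_v2_session(query, enabled):
    """Return the statements to run `query` with DECIMAL_V2 switched on or off."""
    flag = "true" if enabled else "false"
    return [f"SET DECIMAL_V2={flag};", query]


# Hypothetical probe: reports the result type of a decimal division.
PROBE = "SELECT typeof(CAST(1 AS DECIMAL(10,2)) / CAST(3 AS DECIMAL(10,2)));"

for enabled in (False, True):
    print("\n".join(decimal_v2_session(PROBE, enabled)))
```

Any difference between the two runs reflects behaviour that sits behind the query option.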