About Tim Armstrong

Tim Armstrong · ‎08-15-2020

I think the reality now is that both are great technologies and the overlap in use cases is pretty big - there are a lot of SQL workloads where either can work. I just wanted to clarify a few points. Impala does support querying complex types from Parquet - https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_complex_types.html. We are also working on a transparent query retry feature in Impala that should be released soon.
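As a rough illustration of the complex-type support mentioned above, here is a minimal sketch of the join-style syntax Impala uses to unnest a nested collection. The table `customers` and its nested column `orders` (an array of structs) are hypothetical names, not taken from the original thread.

```sql
-- Hypothetical Parquet table: customers(id BIGINT, name STRING,
--   orders ARRAY<STRUCT<order_id: BIGINT, total: DECIMAL(12,2)>>)
-- The nested array is referenced like a table in the FROM clause,
-- which produces one output row per array element.
SELECT c.name, o.order_id, o.total
FROM customers c, c.orders o
WHERE o.total > 100;
```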

Tim Armstrong · ‎07-29-2020

Yes, we should be able to prune based on range partitions. https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_kudu.html#kudu_partitioning has some examples of how to set up a table with both range and hash partitions. You can specify arbitrary timestamp ranges for the partitions. You can see in the Impala explain plan whether your WHERE predicates were converted into Kudu pushdown predicates (they're labelled kudu predicates).
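A minimal sketch of what such a table could look like, assuming a hypothetical `metrics` table with made-up column names and monthly ranges. The exact literal form accepted for timestamp range bounds may vary by Impala version, so check it against the linked docs.

```sql
-- Hypothetical Kudu table combining hash and range partitioning.
CREATE TABLE metrics (
  host STRING,
  ts TIMESTAMP,
  value DOUBLE,
  PRIMARY KEY (host, ts)
)
PARTITION BY HASH (host) PARTITIONS 4,
RANGE (ts) (
  PARTITION CAST('2020-07-01' AS TIMESTAMP) <= VALUES < CAST('2020-08-01' AS TIMESTAMP),
  PARTITION CAST('2020-08-01' AS TIMESTAMP) <= VALUES < CAST('2020-09-01' AS TIMESTAMP)
)
STORED AS KUDU;

-- Inspect the plan: predicates on ts that were pushed down to Kudu
-- should appear in the scan node's "kudu predicates" section.
EXPLAIN
SELECT count(*)
FROM metrics
WHERE ts >= CAST('2020-08-01' AS TIMESTAMP)
  AND ts <  CAST('2020-08-15' AS TIMESTAMP);
```

If the predicates do not show up as kudu predicates, they are evaluated in Impala after the scan and range-partition pruning is unlikely to apply.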

Tim Armstrong · ‎07-28-2020

Ahh 5.11, there have been so many Impala improvements since then! This happens when the Impala daemon can't load the initial catalog (i.e. database and table metadata). The catalog and statestore roles are both involved in the catalog loading, so if the Impala daemon isn't able to communicate with those roles, or those are not started or healthy, then that could lead to these symptoms. You should be able to see in Cloudera Manager if they're started and if there are any warnings or errors being flagged. It might also be just that the catalog is slow to load (maybe there's a lot of metadata or something else is unhealthy). You would need to look at the logs of the Impala daemon you're connecting to, and maybe the catalog, to see what it's doing and why it's slow. I know this doesn't address your immediate problem, but we've seen a lot of these metadata/catalog problems go away with later versions - CDH5.16 or CDH6+, and particularly by moving to a dedicated coordinator/executor topology - https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/impala_dedicated_coordinator.html.

Tim Armstrong · ‎07-24-2020

The row counts reflect the state of the partition or table the last time its stats were updated, either by "compute stats" in Impala (or analyze in Hive) or manually via an alter table. (There are also other cases where stats are updated, e.g. they can be automatically gathered by Hive, but those are a few examples.) One scenario where this could happen is if a partition was dropped since the last compute stats was run. The stats can generally be out of sync with the number of rows in the underlying table - we don't use them for answering queries, just for query optimization, so it's fine if they're a little inaccurate. If you want to know the accurate counts, you can run queries like select count(*) from table; select count(*) from table where business_date = "13/05/2020" and tec_execution_date = "13/05/2020 20:08";
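To compare the stored stats against the real row count and then bring them back in sync, a sequence like the following could be used. `my_table` is a placeholder name, not the table from the original question.

```sql
-- Row counts as recorded the last time stats were gathered (may be stale).
SHOW TABLE STATS my_table;

-- Actual row count, computed by scanning the data.
SELECT count(*) FROM my_table;

-- Recompute stats so the recorded counts match the current data.
COMPUTE STATS my_table;
```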

Tim Armstrong · ‎07-21-2020

@hsri it seems like this would merit some more investigation - this was added as a nicety a little while back but it may not be working as expected. If you can reproduce this with a simple query, could you file a bug on Apache Impala? https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala

Tim Armstrong · ‎07-20-2020

I really would suggest looking at whether the particular features you want are in CDH6.3.3. We do backport a lot of features. E.g. the GPU scheduling features for YARN from Hadoop 3.1 were included in CDH 6.2 https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_620_new_features.html#hadoop_new_620. If the question is whether you can run a non-CDH version of Hadoop, and still be running CDH, then the answer is no. Or if non-CDH releases of Hadoop are supported by Cloudera - also no. We only release and support CDH versions that have been fully integrated and tested against the other CDH components. If the question is whether there is a way to take an Apache Hadoop release and deploy it in a Cloudera Manager cluster, then no - it's not packaged in the right way.

VidyaSargur · ‎05-14-2020

Hi @parthk , I'm happy to see you have found the resolution to your issue. Can you kindly mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future? Thanks, Vidya

omran · ‎05-11-2020

Thanks a lot, Tim. It's OK now.

Tim Armstrong · ‎04-27-2020

The good news is that we shipped date support in Impala in CDP. https://docs.cloudera.com/runtime/7.0.3/impala-sql-reference/topics/impala-date.html

Tim Armstrong · ‎04-21-2020

We did a wholesale revamp of decimal behaviour going from CDH5 to CDH6. The default behaviour changed in CDH6.0: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_new_features.html#decimal_v2 There's a whole epic JIRA capturing the changes: https://issues.apache.org/jira/browse/IMPALA-4072 . I think https://issues.apache.org/jira/browse/IMPALA-4370 might be the specific fix that you're seeing, based on your analysis. The fix version for that change is Impala 2.9.0, so the code change is in CDH5.15.2, but it was done behind the DECIMAL_V2 query option, which wasn't a supported option until CDH6. In CDH6 you can toggle the behaviour with the DECIMAL_V2 query option (it will eventually be removed, but was kept for backward compatibility).
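A minimal sketch of toggling the option within a session (for example in impala-shell) to compare the two behaviours. The query is an arbitrary illustration, not the one from the original thread, and no particular result is claimed here.

```sql
-- Old (CDH5-style) decimal semantics.
SET DECIMAL_V2=false;
SELECT CAST(1 AS DECIMAL(10,2)) / 3;

-- New decimal semantics (the default in CDH6).
SET DECIMAL_V2=true;
SELECT CAST(1 AS DECIMAL(10,2)) / 3;
```

Running the same expression under both settings makes it easy to confirm whether a result difference comes from the decimal revamp rather than from the data.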

Last Visited	‎02-11-2021 06:07 PM

Member Since	‎07-29-2015 04:07 PM
Posts	535
Kudos received	140

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: Wich sql engine best solution to use with CDP ...

Re: Kudu Partition on Timestamp column

Re: This Impala daemon is not ready to accept user...

Re: Impala show table stats - total of rows doesn'...

Re: Impala query thrift encoding missing some fiel...

Re: Hadoop 3.1.2 with CDH

Re: Does Impala support S3 select push down?

Re: Why Impala return cross join on Array and stru...

Re: Why Won't Impala Support The Date Data Type?

Re: Wrong results dividing decimal by integer