About Tim Armstrong

RNN · ‎10-23-2019

Thanks for your reply, but it didn't work. I tried using other functions, but they return the date and time without milliseconds, and I want the milliseconds as well.

Tim Armstrong · ‎09-20-2019

@Zane- I'm late but can provide some additional insight. I think the suggestion in the error message is a good one (I'm biased because I wrote it, but some thought went into it): "Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error". The general solution for this is to set up admission control with some memory limits so that memory doesn't get oversubscribed, and so that one query can't gobble up more memory than you like. I did a talk at Strata that gave pointers on a lot of these things - https://conferences.oreilly.com/strata/strata-ca-2019/public/schedule/detail/73000 In this case you can actually see that query 2f4b5cff11212907:886aa1400000000 is using Total=78.60 GB memory, so that's likely your problem. Impala's resource management is totally permissive out of the box and will happily let queries use up all the resources in the system like this. I didn't see what version you're running, but there were a lot of improvements in this area (config options, OOM-avoidance, diagnostics) in CDH6.1+. There are various other angles you can take to improve this - if the queries using lots of memory are suboptimal, tuning them (maybe just computing stats) makes a big difference. You can also

Tim Armstrong · ‎07-29-2019

I filed https://issues.apache.org/jira/browse/IMPALA-8807 to fix the docs.

sgrip · ‎07-15-2019

Probably, the version we are using is 5.12.

Paulo · ‎06-18-2019

Thanks a lot, Tim!

honghan · ‎06-14-2019

Thanks for your quick reply.

Tim Armstrong · ‎06-13-2019

Yeah I agree there is some inconsistency in behaviour here - the casting rules, especially around NULL, are too complex and inconsistent.

Tim Armstrong · ‎04-18-2019

If you are mainly accessing the table using Impala, I'd recommend Impala's "COMPUTE STATS" for the best Impala performance. There are some subtle differences in the stats collected (whether they're partition or table-level). The engines can interoperate, but Impala can generally generate better plans with the full set of stats from "COMPUTE STATS".

PranayMunshi · ‎04-17-2019

Thank you very much Tim. Provided link has clarified my doubt.

Tim Armstrong · ‎04-17-2019

In its default configuration, metadata is cached until an "INVALIDATE METADATA" command evicts the table from the cache or until the catalog is restarted. In 5.16 and 6.1+ there are some non-default options that will evict metadata after a particular timeout. At some point these will become the defaults. Table stats are collected and stored in the Hive metastore when you run a "COMPUTE STATS" command. They are then just part of the table metadata.
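
To make the caching behaviour described above concrete, here is a minimal Python sketch of a cache that keeps table metadata until it is explicitly invalidated, with an optional timeout. It is a toy model only and not Impala's catalog code: the `MetadataCache` class, its `loader` and `clock` parameters, and the choice to measure the timeout from load time are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class MetadataCache:
    """Toy model of a table-metadata cache (hypothetical, not Impala's implementation)."""

    def __init__(
        self,
        loader: Callable[[str], dict],
        ttl_seconds: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader        # fetches fresh metadata for a table name
        self._ttl = ttl_seconds      # None models the default: no timeout eviction
        self._clock = clock          # injectable so the timeout can be tested
        self._entries: Dict[str, Tuple[dict, float]] = {}

    def get(self, table: str) -> dict:
        """Return cached metadata, reloading only if it is missing or timed out."""
        now = self._clock()
        entry = self._entries.get(table)
        if entry is not None:
            metadata, loaded_at = entry
            # Assumption: the timeout is measured from when the entry was loaded.
            if self._ttl is None or now - loaded_at < self._ttl:
                return metadata
        metadata = self._loader(table)
        self._entries[table] = (metadata, now)
        return metadata

    def invalidate(self, table: str) -> None:
        """Evict one table, loosely analogous to an explicit invalidate command."""
        self._entries.pop(table, None)
```

With `ttl_seconds=None` an entry is served from the cache indefinitely until `invalidate()` is called or the object is recreated, which corresponds to the default behaviour described above. Passing a timeout corresponds to the non-default eviction options.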

Last Visited	‎02-11-2021 06:07 PM

Member Since	‎07-29-2015 04:07 PM
Posts	535
Kudos received	141

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: Impala - Convert String to Timestamp

Re: ExecQueryFInstances rpc query_id=e74ef8d9b9215...

Re: Avoiding hdfs scan when querying only partitio...

Re: IMPALA: RIGHT OUTER JOIN type with no equi-jo...

Re: Cancel or close queued queries using Impala JD...

Re: CentOS6.4 Gplextras 5.16.1 libimpalalzo.so: ...

Re: Inconsistency by 1 * (NULL), and causes Analys...

Re: COMPUTE Stats or Analyze table

Re: Does Impala uses fair scheduler? and YARN for ...

Re: Does IMPALA cached the query statistics?