About alex.behm

jsohn · ‎04-03-2017

We are facing the same issue: we cannot compute table stats (we run several large tables). Is upgrading to Impala 2.8 the only fix? We are running Cloudera 5.9; will upgrading to Impala 2.8 cause any issues? I would think computing table stats on large tables is a common workflow for most clients. Is it possible to get a patch for this on Impala 2.7? Thanks

thewayofthinkin · ‎03-21-2017

Alex, thank you again. The subquery approach has been recommended to our team as a long-term solution. However, as a short-term solution that avoids regression impact, we have chosen a view with limited partitions. If I remember correctly, in MySQL the data of `table A` can be limited by the `ON` clause before joining, so that the candidates for the join are reduced. Thank you for your valuable comment. Gatsby

alex.behm · ‎02-23-2017

Thomas, you have a legitimate request and concern. First, there is no perfectly fool-proof solution because the resource consumption is somewhat dependent on what happens at runtime, and not all memory consumption is tracked by Impala (but most is). We are constantly making improvements in this area though.

1. I'd recommend fixing the num_scanner_threads for your queries. A different number of scanner threads can result in different memory consumption from run to run (and it also depends on what else is going on in the system at the time).

2. The operators of a query do not run one-by-one. Some of them run concurrently (e.g. join builds may execute concurrently). So just looking at the highest peak in the exec summary is not enough. Taking the sum of the peaks over all operators is a safer bet, but tends to overestimate the actual consumption.

Hope this helps!
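As a rough sketch of the two points above, the query options can be pinned per session before running the query. The table name, the thread count and the memory value below are hypothetical placeholders, not recommendations; the limit would be derived from the sum of the operator peaks in the exec summary of an earlier run.

```sql
-- Assumption: values are illustrative only; pick them from your own exec summaries.
-- Pin the scanner thread count so memory use is more repeatable from run to run.
SET NUM_SCANNER_THREADS=4;

-- Set the per-node memory limit from the summed operator peaks (hypothetical 8 GB here).
SET MEM_LIMIT=8g;

-- Hypothetical query; afterwards, SUMMARY in impala-shell shows the per-operator peaks.
SELECT COUNT(*) FROM my_large_table;
```

Because the summed peaks tend to overestimate, a limit chosen this way errs on the safe side rather than being a tight bound.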

Lars Volker · ‎02-22-2017

I just saw this thread after commenting on the Jira. Would "conv()" be a suitable workaround here?

select conv('100010', 2, 10);
+-----------------------+
| conv('100010', 2, 10) |
+-----------------------+
| 34                    |
+-----------------------+
Fetched 1 row(s) in 0.24s

More information on conv() can be found in the Impala documentation.

Edit: To make things complete, the Jira is IMPALA-4968.

alex.behm · ‎02-02-2017

@gaurang would you be open to sharing your CREATE TABLEs, CREATE VIEW and the query that has slow planning time? No need for the data, just that should be sufficient for us to understand better what's going on. Like Lars said, you are probably hitting IMPALA-4242 which explains the slow equivalence class computation, but I'd also like to understand the slow single-node planning time. Thanks!

thewayofthinkin · ‎01-31-2017

FYI, `COMPUTE STATS` can run with first level partition. https://issues.cloudera.org/browse/IMPALA-1570
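A minimal sketch of what this might look like, assuming an Impala version that includes the change from the linked Jira and a hypothetical table `sales_by_month` partitioned by `(year, month)`:

```sql
-- Assumption: sales_by_month is a hypothetical table partitioned by (year, month).
-- Compute incremental stats only for the partitions under one first-level key value.
COMPUTE INCREMENTAL STATS sales_by_month PARTITION (year=2017);

-- Check which partitions now have row counts and incremental stats.
SHOW TABLE STATS sales_by_month;
```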

Chewlocka · ‎01-26-2017

Thanks again, and please be aware the incorrect text is also found here: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_perf_hdfs_caching.html "When data is requested to be pinned in memory, that process happens in the background without blocking access to the data while the caching is in progress. Loading the data from disk could take some time. Impala reads each HDFS data block from memory if it has been pinned already, or from disk if it has not been pinned yet. When files are added to a table or partition whose contents are cached, Impala automatically detects those changes and performs a REFRESH automatically once the relevant data is cached."

alex.behm · ‎01-20-2017

Thanks!

ZachRoes · ‎01-17-2017

Yes. Use a spark-hbase-connector.

ZachRoes · ‎01-17-2017

In Impala, a table is created with the `CREATE TABLE` statement. The `PARTITIONED BY` clause partitions the data files based on the values of one or more specified columns.
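For illustration, a partitioned table could be declared roughly as follows. The table name, the column names and the choice of Parquet are assumptions made for this sketch, not taken from the original thread.

```sql
-- Hypothetical table: the partition columns are declared in PARTITIONED BY,
-- not in the main column list.
CREATE TABLE orders_by_month (
  id BIGINT,
  customer STRING
)
PARTITIONED BY (year INT, month INT)
STORED AS PARQUET;

-- Rows written with a static partition spec land in the year=2017/month=1 directory.
INSERT INTO orders_by_month PARTITION (year=2017, month=1)
VALUES (1, 'acme');
```

Queries that filter on `year` or `month` can then skip the data files of partitions that do not match.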

Last Visited	‎05-10-2018 06:52 PM

Member Since	‎10-16-2013 11:04 AM
Posts	307
Kudos received	77

Cloudera Community

Re: External Table from Parquet folder returns emp...

Re: Impala SQL for KUDU does not work

Re: Impalad logs diskspace full

Re: Impala round function does not return expected...

Re: Is Impala a proces engine when I use kudu?

Re: Incremental stats size estimate exceeds 200.00...

Re: Impala runtime filter not working as expected

Re: How to set MEM_LIMIT based on explain plan

Re: Inverse of function bin

Re: When querying a VIEW, query planning takes a l...

Re: WARNINGS: Too many partitions selected, doing ...

Re: Impala doesn't detect file changes automatical...

Re: CREATE TABLE AS SELECT returns error 'Failed t...

Re: can we match HBase partitions with Impala Part...

Re: Partitioning in Impala