Member since: 07-29-2015
Posts: 535
Kudos Received: 140
Solutions: 103
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 4530 | 12-18-2020 01:46 PM |
 | 2826 | 12-16-2020 12:11 PM |
 | 1904 | 12-07-2020 01:47 PM |
 | 1482 | 12-07-2020 09:21 AM |
 | 972 | 10-14-2020 11:15 AM |
03-07-2019
09:01 AM
You'd probably do better having this conversation on dev@impala.apache.org; that's where a lot of this kind of discussion happens. I can give a quick answer: no, you can't build Impala on aarch64 without modifications; it's x86-64 only at the moment. I imagine most of the third-party code works on aarch64, but I haven't tried it. It would take a bit of legwork to track down all the places that assume x86-64 (intrinsics like you mentioned, but also some places in query compilation where we assume the x86-64 calling convention). The good news is that aarch64 is little-endian and has good LLVM support, which removes two major obstacles.
03-05-2019
09:54 AM
Impala is a streaming SQL engine, so query execution can happen at the same time as rows are returned to the client. In your case, we don't scan the whole table, stage the rows somewhere, and then return them to the client; rather, Impala returns rows to the client while it's scanning the table. The bottleneck is likely in the client or the network. Impyla is not particularly fast at parsing incoming rows and converting them into Python objects; the Impala server is much, much faster. There's also a known issue where latency between the client and the server can inflate the time taken to return rows: https://issues.apache.org/jira/browse/IMPALA-1618.
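The streaming model above can be sketched in plain Python (this is an illustration only, not Impala or Impyla code): a generator stands in for the table scan, and the consumer receives the first row before the scan has produced the rest.

```python
# Illustration of streaming execution: the "scan" is a hypothetical stand-in
# for Impala reading a table. Events record the interleaving of producer and
# consumer, showing the client sees row 0 before rows 1 and 2 exist.
events = []

def scan_table(n):
    """Pretend table scan: yields rows one at a time instead of materializing all."""
    for i in range(n):
        events.append(f"produced {i}")
        yield i

first_row = None
for row in scan_table(3):
    if first_row is None:
        first_row = row
        events.append("client saw first row")

print(events)
# ['produced 0', 'client saw first row', 'produced 1', 'produced 2']
```

The same interleaving is why iterating an Impyla cursor row by row can start returning results before the query has finished scanning.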
03-05-2019
09:50 AM
Impala is not designed for traditional OLTP and doesn't have transaction support that would line up with what TPC-C expects.
02-13-2019
11:47 AM
I don't think dictionary encoding makes a difference to the effectiveness of min-max stats, because the data will still be in the file in the same order regardless.
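The point that ordering, not encoding, is what matters can be shown with a small sketch (illustrative Python, not Parquet internals): min/max ranges per chunk stand in for row-group stats, and only sorting narrows them.

```python
# Illustration: per-chunk min/max "stats" depend on row order. Unsorted data
# gives wide, overlapping ranges that can't prune anything; sorted data gives
# narrow, disjoint ranges regardless of how values are encoded on disk.
def chunk_stats(values, chunk_size):
    """Return (min, max) for each fixed-size chunk of values."""
    return [(min(c), max(c)) for c in
            (values[i:i + chunk_size] for i in range(0, len(values), chunk_size))]

data = [5, 1, 9, 3, 7, 2, 8, 4]
print(chunk_stats(data, 4))          # [(1, 9), (2, 8)] -- overlapping, useless for pruning
print(chunk_stats(sorted(data), 4))  # [(1, 4), (5, 9)] -- narrow, disjoint ranges
```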
02-12-2019
10:36 AM
I'm not sure that parquet-cpp has any built-in way to sort data; your client code might have to do the sorting before feeding it to parquet-cpp.
02-12-2019
08:21 AM
1 Kudo
The external tool that you are using would have to support ordering the data by those columns. E.g. if you're using Hive, it supports SORT BY. If you're writing the data from some custom code, that code would need to sort it before writing it to Parquet.
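Sorting in the client before writing might look like this (a minimal Python sketch; the column names are made up, and the actual Parquet write call is out of scope here):

```python
# Hypothetical example: sort row data by the columns you filter on before
# handing it to a Parquet writer, so each row group's min/max stats stay narrow.
from operator import itemgetter

# rows of (event_date, user_id, value) -- illustrative column names
rows = [
    ("2019-02-12", 42, 1.5),
    ("2019-02-10", 7, 0.3),
    ("2019-02-12", 7, 2.2),
    ("2019-02-10", 42, 0.9),
]

# Sort by event_date, then user_id, before writing to Parquet.
rows_sorted = sorted(rows, key=itemgetter(0, 1))
print(rows_sorted[0])  # ('2019-02-10', 7, 0.3)
```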
02-08-2019
03:59 PM
I unfortunately don't know too many of the details of LDAP. Impala doesn't do anything sophisticated to create the directories: it just calls mkdir() with S_IRWXU|S_IRWXG|S_IRWXO to create the impala-scratch subdirectory and any missing parent directories.
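In Python terms, the directory creation described above looks roughly like this (a sketch, not Impala's actual C++ code; the paths are made up for illustration):

```python
# Python analogue of mkdir() with S_IRWXU|S_IRWXG|S_IRWXO (i.e. mode 0777),
# creating any missing parent directories as well. Note the effective
# permissions are still reduced by the process umask.
import os
import stat
import tempfile

mode = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO  # == 0o777
base = tempfile.mkdtemp()                          # stand-in for a scratch dir
scratch = os.path.join(base, "impala-scratch")     # hypothetical layout
os.makedirs(scratch, mode=mode, exist_ok=True)     # creates missing parents too
print(os.path.isdir(scratch))  # True
```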
02-08-2019
08:51 AM
1 Kudo
I'll assume that you have some directories configured and passed in via the --scratch_dirs argument (you can check the debug page on port 25000 or the impalad.INFO log to confirm the flag's value). Then what likely happened is that the directories weren't usable for some reason. Any errors that prevent using the directories are logged at startup.
02-07-2019
09:41 AM
1 Kudo
If you want to do the implicit join between the table and the nested collection, you need to reference the nested collection using the alias that you used for the table. Otherwise the top-level table and the nested collection are treated as independent table references, and the query means "return the cartesian product of the tables". I.e. you want to rewrite as follows:

select rta.transaction_purchase_id, rta.cigarette_transaction_flag,
       rta.non_cig_merch_transaction_flag, bow.item
from wdl_atomic.retail_transaction_attribute rta,
     rta.retail_offering_material_group_distinct_list bow
where rta.fiscal_period_id = 2019001;

That will solve your issue.
01-10-2019
08:56 AM
The Cloudera Manager queries page has the bytes spilled to disk as one of the metrics it tracks per query. Also in CM, there's a "Cluster utilization report" that has some aggregate information about how much data is spilled to disk over longer time windows. In addition, if you're looking at the scratch files themselves, the query ID is embedded in the file name (although that's an implementation detail and could change in the future).