Member since: 12-07-2015
Posts: 83
Kudos Received: 23
Solutions: 10
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2915 | 07-11-2018 02:42 PM |
| | 7859 | 12-10-2017 08:26 PM |
| | 2200 | 11-14-2017 12:17 PM |
| | 16152 | 03-29-2017 06:42 AM |
| | 2141 | 02-22-2017 01:43 PM |
11-14-2017
12:17 PM
Hi mauricio,

Impala currently does not support graceful node decommissioning. We're tracking work on this feature in IMPALA-1760, but we are not currently targeting it for a particular release. Unfortunately, that leaves only the option of killing the daemon.

Cheers, Lars
08-14-2017
01:39 PM
1 Kudo
I just had a look, but I couldn't spot an obvious problem. The HDFS scanner fragments read around 15 MB/s, which seems reasonable to me given how computationally intensive Parquet decoding is. There also doesn't seem to be any considerable skew. Each of your 5 nodes reads ~100 GB of data in 134 s, so the throughput per node is around 764 MB/s. I suggest having a look at the Parquet performance improvements in CDH 5.12 that I mentioned in an earlier reply.
08-11-2017
10:50 AM
That shouldn't be too many files. Impala also processes files in parallel locally, so you should see higher utilization on each node. Can you post a profile of one of the slow queries?
08-11-2017
10:08 AM
I'd try to reduce the file size to 256MB and make sure that the block size is at least that large, too. That way you should end up with 32GB / 256MB = 128 files per partition, which should allow you to exploit parallelism across all your nodes. You can also try 512MB per file and see if that improves things, but I suspect it won't.

By the way, we're currently working on improving ETL performance. You may want to look at the "SORT BY" clause included in Impala 2.9 and how it lets you write data in a way that allows Impala to skip row groups much more effectively. You can find more information in the umbrella JIRA: https://issues.apache.org/jira/browse/IMPALA-2522
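As a minimal sketch of both suggestions in Impala SQL (the table and column names here are made up for illustration; the relevant pieces are the PARQUET_FILE_SIZE query option and the SORT BY clause):

```sql
-- Cap the size of each Parquet file Impala writes (session query option).
SET PARQUET_FILE_SIZE=256m;

-- Impala 2.9+: SORT BY orders rows within each file, so per-row-group
-- min/max statistics let Impala skip row groups more effectively.
CREATE TABLE sales_sorted (id BIGINT, event_ts TIMESTAMP, amount DOUBLE)
PARTITIONED BY (day STRING)
SORT BY (event_ts)
STORED AS PARQUET;

INSERT OVERWRITE sales_sorted PARTITION (day)
SELECT id, event_ts, amount, day FROM sales_raw;
```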
08-11-2017
09:41 AM
Hi Shannon,

Impala does not split up Parquet files across several readers when reading them. Instead, a single daemon is assigned to each file and reads the whole file. It is therefore recommended to have only one block per file; otherwise some blocks may end up on remote nodes, and remote reads will slow down your queries. See this page for more information: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_perf_cookbook.html

Cheers, Lars
07-12-2017
03:19 PM
@adi91 - How did you set --mem_limit? What value did you pass to it? What did http://hostname:25000/memz?detailed=true say after applying --mem_limit to the command line options? Did your value show up there?
07-08-2017
12:24 PM
1 Kudo
After more investigation, I found that this is already documented as a Known Issue in Cloudera Manager (see "Known Issues and Workarounds in Cloudera Manager 5" for Impala). I opened IMPALA-5631 to explain the problem and possible solutions in the docs.
07-08-2017
11:57 AM
@mbigelow - Thank you for keeping the JIRA updated - I'm glad you found the solution through support. It looks like you are hitting a bug in CM, and we are working on fixing it. I will reach out to our documentation team to point out this issue in the docs and the release notes of 5.11.1. I'm sorry for the trouble this has caused you.
05-31-2017
09:16 AM
Setting num_nodes=1 forces Impala to execute the query on a single node (machine), which then writes only a single Parquet file per partition.
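As a sketch of how this looks in practice (the table names are hypothetical), the query option can be set per session in impala-shell:

```sql
-- Force single-node execution so the INSERT writes one Parquet file
-- per partition instead of one per participating daemon.
SET NUM_NODES=1;

INSERT OVERWRITE my_table PARTITION (day)
SELECT id, payload, day FROM staging_table;

-- 0 is the default: let the planner use all nodes again.
SET NUM_NODES=0;
```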
04-11-2017
06:05 AM
Hi imad87,

Your question looks related to Solr, so I think it may fit better in the "Search" community: http://community.cloudera.com/t5/Cloudera-Search-Apache-SolrCloud/bd-p/Search

Cheers, Lars