New Contributor
Posts: 4
Registered: ‎04-21-2017

Impala-Kudu performance

Hi,

 

I want to configure Impala to get as much performance as possible when executing analytics queries on Kudu. I may use 70-80% of my cluster resources. I looked at the advanced flags in both Kudu and Impala. Some of them didn't make sense to me, and I couldn't find many resources on the internet that describe them. Can anybody suggest an optimal configuration to achieve this?

I have 15 datanodes, each with 16 cores, 128 GB RAM and 10x1 TB hard disks. I also have 3 separate servers for master nodes and other services (each with 16 cores and 256 GB RAM). I would appreciate any suggestions.

 

Regards,

Cloudera Employee
Posts: 212
Registered: ‎07-29-2015

Re: Impala-Kudu performance

 Hi Sattari,

  We generally try to make the default Impala configuration as good as possible to minimise tuning - there aren't really any --go_fast=true flags you can enable.

 

 

Usually the main setup decisions are about how to allocate memory between services. Impala often likes lots of memory, particularly if you're running complex queries on lots of data with many joins. If it doesn't have enough memory it may end up spilling data to disk and running more slowly (or, in some cases, the queries may fail with "out of memory"). We have some docs about how to configure this with Cloudera Manager: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_howto_rm.html
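
As a rough illustration of the sizing arithmetic, the sketch below takes the figures from the original question (15 datanodes, 128 GB RAM each, 70-80% of resources available) and computes the per-node and cluster-wide memory budget that would then have to be divided between Impala, Kudu and the other services. The function name is hypothetical, and the split between services is deliberately left out because it depends on the workload.

```python
def node_budget_mb(ram_gb, percent):
    """Memory in MB available on one node at the given percentage of its RAM."""
    # Integer arithmetic keeps the result exact; fractions of a MB are dropped.
    return ram_gb * 1024 * percent // 100


NODES = 15    # datanodes, from the original question
RAM_GB = 128  # RAM per datanode, from the original question

for percent in (70, 80):
    per_node = node_budget_mb(RAM_GB, percent)
    total = per_node * NODES
    print(f"{percent}%: {per_node} MB per node, {total} MB across {NODES} nodes")
```

The numbers only bound what is available; how much of that goes to Impala versus Kudu is the actual tuning decision.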

 

The main things you can do to improve perf are to set up your data and query workloads right. There are some tips here, but a lot of them are specific to HDFS: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_perf_cookbook.html

 

 

Someone else may be able to comment in more detail about Kudu.

Explorer
Posts: 17
Registered: ‎06-13-2017

Re: Impala-Kudu performance


Impala 2.9 has several Impala-Kudu performance improvements.

 

Partial list:

 

IMPALA-4859 - Push down IS NULL / IS NOT NULL to Kudu

 

IMPALA-3742 - INSERTs into Kudu tables should partition and sort

 

IMPALA-5156 - Drop VLOG level passed into Kudu client - "In some simple concurrency testing, Todd found that reducing the vlog level resulted in an increase in throughput from ~17 qps to 60qps."

 

Make sure you have a large enough MEM_LIMIT and limit the number of joins in your queries. Good luck :-)

New Contributor
Posts: 4
Registered: ‎04-21-2017

Re: Impala-Kudu performance


Thanks for answering, Tim. I am not really expecting such a silver bullet flag. Can you please explain the following flags and their effects on Impala performance?

 

kudu_mutation_buffer_size (int32)
kudu_sink_mem_required (int32)
min_buffer_size (int32)
read_size (int32)
num_disks (int32)
num_threads_per_core (int32)
num_threads_per_disk (int32)
be_service_threads (int32)
exchg_node_buffer_size_bytes (int32)

New Contributor
Posts: 4
Registered: ‎04-21-2017

Re: Impala-Kudu performance


Thanks for answering, vanhalen. Can you please describe in more detail how to pass VLOG flags to the Kudu client?

Cloudera Employee
Posts: 212
Registered: ‎07-29-2015

Re: Impala-Kudu performance

Hi Sattari,

  I hope my response didn't come across as facetious. There are a lot of database products on the market that *do* ship with suboptimal configurations or require a lot of tuning. With Impala we do try to avoid that, by designing features so that they're not overly sensitive to tuning parameters and by choosing default values that give good performance.

 

My main advice for tuning Impala is just to make sure that it has enough memory to execute all of the queries in your workload in memory, and to run "compute stats" on your tables to help make sure that you get good execution plans.
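
For illustration, here is a minimal sketch that generates the COMPUTE STATS statements for a list of tables so they can be pasted into impala-shell or run from a script. The table names are made up and the helper function is hypothetical; only the COMPUTE STATS statement itself is Impala syntax.

```python
def compute_stats_statements(tables):
    """Return one COMPUTE STATS statement per table name."""
    return [f"COMPUTE STATS {table};" for table in tables]


# Hypothetical table names, used only to show the output shape.
example_tables = ["analytics.events", "analytics.users"]

for statement in compute_stats_statements(example_tables):
    print(statement)
```

Re-running the statements after large data loads keeps the statistics in line with the data the planner actually sees.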

 

I wouldn't recommend changing any of those flags - they're mostly just safety valves for rare cases where the defaults cause unanticipated problems. The only one that directly relates to Kudu is --kudu_mutation_buffer_size, which controls the amount of memory used in the Kudu client for buffering inserts/updates. If you do change it, --kudu_sink_mem_required should be updated in sync so that it's 2x the value of --kudu_mutation_buffer_size.
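
If the buffer size ever does need changing, the 2x relationship can be kept consistent with a small helper like the sketch below. The helper name and the example value are hypothetical (the value is not a default or a recommendation), and it assumes both flags are byte counts. The range check is there because both flags are listed as int32.

```python
INT32_MAX = 2**31 - 1  # both flags are int32


def kudu_sink_flags(mutation_buffer_bytes):
    """Return the two impalad flag strings with the sink memory set to 2x the buffer."""
    sink_mem_required = 2 * mutation_buffer_bytes
    if mutation_buffer_bytes <= 0 or sink_mem_required > INT32_MAX:
        raise ValueError("buffer size must be positive and 2x must fit in an int32")
    return [
        f"--kudu_mutation_buffer_size={mutation_buffer_bytes}",
        f"--kudu_sink_mem_required={sink_mem_required}",
    ]


# Example value chosen only for illustration: 10 MB expressed in bytes.
for flag in kudu_sink_flags(10 * 1024 * 1024):
    print(flag)
```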
