Member since: 11-12-2013
Posts: 41
Kudos Received: 11
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4463 | 10-18-2019 02:11 PM |
| | 4749 | 07-10-2019 03:20 PM |
| | 3462 | 03-24-2019 02:52 PM |
| | 5328 | 03-20-2019 09:01 AM |
| | 1705 | 12-13-2018 05:06 PM |
10-18-2019 02:11 PM
Is your cluster using Cloudera Manager for management? If so, have you configured a service dependency between Impala and Kudu? You'll need to go to the CM configuration for Impala and reconfigure the Kudu dependency. After you do that, CM will pass the right value for -kudu_master_hosts on the Impala command line when you next restart Impala, and you'll be able to create Kudu tables without explicitly specifying kudu.master_addresses.
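For example, once the dependency is in place, a table definition in impala-shell needs no master addresses at all. A minimal sketch, assuming a hypothetical metrics table:

```bash
# Hypothetical schema; note there is no kudu.master_addresses entry in
# TBLPROPERTIES, since Impala picks the masters up from -kudu_master_hosts.
impala-shell -q "
  CREATE TABLE metrics (
    host STRING,
    ts BIGINT,
    value DOUBLE,
    PRIMARY KEY (host, ts)
  )
  PARTITION BY HASH (host) PARTITIONS 8
  STORED AS KUDU;
"
```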
07-17-2019 09:28 AM
Kudu is often bottlenecked by the speed at which it can flush data to disk. This usually corresponds to the number of data directories (and to maintenance_manager_num_threads). So certainly the more disks (and thus disk bandwidth) that Kudu has access to, the faster it can ingest data. If you reduce the number of partitions, you'll generally be reducing the overall ingest speed because you're reducing write parallelism. If your goal is to reduce ingest speed, then by all means explore reducing the number of partitions.
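To illustrate the partitioning side of that trade-off, here is a sketch (hypothetical table and column names) of a hash-partitioned Kudu table created through Impala; the partition count controls how many tablets the inserts fan out across:

```bash
# 32 hash partitions means inserts spread across 32 tablets, which raises
# write parallelism (and disk I/O demand) compared to, say, 8 partitions.
impala-shell -q "
  CREATE TABLE events (
    id BIGINT,
    payload STRING,
    PRIMARY KEY (id)
  )
  PARTITION BY HASH (id) PARTITIONS 32
  STORED AS KUDU;
"
```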
07-17-2019 09:25 AM
The simplest way is to make a copy of the tables using Impala SQL statements or some Spark code. For example, with Impala you'd use a CTAS (CREATE TABLE AS SELECT) statement. See https://www.cloudera.com/documentation/enterprise/6/latest/topics/impala_create_table.html#create_table for details.
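A minimal CTAS sketch, assuming a hypothetical source table my_table whose id column is suitable as a primary key:

```bash
# Copy an existing table into a new Kudu table via Impala CTAS.
impala-shell -q "
  CREATE TABLE my_table_copy
  PRIMARY KEY (id)
  PARTITION BY HASH (id) PARTITIONS 16
  STORED AS KUDU
  AS SELECT * FROM my_table;
"
```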
07-10-2019 05:27 PM
Yeah, that's the backpressure. More memory won't really help; what you need is more disk I/O so your Kudu cluster can flush the contents of the memrowset to disk faster. Is your --fs_data_dirs configured to take advantage of all of the disks on each machine? Is your --maintenance_manager_num_threads configured appropriately? (We recommend 1 thread for every 3 data directories when they're backed by spinning disks.)
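As a rough sketch, here is what that looks like on the tablet server command line, with hypothetical mount points and the 1:3 thread-to-directory ratio for spinning disks (on a CM-managed cluster you'd set the equivalent values in the Kudu service configuration):

```bash
# Six spinning-disk data directories -> two maintenance threads.
kudu-tserver \
  --fs_wal_dir=/wal/kudu \
  --fs_data_dirs=/data/1/kudu,/data/2/kudu,/data/3/kudu,/data/4/kudu,/data/5/kudu,/data/6/kudu \
  --maintenance_manager_num_threads=2
```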
07-10-2019 05:07 PM
Isn't this in line with your expectations? You've configured the tserver to start randomly rejecting writes when it reaches 85% of the limit (rejecting more and more often as it approaches 100%), and you saw a rejection at 89%. The rejections should slow down your ingest job. Isn't this the behavior you expected?
07-10-2019 03:43 PM
Besides the limit itself, there are two additional knobs you can experiment with: --memory_limit_soft_percentage (defaults to 80%) and --memory_pressure_percentage (defaults to 60%).
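A sketch of where those knobs sit on the tablet server command line, shown with their default percentages (the hard limit value here is a hypothetical 16 GiB):

```bash
# Both percentages are interpreted relative to the overall memory limit.
kudu-tserver \
  --memory_limit_hard_bytes=17179869184 \
  --memory_limit_soft_percentage=80 \
  --memory_pressure_percentage=60
```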
07-10-2019 03:20 PM
As a tserver approaches the memory limit, it'll apply backpressure on incoming writes, forcing them to slow down. This backpressure (and the limiting mechanism in general) is process-wide; there's no way to customize it for a particular tablet or memrowset. If you let your write job run for a while, the memory usage should eventually stabilize. If it doesn't, please share a heap sample as per the instructions found here.
05-16-2019 09:52 PM
You can most certainly project more than one column at a time in an Impala query, be it from a table in Kudu or from HDFS. Based on your problem description, it almost sounds like a problem with your terminal, or with the impala-shell configuration. Have you looked at the impala-shell configuration options? Maybe something there can help solve the problem.
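For reference, a multi-column projection is just an ordinary SELECT list; here's a trivial sketch against a hypothetical Kudu-backed table:

```bash
# Project three columns at once through impala-shell.
impala-shell -q "SELECT host, ts, value FROM metrics LIMIT 10;"
```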
03-24-2019 02:52 PM
No, the rebalancer doesn't fix leader skew. It may in a future release. Leaders can cluster onto one tserver when individual tservers are restarted; if you restart the entire cluster all at once, you might be able to redistribute leadership more evenly. You're right that if you're only using one host to initiate reads, the reads will go to the local tserver rather than round-robin across the cluster. The master doesn't directly tell clients where to scan; it just provides them with enough information to make that decision based on their replica selection policy. There's also no way to do round-robin (or randomized) replica selection.
03-24-2019 11:16 AM
I don't believe there's a way to do that as yet. You can run the manual rebalancer in report-only mode ('kudu cluster rebalance --report_only') and see what it says. If you don't need the stronger consistency guarantees of LEADER_ONLY, change your replica selection policy to CLOSEST_REPLICA; that should ensure a more even distribution of reads, provided your scan requests originate evenly from across the cluster's nodes.
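A sketch of the report-only invocation, with hypothetical master addresses:

```bash
# Prints the replica distribution report without moving any replicas.
kudu cluster rebalance master-1:7051,master-2:7051,master-3:7051 --report_only
```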