Member since: 11-12-2013
Posts: 41
Kudos Received: 11
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4450 | 10-18-2019 02:11 PM |
| | 4726 | 07-10-2019 03:20 PM |
| | 3443 | 03-24-2019 02:52 PM |
| | 5299 | 03-20-2019 09:01 AM |
| | 1700 | 12-13-2018 05:06 PM |
10-18-2019
02:11 PM
Is your cluster using Cloudera Manager for management? If so, have you configured a service dependency between Impala and Kudu? You'll need to go to the CM configuration for Impala and configure the Kudu dependency there. After you do that, CM will pass the right value for -kudu_master_hosts on the Impala command line when you next restart Impala, and you'll be able to create Kudu tables without explicitly specifying kudu.master_addresses.
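To illustrate the difference (the table, columns, and master hostname below are all made up), before the dependency is in place you have to spell out the masters yourself, and afterwards you can drop that clause:

```
# Without the CM dependency: the master addresses must be given explicitly.
impala-shell -q "
  CREATE TABLE my_kudu_table (id BIGINT, val STRING, PRIMARY KEY (id))
  PARTITION BY HASH (id) PARTITIONS 4
  STORED AS KUDU
  TBLPROPERTIES ('kudu.master_addresses' = 'kudu-master-1:7051');"

# With the dependency configured and Impala restarted: same statement, no TBLPROPERTIES needed.
impala-shell -q "
  CREATE TABLE my_kudu_table (id BIGINT, val STRING, PRIMARY KEY (id))
  PARTITION BY HASH (id) PARTITIONS 4
  STORED AS KUDU;"
```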
07-17-2019
09:28 AM
Kudu is often bottlenecked by the speed at which it can flush data to disk. This usually corresponds to the number of data directories (and to maintenance_manager_num_threads). So certainly the more disks (and thus disk bandwidth) that Kudu has access to, the faster it can ingest data. If you reduce the number of partitions, you'll generally be reducing the overall ingest speed because you're reducing write parallelism. If your goal is to reduce ingest speed, then by all means explore reducing the number of partitions.
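To make the parallelism point concrete, here's a sketch (table, column, and partition count are arbitrary): the PARTITIONS clause fixes the number of tablets at creation time, and each tablet is an independent unit of write parallelism, so cutting it from 16 down to 4 shrinks the number of concurrent writers and flushers roughly in proportion.

```
impala-shell -q "
  CREATE TABLE events (event_id BIGINT, payload STRING, PRIMARY KEY (event_id))
  PARTITION BY HASH (event_id) PARTITIONS 16   -- 16 tablets -> 16-way write parallelism
  STORED AS KUDU;"
```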
07-17-2019
09:25 AM
The simplest way is to make a copy of the tables using Impala SQL statements or some Spark code. For example, with Impala you'd use a CTAS (CREATE TABLE AS SELECT) statement. See https://www.cloudera.com/documentation/enterprise/6/latest/topics/impala_create_table.html#create_table for details.
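For example, a CTAS from impala-shell might look roughly like this (the table names, key, and partitioning are placeholders; pick ones that match your schema):

```
impala-shell -q "
  CREATE TABLE my_table_copy
  PRIMARY KEY (id)
  PARTITION BY HASH (id) PARTITIONS 8
  STORED AS KUDU
  AS SELECT * FROM my_table;"
```

Note that the primary key and partitioning have to be restated on the new table; CTAS doesn't copy them from the source.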
07-10-2019
05:27 PM
Yeah, that's the backpressure. More memory won't really help; what you need is more disk I/O so your Kudu cluster can flush the contents of the memrowset to disk faster. Is your --fs_data_dirs configured to take advantage of all of the disks on each machine? Is your --maintenance_manager_num_threads configured appropriately? (We recommend 1 thread for every 3 data directories when they're backed by spinning disks.)
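As a sketch, with made-up paths and a nine-disk machine, the tserver flags would look something like this, following the 1-thread-per-3-spinning-disks rule of thumb:

```
# One data directory per physical disk:
--fs_data_dirs=/data/1/kudu,/data/2/kudu,/data/3/kudu,/data/4/kudu,/data/5/kudu,/data/6/kudu,/data/7/kudu,/data/8/kudu,/data/9/kudu
# 9 data directories / 3 = 3 maintenance threads:
--maintenance_manager_num_threads=3
```

In a CM-managed cluster you'd typically set these through the tablet server's configuration in CM rather than a hand-edited flagfile.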
07-10-2019
05:07 PM
Isn't this in line with your expectations? You've configured the tserver to start randomly rejecting writes when it reaches 85% of the limit (rejecting more and more often as it approaches 100%), and you saw a rejection at 89%. The rejections should slow down your ingest job, which sounds like exactly the behavior you were expecting.
07-10-2019
03:43 PM
Besides the limit itself, there are two additional knobs you can experiment with: --memory_limit_soft_percentage (defaults to 80%) and --memory_pressure_percentage (defaults to 60%).
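For reference, as tserver flags they'd look like the lines below; the hard-limit value here is an arbitrary example, and the two percentages are just the defaults restated:

```
--memory_limit_hard_bytes=8589934592    # example only: an 8 GiB hard limit
--memory_limit_soft_percentage=80       # % of the hard limit above which writes start being rejected
--memory_pressure_percentage=60         # % of the hard limit above which flushing becomes more aggressive
```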
07-10-2019
03:20 PM
As the tserver approaches the memory limit, it'll apply backpressure on incoming writes, forcing them to slow down. This backpressure (and the limiting mechanism in general) is process-wide; there's no way to customize it for a particular tablet or memrowset. If you let your write job run for a while, the memory usage should eventually stabilize. If it doesn't, please share a heap sample as per the instructions found here.
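One way to watch whether it stabilizes (assuming the default tserver web UI port of 8050; substitute your own host and port):

```
# Poll the tablet server's memory-tracker breakdown while the write job runs:
curl http://<tserver-host>:8050/mem-trackers
```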
05-16-2019
09:52 PM
You can most certainly project more than one column at a time in an Impala query, be it from a table in Kudu or from HDFS. Based on your problem description, it almost sounds like a problem with your terminal, or with the impala-shell configuration. Have you looked at the impala-shell configuration options? Maybe something there can help solve the problem.
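For instance, switching to plain delimited output usually sidesteps terminal-width problems with wide result sets (the hostname, query, and file name below are placeholders):

```
# -B prints rows as delimited text instead of the pretty-printed table;
# -o writes the results to a file instead of the terminal.
impala-shell -i <impalad-host> -B --output_delimiter=',' \
  -q "SELECT col_a, col_b, col_c FROM my_kudu_table LIMIT 10" \
  -o results.csv
```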
03-24-2019
02:52 PM
No, the rebalancer doesn't fix leader skew. It may in a future release. Leaders can cluster onto one tserver when individual tservers are restarted; if you restart the entire cluster all at once, you might be able to redistribute leadership more evenly. You're right that if you're only using one host to initiate reads, the reads will go to the local tserver rather than round-robin across the cluster. The master doesn't directly tell clients where to scan; it just provides them with enough information to make that decision based on their replica selection policy. There's also no way to do round-robin (or randomized) replica selection.
03-24-2019
11:16 AM
I don't believe there's a way to do that as yet. You can run the manual rebalancer in report-only mode ('kudu cluster rebalance --report_only') and see what it says. If you don't need the stronger consistency guarantees of LEADER_ONLY, change your replica selection policy to CLOSEST_REPLICA; that should ensure a more even distribution of reads, provided your scan requests originate evenly from across the cluster's nodes.
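For example, with placeholder master addresses, the report-only invocation looks like:

```
# Prints the replica-distribution report without actually moving anything:
kudu cluster rebalance master-1:7051,master-2:7051,master-3:7051 --report_only
```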