About Todd Lipcon

Todd Lipcon · ‎01-08-2019

In such a small cluster I'd definitely consider doubling up masters and tservers on all of the master nodes (i.e. 3 masters and 5 tservers). The master is pretty lightweight and can be colocated with tservers for such a small workload. This way you'll get better fault tolerance and also better performance than leaving 2/5 of the nodes mostly unutilized. -Todd
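
As a rough illustration of the fault-tolerance point: Kudu masters coordinate through Raft consensus, which stays available only while a majority of the masters is up. The sketch below applies the generic majority-quorum rule; the function name is hypothetical and this is not code from Kudu itself.

```python
def tolerated_failures(replicas: int) -> int:
    """Number of members a majority-quorum group can lose and still make progress.

    Hypothetical helper for illustration; it encodes only the generic
    majority rule, not anything specific to Kudu's implementation.
    """
    if replicas < 1:
        raise ValueError("a consensus group needs at least one member")
    majority = replicas // 2 + 1  # smallest group that is more than half
    return replicas - majority
```

Under this rule, one or two masters tolerate no failures, whereas three masters tolerate the loss of one.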

Todd Lipcon · ‎04-02-2018

Hi, There is no command to do this. However, if you are using Cloudera Manager, you can navigate to the "Charts Library" page under the Kudu service, select "Tables" on the left-hand side, and then select the table of interest. This should give various metrics, including its size on disk (post-replication). Hope that helps. -Todd

Todd Lipcon · ‎01-31-2018

It looks like your screenshot is of the "scans" dashboard on the web UI. This dashboard shows counters for a single scan, and a single scan would only come from a single task; it does not aggregate across tasks. I am guessing you're hitting KUDU-2231, a performance bug that was recently fixed. The bug fix appears in CDH 5.14.0. Since this is a performance issue that is not a regression and does not affect correctness, we have not yet backported it to any prior releases. -Todd

Todd Lipcon · ‎01-30-2018

That's correct, I am not aware of a workaround for this issue.

Todd Lipcon · ‎01-30-2018

Your non-JOIN queries probably work because Impala is scheduling for locality and only scheduling work on nodes with Kudu running. When you join with HDFS data, some work is scheduled on all of the nodes in the cluster, and then those tasks running on non-Kudu nodes still need to write output to Kudu. -Todd

Todd Lipcon · ‎01-30-2018

Hi, Unfortunately the Kudu client is built in such a way that it requires SSE4.2. The CPU you are running on was discontinued in Q4 2010 and is not supported by Kudu. That includes the Kudu client, which is used by Impala. Unfortunately you will not be able to query Kudu tables in a mixed cluster with Impala daemons running on hosts that do not support SSE4.2. -Todd
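
To find out which hosts in a mixed cluster are affected, one option is to look for the `sse4_2` flag that Linux reports in `/proc/cpuinfo` on x86 machines. The following is a minimal sketch assuming a Linux x86 host; the helper names are hypothetical and the script is not part of Kudu or Impala.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Linux x86 lists features on lines of the form "flags : fpu vme ..."
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_sse42(cpuinfo_text: str) -> bool:
    """True if the text advertises SSE4.2 (spelled "sse4_2" by the kernel)."""
    return "sse4_2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("SSE4.2 supported:", supports_sse42(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check assumes Linux")
```

Running this on each Impala daemon host would show which ones cannot load the Kudu client.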

Todd Lipcon · ‎08-08-2017

It looks like you're using the C++ client. Given that, you can use the KuduSession::SetTimeoutMillis() API: https://kudu.apache.org/cpp-client-api/classkudu_1_1client_1_1KuduSession.html#a25b22362650d7120f59cc1025e40bd79 -Todd

Todd Lipcon · ‎08-07-2017

Hi, If you simply increase your timeout, the client itself has built-in retries and will keep trying to complete the insert until the given time has elapsed. In a scenario that is not latency-sensitive I would recommend increasing the timeout to a minute or two. -Todd
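
The retry-until-timeout behaviour described above can be pictured with a small sketch. This is not the Kudu client's actual implementation; `TransientError`, `retry_until_deadline`, and the backoff values are hypothetical names and numbers chosen for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retriable failure such as a busy server."""


def retry_until_deadline(operation, timeout_s, backoff_s=0.01):
    """Call operation() until it succeeds or timeout_s has elapsed.

    Returns (result, attempts). Re-raises the last TransientError once
    another retry would run past the deadline.
    """
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation(), attempts
        except TransientError:
            # Give up if sleeping again would exceed the overall timeout.
            if time.monotonic() + backoff_s >= deadline:
                raise
            time.sleep(backoff_s)
            backoff_s = min(backoff_s * 2, 1.0)  # capped exponential backoff
```

With a loop of this shape, a larger timeout simply leaves room for more attempts, which is why raising the timeout helps when latency is not a concern.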

Todd Lipcon · ‎07-27-2017

Can you try changing the encoding of your primary key int column to 'PLAIN_ENCODING' instead of the default 'AUTO_ENCODING'? I think that should resolve your problem (at the expense of some disk space).

Todd Lipcon · ‎07-26-2017

You could use 'tinker step 500' and have the effect that stepping would only be enabled for time differences more than 500ms. I wouldn't consider this breaking your production environment, but I guess you may have some reason that '-x' is important to you. We'll work on addressing this in a future release so that no system-wide changes are necessary. -Todd

Last Visited	‎01-09-2019 12:15 AM

Member Since	‎09-28-2015 03:16 PM
Posts	65
Kudos received	9

Cloudera Community

Re: KUDU Update joining a hdfs table don't work

Re: Error when stressing the cluster

Re: Error in delete from kudu table -WARNINGS:..Ti...

Re: Small Kudu Cluster

Re: kudu table size

Re: Kudu web ui - cells read

Re: kudu service are getting down frequently