Member since: 03-16-2017
Posts: 37
Kudos Received: 6
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1738 | 09-30-2019 08:34 PM
 | 3378 | 08-22-2019 01:18 PM
05-26-2021
07:36 PM
Hi! Those warning messages about dropped RPC requests due to backpressure are a sign that the tablet server in question is likely overloaded. Consider the following remedies:

1. Upgrade to a recent version of Kudu (1.14 as of now). Since Kudu 1.9.0 there have been many fixes that might help reduce memory pressure for write-intensive workloads (e.g. see KUDU-2727, KUDU-2929), read-only workloads (KUDU-2836), and a bunch of other improvements. BTW, if you are using CDH, upgrading to CDH 6.3.4 is a good first step in that direction: CDH 6.3.4 contains back-ported fixes for KUDU-2727, KUDU-2929, and KUDU-2836.
2. Make sure the tablet replica distribution is even across tablet servers: run the 'kudu cluster rebalance' CLI tool.
3. If you suspect replica hot-spotting, consider re-creating the table in question to fan out the write stream across multiple tablets. Reading this guide might be useful: https://kudu.apache.org/docs/schema_design.html
4. If nothing from the above helps, consider adding a few more tablet server nodes to your cluster. Once the new nodes are added, don't forget to run the 'kudu cluster rebalance' CLI tool.

Kind regards, Alexey
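P.S. As a rough illustration of what the rebalancer looks at, here is a minimal sketch (my own, not Kudu source) that computes the replica-count skew across tablet servers; the rebalancer tool reports something along these lines and moves replicas until the counts are even:

```python
# Sketch (not Kudu source): the replica-count skew that
# 'kudu cluster rebalance' works to minimize across tablet servers.
def replica_skew(replicas_per_tserver):
    """Return max - min replica count across tablet servers."""
    counts = replicas_per_tserver.values()
    return max(counts) - min(counts)

# One overloaded tserver (ts-1 here) is a likely source of the
# RPC-queue-overflow warnings; rebalancing evens this out.
cluster = {"ts-1": 120, "ts-2": 45, "ts-3": 51}
print("skew:", replica_skew(cluster))  # skew: 75
```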
09-30-2019
08:34 PM
Hi, I think you will need Impala to make Superset work with Kudu. At http://superset.apache.org/#databases it's mentioned that the database engine needs a '... proper DB-API driver and SQLAlchemy dialect ...' to be usable by Superset. I guess the '... proper DB-API driver ...' is based on JDBC, and there isn't a JDBC driver for Kudu as of now. As far as I know, there isn't a native Superset Kudu connector either. However, contributions are always welcome! Kind regards, Alexey
08-22-2019
01:50 PM
Hi, Kudu requires that the machine clocks of master and tablet server nodes be synchronized using NTP: https://kudu.apache.org/docs/troubleshooting.html#ntp Kudu is tested with ntpd, but I guess chronyd might work as well. Whether using ntpd or chronyd, it's necessary to make sure the machine's clock is synchronized so that the ntp_adjtime() Linux system call doesn't return an error (see http://man7.org/linux/man-pages/man2/adjtimex.2.html for more technical details). It's not enough just to have ntpd (or chronyd) running; the clock must actually be synchronized. I would verify that the NTP daemon is properly configured and tracks the clocks of the reference servers. For instructions on checking the sync status of the machine's clock, see https://kudu.apache.org/docs/troubleshooting.html#ntp if using ntpd, or https://docs.fedoraproject.org/en-US/Fedora/18/html/System_Administrators_Guide/sect-Checking_if_chrony_is_synchronized.html for chronyd. Hope this helps, Alexey
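P.S. To illustrate the ntp_adjtime()/adjtimex() check mentioned above, here is a small sketch (my own, Linux-only) that queries the kernel clock state via ctypes. Kudu performs an equivalent check; a return value of TIME_ERROR means the clock is unsynchronized, regardless of whether ntpd/chronyd is running:

```python
import ctypes

# Sketch (my own, Linux-only): query the kernel clock state the same way
# ntp_adjtime()/adjtimex() does, via a read-only adjtimex(2) call.
TIME_ERROR = 5  # clock not synchronized, per adjtimex(2)

class Timex(ctypes.Structure):
    # Only 'modes' matters for a read-only query; the padding leaves room
    # for the rest of struct timex, which the kernel fills in.
    _fields_ = [("modes", ctypes.c_int), ("_pad", ctypes.c_byte * 512)]

def clock_sync_state():
    libc = ctypes.CDLL(None, use_errno=True)
    buf = Timex(modes=0)  # modes=0: read state, adjust nothing
    state = libc.adjtimex(ctypes.byref(buf))
    if state < 0:
        raise OSError(ctypes.get_errno(), "adjtimex() failed")
    return state  # TIME_OK (0) .. TIME_ERROR (5)

if __name__ == "__main__":
    print("clock synchronized:", clock_sync_state() != TIME_ERROR)
```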
08-22-2019
01:20 PM
Whoops, the correct link to the WIP patch for the PySpark integration work is http://gerrit.cloudera.org:8080/13088
08-22-2019
01:18 PM
Hi, I'm not sure there is full-fledged documentation on the Kudu PySpark API: the connector is still in an early development phase, if I'm not mistaken. However, the following in-flight patch has a few examples that might be helpful: https://gerrit.cloudera.org/#/c/13102/2/docs/developing.adoc But it doesn't answer your question about KuduContext: I'm not sure that functionality is implemented at this point. There was a WIP patch posted some time ago: https://gerrit.cloudera.org/#/c/13086/ However, I don't know the status of that work at this point, unfortunately.
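P.S. For reference, the examples in that docs patch read Kudu tables through Spark's generic DataFrame reader. Below is a minimal sketch of the option plumbing (the 'kudu.master'/'kudu.table' option names follow the patch; the master address and table name are placeholders, and the SparkSession part is commented out since it needs a live cluster and the kudu-spark jar):

```python
# Sketch of reading a Kudu table from PySpark via the generic DataFrame
# reader; option names ('kudu.master', 'kudu.table') are taken from the
# in-flight docs patch. Master address and table name are placeholders.
def kudu_reader_options(master, table):
    """Options dict for spark.read.format('org.apache.kudu.spark.kudu')."""
    return {
        "kudu.master": master,  # comma-separated master RPC addresses
        "kudu.table": table,    # Kudu table name
    }

# With a live SparkSession and the kudu-spark jar on the classpath:
#   df = (spark.read.format("org.apache.kudu.spark.kudu")
#               .options(**kudu_reader_options("master-1:7051", "my_table"))
#               .load())
print(kudu_reader_options("master-1:7051", "my_table"))
```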
05-30-2019
09:29 AM
Hi, I don't know much about Kudu+PySpark except that there is a lot of room for improvement there, but a couple of examples in the following patch-in-flight might be useful: https://gerrit.cloudera.org/#/c/13102/
10-17-2018
10:49 AM
Oh, sorry -- it seems you are on 5.13.0, and that flag is not available in that version yet (it's present starting with 5.14.0). I'm afraid you need either to introduce custom mappings for those Kudu service principals (so they are mapped to 'kudu') or to upgrade to 5.14 or higher to get access to that flag. Setting the superuser ACL to '*' would not allow tablet servers to register with masters anyway, because of the following: https://github.com/apache/kudu/blob/master/src/kudu/master/master_service.cc#L122
10-17-2018
10:23 AM
Hi Christophe, It seems that in your case the Kudu service principals (like 'kudu/XXX1119.krj.gie@REALM') are not mapped to 'kudu' as expected, but to the names of local users (like 'm-zhdp-s-hwefzjneur'). If I'm not mistaken, that's exactly https://issues.apache.org/jira/browse/KUDU-2198. As a workaround, I can suggest adding --use_system_auth_to_local=false to the Kudu flags (both masters and tservers). If using CM, add that flag to the 'Kudu Service Advanced Configuration Snippet (Safety Valve) for gflagfile'. Hope this helps. Regards, Alexey
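P.S. In other words, the safety-valve entry is just a one-line gflagfile fragment applied to both the master and tablet server roles:

```
--use_system_auth_to_local=false
```

Since gflags are read at startup, restart the Kudu masters and tablet servers for the flag to take effect.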
10-01-2018
11:42 AM
The 'ps' sample output from one of your servers looks fine. Just another question: I assume the 'superuser_acl' property in your CM configuration (the one that's blurred out) contains 'kudu' (or whatever you have for the Kudu service principal), right? If not, add that to the list. Anyway, it's hard to say what's wrong just by looking at configuration snippets and playing a guessing game. I would highly recommend following Will's advice and looking into the logs of the master(s) and tablet servers for the error details. I think that will give you a firm starting point for troubleshooting the issue and save some time for everybody. Regards, Alexey
09-19-2018
04:22 PM
Some additional points, just in case:

- Using nscd (https://linux.die.net/man/8/nscd) might help with slow DNS resolution, but if /etc/hosts-based resolution isn't helping (as reported in one of the earlier posts), maybe it's worth verifying that those files are actually consulted (i.e. check /etc/nsswitch.conf, etc.).
- Sometimes transparent hugepages (THP) support might be a drag: https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/ Maybe try disabling it on one of your servers (e.g., a machine that runs kudu-tserver and as few other processes as possible) and collect some performance metrics, comparing THP enabled vs. disabled.
- If the distribution of tablet replicas is not balanced, the tablet servers hosting a greater number of replicas might experience RPC queue overflows more often than others. In the master branch of the Kudu git repo, a new rebalancer tool was introduced recently: https://github.com/apache/kudu/blob/master/docs/administration.adoc#running-the-tablet-rebalancing-tool You could build it from source and run it against your cluster with the --report_only option first to see whether it makes sense to rebalance. If the rebalancer's report shows a big imbalance in the tablet replica distribution, running the rebalancer tool might help.

Thanks, Alexey
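P.S. For the THP experiment, the current setting can be inspected and changed at runtime through sysfs (these are the standard kernel paths; 'never' disables THP until the next reboot, so persist it via your init system if the results look good):

```shell
# Show the current THP mode; the bracketed value is the active one,
# e.g. "always madvise [never]".
cat /sys/kernel/mm/transparent_hugepage/enabled

# Disable THP until the next reboot (needs root).
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```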