Member since 03-16-2017 | 37 Posts | 6 Kudos Received | 2 Solutions
05-26-2021
07:36 PM
Hi! Those warning messages about RPC requests dropped due to backpressure are a sign that the particular tablet server is likely overloaded. Consider the following remedies:
- Upgrade to a recent version of Kudu (1.14 as of now). Since Kudu 1.9.0 there have been many fixes that might help reduce memory pressure for write-intensive workloads (e.g. see KUDU-2727, KUDU-2929) and read-only workloads (KUDU-2836), plus a bunch of other improvements. BTW, if you are using CDH, then upgrading to CDH6.3.4 is a good first step in that direction: CDH6.3.4 contains fixes for KUDU-2727, KUDU-2929, and KUDU-2836 (those were back-ported into CDH6.3.4).
- Make sure the tablet replica distribution is even across tablet servers: run the 'kudu cluster rebalance' CLI tool.
- If you suspect replica hot-spotting, consider re-creating the table in question to fan out the write stream across multiple tablets. I guess reading this guide might be useful: https://kudu.apache.org/docs/schema_design.html
- If nothing from the above helps, consider adding a few more tablet server nodes to your cluster. Once the new nodes are added, don't forget to run the 'kudu cluster rebalance' CLI tool.
Kind regards, Alexey
09-30-2019
08:56 PM
1 Kudo
Hi, To get information on a table's replication factor, it's possible to use the following sequence of API calls:
- Java API: KuduClient.openTable() ; KuduTable.getNumReplicas()
- C++ API: KuduClient::OpenTable() ; KuduTable::num_replicas()
From the command line, it's possible to use the kudu CLI tool: kudu table describe <master_rpc_endpoints> <table_name>
In the output of the CLI command above, search for 'REPLICAS' (usually the last line). HTH, Alexey
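For scripting, the replication factor can be pulled out of the CLI output. The following is a minimal Python sketch. It assumes the describe output contains a line of the form 'REPLICAS <n>'; the sample text and the helper name parse_replicas are made up for illustration, and the exact output layout may differ between Kudu versions.

```python
import re
from typing import Optional


def parse_replicas(describe_output: str) -> Optional[int]:
    """Return the replication factor found in `kudu table describe` output.

    Assumption: the output has a line that starts with 'REPLICAS' followed
    by an integer. Returns None if no such line is found.
    """
    # Scan from the end, since the REPLICAS line is usually the last one.
    for line in reversed(describe_output.splitlines()):
        match = re.match(r"\s*REPLICAS\s+(\d+)\s*$", line)
        if match:
            return int(match.group(1))
    return None


# Hypothetical sample output, for illustration only.
sample = """TABLE my_table (
    id INT64 NOT NULL,
    PRIMARY KEY (id)
)
HASH (id) PARTITIONS 4
REPLICAS 3"""

print(parse_replicas(sample))
```

In practice the input string would come from capturing the stdout of the CLI command rather than from a hard-coded sample.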
09-30-2019
08:34 PM
Hi, I think you will need Impala to make Superset work with Kudu. At http://superset.apache.org/#databases it's mentioned that the database engine needs '... proper DB-API driver and SQLAlchemy dialect ...' to be usable by Superset. I guess the '...proper DB-API driver ...' is based on JDBC, and there isn't a JDBC driver for Kudu as of now. As far as I know, there isn't a native Superset Kudu connector either. However, contributions are always welcome! Kind regards, Alexey
09-30-2019
02:06 PM
Hi, As of Kudu v1.10 it's not possible to restrict access on a per-partition (i.e. per-tablet) basis. Thanks, Alexey
08-23-2019
09:43 PM
Ah, there is another thread which might be helpful: https://community.cloudera.com/t5/Support-Questions/Kudu-to-HDFS-data-load-timestamp-issue/td-p/93646
- Tags:
- apache-kudu
- HDFS
08-23-2019
11:25 AM
Hi, Thank you for reporting the issue! With CDH6.1.0, kudu-spark2_2.11-1.8.0-cdh6.1.0.jar is available: https://archive.cloudera.com/cdh6/6.1.0/maven-repository/org/apache/kudu/kudu-spark2_2.11/1.8.0-cdh6.1.0/ However, applications can use kudu-spark2_2.11-1.7.0 with the Kudu server side of CDH6.1.0 (i.e. the older version of kudu-spark2_2.11 is 'supported' at least in this sense). Yes, you are right: in the Apache Kudu git repo, the UPSERT ignoreNull option is available in Kudu 1.8.0 and onward. For CDH, the UPSERT ignoreNull option is available starting with kudu-spark2_2.11-1.8.0; it's not available in older versions (i.e. kudu-spark2_2.11-1.7.0 doesn't have it). I'll try to reach out to see whether the inconsistency you pointed out can be fixed in the CDH6.1.0 online documentation. Thanks, Alexey
- Tags:
- apache-kudu
- cdh
08-22-2019
02:26 PM
Hi, Thank you for reporting the issue. I guess the issue might be related to the interpretation of the timezone in timestamps read from the HDFS table. Reading through https://www.cloudera.com/documentation/enterprise/5-16-x/topics/impala_timestamp.html might give you some ideas about how Impala interprets timestamps. To be able to help, I need more context on this. Could you share some more details on how the data from HDFS is piped into Kudu? For example: snippets of the code used, etc. Thanks, Alexey
08-22-2019
01:50 PM
Hi, Kudu requires that the machine clocks of the master and tablet server nodes are synchronized using NTP: https://kudu.apache.org/docs/troubleshooting.html#ntp Kudu is tested with ntpd, but I guess chronyd might work as well. Whether using ntpd or chronyd, it's necessary to make sure the machine's clock is synchronized so that the ntp_adjtime() Linux system call doesn't return an error (see http://man7.org/linux/man-pages/man2/adjtimex.2.html for more technical details). It's not enough just to have ntpd (or chronyd) running: the clock must actually be synchronized. I would verify that the NTP daemon is properly configured and tracks the clocks of the reference servers. For instructions on checking the sync status of the machine's clock, see https://kudu.apache.org/docs/troubleshooting.html#ntp if using ntpd or https://docs.fedoraproject.org/en-US/Fedora/18/html/System_Administrators_Guide/sect-Checking_if_chrony_is_synchronized.html for chronyd. Hope this helps, Alexey
08-22-2019
01:20 PM
Whoops, the correct link to the WIP patch for PySpark integration work is http://gerrit.cloudera.org:8080/13088
08-22-2019
01:18 PM
Hi, I'm not sure there is full-fledged documentation on the Kudu PySpark API: the connector is still in an early development phase, if I'm not mistaken. However, the following in-flight patch has a few examples that might be helpful: https://gerrit.cloudera.org/#/c/13102/2/docs/developing.adoc But it doesn't answer your question about KuduContext: I'm not sure that functionality is implemented at this point. There was a WIP patch posted some time ago: https://gerrit.cloudera.org/#/c/13086/ However, I don't know what the status of that work is at this point, unfortunately.
05-30-2019
09:58 AM
Hi, As far as I know, kudu cluster ksck <kudu_master_rpc_endpoint> follows the standard convention and exits with a non-zero status if it detects a problem with the cluster or hits some other run-time issue. You can rely on the exit code of the kudu CLI tool to understand whether it detected any problem or not. In any case, the tool outputs information into both stdout and stderr in case of an error, so make sure you capture both streams. However, the part after 'Errors:' is output into stdout. When running the tool, make sure it runs under the credentials of the 'kudu' user (e.g., sudo -u kudu kudu cluster ksck ...) or whatever OS user the kudu processes are run as: that's to pass authz checks. I'm not sure what you mean by 'second command': the only command that is run is the kudu CLI tool itself when you execute 'kudu cluster ksck ...'. Maybe try first to experiment with the tool in a plain interactive shell environment and see what you get. As of now, with the kudu CLI binary from the 1.9.0 release, I'm able to capture all the output of 'kudu cluster ksck' with no issues when running it in a standard bash session. However, you might be hitting this bug: https://issues.apache.org/jira/browse/KUDU-2819 The bug rarely manifests itself and was found and fixed just recently. So, make sure the kudu CLI tool doesn't crash when you run it against your cluster. If it does, get or build a binary with the fix for KUDU-2819 included and use that one. HTH, Alexey
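To illustrate the point about relying on the exit code and capturing both streams, here is a small Python sketch. The helper name run_and_capture is hypothetical, and a stand-in child process is used instead of the real tool so the snippet runs anywhere; for a real check the command list would be something like ["sudo", "-u", "kudu", "kudu", "cluster", "ksck", "<kudu_master_rpc_endpoint>"].

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_capture(cmd: List[str]) -> Tuple[int, str, str]:
    """Run a command and return (exit_code, stdout, stderr).

    A non-zero exit code is returned to the caller instead of raising,
    so the caller can decide how to react to a reported problem.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in for the real CLI tool: writes to both streams and exits with 1.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('Errors:'); print('warning', file=sys.stderr); sys.exit(1)",
]

code, out, err = run_and_capture(demo_cmd)
if code != 0:
    print("problem detected, exit code:", code)
print("stdout:", out.strip())
print("stderr:", err.strip())
```

Keeping stdout and stderr separate makes it easy to look for the 'Errors:' part in stdout while still logging whatever went to stderr.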
05-30-2019
09:29 AM
Hi, I don't know much about Kudu+PySpark except that there is a lot of room for improvement there, but maybe a couple of examples in the following patch-in-flight could be useful: https://gerrit.cloudera.org/#/c/13102/
11-16-2018
11:47 AM
Hi, Thank you for the report. Could you be more specific and provide some information on the version of Kudu (or CDH) you are using? Also, how many replicas per tablet server and how much data per tablet server does the cluster have? In Kudu 1.6 and prior versions, the re-replication process might take a very long time in some scenarios involving a restart of a tablet server. That has improved dramatically since Kudu 1.7, when a more robust replica management scheme was introduced. Regards, Alexey
10-17-2018
10:49 AM
Oh, sorry -- it seems you are at 5.13.0 and that flag is not available in that version yet (but it's present starting 5.14.0). I'm afraid you need either to introduce custom mappings for those kudu service principals (so they would be mapped into 'kudu') or upgrade to 5.14 or higher to get access to that flag. Setting superuser ACL to '*' would not allow tablet servers to register with masters anyway because of the following: https://github.com/apache/kudu/blob/master/src/kudu/master/master_service.cc#L122
10-17-2018
10:23 AM
Hi Christophe, It seems that in your case the kudu service principals (like 'kudu/XXX1119.krj.gie@REALM') are not mapped into 'kudu' as expected, but into the names of local users (like 'm-zhdp-s-hwefzjneur'). If I'm not mistaken, that's exactly https://issues.apache.org/jira/browse/KUDU-2198. As a workaround, I suggest adding --use_system_auth_to_local=false to the Kudu flags (both masters and tservers). If using CM, add that flag into the 'Kudu Service Advanced Configuration Snippet (Safety Valve) for gflagfile'. Hope this helps. Regards, Alexey
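The mapping the post expects ('kudu/<host>@REALM' into 'kudu') amounts to taking the first component of the principal. The Python sketch below only illustrates that idea; the function name short_name and the example principals are made up, and this is neither Kudu's code nor the Kerberos auth_to_local rule engine.

```python
def short_name(principal: str) -> str:
    """Return the primary (first) component of a Kerberos principal.

    Simplified illustration: drops the realm after '@' and the instance
    after '/'. Escaped characters and auth_to_local rules are ignored.
    """
    without_realm = principal.split("@", 1)[0]
    return without_realm.split("/", 1)[0]


# Hypothetical principals, for illustration only.
print(short_name("kudu/ts-01.example.com@EXAMPLE.COM"))
print(short_name("alice@EXAMPLE.COM"))
```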
10-01-2018
11:42 AM
The 'ps' sample output from one of your servers looks fine. Just another question: I assume the 'superuser_acl' property in your CM configuration (that's blurred out) contains 'kudu' (or whatever you have for the Kudu service principal), right? If not, add that into the list. Anyway, it's hard to say what's wrong by looking at the configuration snippets and playing the 'guess what?' game. I would highly recommend following Will's advice on looking into the logs of the master(s) and tablet servers for the error details. I think that will give you a firm starting point in troubleshooting the issue and save some time for everybody. Regards, Alexey
09-19-2018
04:22 PM
Some additional points, just in case:
- Using nscd (https://linux.die.net/man/8/nscd) might help with slow DNS resolutions, but if using /etc/hosts-based resolutions isn't helping (as reported in one of the earlier posts), maybe it's worth verifying that those files are actually used (i.e. check nsswitch.conf, etc.)
- Sometimes transparent hugepages support might be a drag: https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/ Maybe try to disable it on one of your servers (e.g., on a machine that runs kudu-tserver and as few other processes as possible) and collect some performance metrics, comparing THP enabled/disabled.
- If the distribution of tablet replicas is not balanced, the tablet servers hosting a greater number of replicas might experience RPC queue overflows more often than others. In the master branch of the Kudu git repo a new rebalancer tool was introduced recently: https://github.com/apache/kudu/blob/master/docs/administration.adoc#running-the-tablet-rebalancing-tool You could try to build it from source and run it against your cluster with the --report_only option first to see whether it makes sense to rebalance your cluster. If the rebalancer's report shows a big imbalance in the tablet replica distribution, running the rebalancer tool might help.
Thanks, Alexey
09-19-2018
01:18 PM
Right, to get the fix into Impala it's necessary to relink impalad with the patched Kudu client. impalad is linked against kudu_client dynamically, so in theory it might be possible just to replace the libkudu_client.so.0 library with the patched version. However, that's really messy and I would not recommend it. If you use CDH anyway, the best option is to wait for the next release of CDH. I don't know what version that will be, though. If you want a workaround, set the --authn_token_validity_seconds flag to be months or even one year long (i.e. --authn_token_validity_seconds=31536000) and restart the Kudu masters. You will need to enable experimental flags as well (i.e. add --unlock_experimental_flags).
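The value 31536000 is simply one year (365 days) expressed in seconds. A tiny Python sketch for deriving such values, with the helper name validity_seconds chosen here for illustration, might look like this:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def validity_seconds(days: int) -> int:
    """Convert a validity period in days to seconds."""
    return days * SECONDS_PER_DAY


one_year = validity_seconds(365)

# Flag lines as they would appear in the masters' flag file.
print("--unlock_experimental_flags")
print(f"--authn_token_validity_seconds={one_year}")
```

The same helper gives the value for a shorter period, e.g. validity_seconds(90) for roughly three months.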
09-17-2018
11:54 AM
Hi, Can you check what keytab your tablet servers are running with? You can do that by logging in to one of the tablet server machines and checking the command line that the kudu-tserver process is running with. Then check what's inside that keytab. It's something like:

[root@anonymous ~]# ps axw | grep kudu-tserver
548 pts/0 S+ 0:00 grep --color=auto kudu-tserver
32747 ? Sl 1:12 /opt/cloudera/parcels/CDH/lib/kudu/sbin/kudu-tserver --rpc_authentication=required --rpc_encryption=required --keytab_file=/var/run/cloudera-scm-agent/process/580-kudu-KUDU_TSERVER/kudu.keytab --tserver_master_addrs=master.myhost.org --flagfile=/var/run/cloudera-scm-agent/process/580-kudu-KUDU_TSERVER/gflagfile
[root@anonymous ~]# klist -k /var/run/cloudera-scm-agent/process/580-kudu-KUDU_TSERVER/kudu.keytab
Keytab name: FILE:/var/run/cloudera-scm-agent/process/580-kudu-KUDU_TSERVER/kudu.keytab
KVNO Principal
---- --------------------------------------------------------------------------
2 kudu/ts-01.myhost.org@DC.MYHOST.ORG
2 kudu/ts-01.myhost.org@DC.MYHOST.ORG
2 kudu/ts-01.myhost.org@DC.MYHOST.ORG
2 kudu/ts-01.myhost.org@DC.MYHOST.ORG

If the tablet servers are not running, or are running without keytabs, or there is nothing in those keytabs, that might be the problem. Anyway, I think there should be log files of the Kudu tablet servers on those machines; by default they are in /var/log/kudu. Checking those logs might give you some ideas on where to start the troubleshooting. Regards, Alexey
09-17-2018
11:14 AM
1 Kudo
Hi, I have an update on this issue. I got a chance to look at it more closely with the impalad log on hand. As it turned out, there is a bug in the Kudu C++ client, and I think your case was a manifestation of that bug as well: https://issues.apache.org/jira/browse/KUDU-2580 I posted a patch to fix the issue and I hope the fix will be reviewed and merged soon. Best regards, Alexey
08-04-2018
09:58 AM
Thank you for the update. As I understand it, the logs don't contain messages about a failure to re-acquire an authn token, and authn tokens were successfully re-acquired some time before the error referenced in the first post. Two more questions:
- How far was the timestamp referenced in the very first post from the time when Impala was unable to launch a query?
  I0503 09:29:01.528237 42404 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to <Kudu_leader_IP>:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
- What was the error message output to the user (if any) when Impala was unable to launch the query?
I might be wrong, but those messages about expired authn tokens might be a red herring. Thank you!
08-01-2018
12:26 PM
Hi, Thank you for reporting the issue. A couple of questions to clarify before starting to troubleshoot it:
- How many Kudu masters do you have in your cluster?
- Do Kudu tables in Impala have all Kudu masters specified in their 'kudu.master_addresses' property?
- In Impala's logs, do you see anything like 'Reconnecting to the cluster for a new authn token'? If yes, what are the log lines right after that?
- In Impala's logs, do you see anything like 'Unable to reconnect to the cluster for a new authn token'? If yes, what are the error details that come along with that message?
07-30-2018
03:18 PM
Hi, Probably, the problem with those dates being back in 1970 comes from the fact that the value stored in the Kudu UNIXTIME_MICROS column is interpreted as number of microseconds from the start of the Epoch? Maybe, you can take a look into the Kudu Java client code to get some examples on working with timestamp columns: https://github.com/apache/kudu/blob/master/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java#L369 https://github.com/apache/kudu/blob/master/java/kudu-client/src/test/java/org/apache/kudu/client/TestRowResult.java#L49 java.sql.Timestamp class might be helpful in various date/time conversions. Hope that helps.
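The linked examples are in Java; the unit issue itself can be shown in a few lines of Python. This is a sketch with helper names (to_unixtime_micros, from_unixtime_micros) invented here. It shows that a value written in seconds but read back as microseconds since the Epoch ends up in January 1970.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unixtime_micros(dt: datetime) -> int:
    """Convert a timezone-aware datetime to microseconds since the Epoch."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def from_unixtime_micros(micros: int) -> datetime:
    """Interpret an integer as microseconds since the Epoch (UTC)."""
    return EPOCH + timedelta(microseconds=micros)


ts = datetime(2018, 7, 30, 15, 18, tzinfo=timezone.utc)
micros = to_unixtime_micros(ts)
seconds = micros // 1_000_000

# Correct unit: round-trips to the original timestamp.
print(from_unixtime_micros(micros))
# Wrong unit: seconds misread as microseconds land shortly after the Epoch.
print(from_unixtime_micros(seconds))
```

If the dates read back from Kudu are in 1970, checking whether the writer supplied seconds or milliseconds instead of microseconds is a reasonable first step.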
04-10-2018
07:55 PM
Just for better visibility I wanted to refer to corresponding thread at user@kudu.apache.org: https://lists.apache.org/thread.html/99bb508aaf2a33d823066ddeb7ba8b58e14eb577c5a89f56359f5cd4@%3Cuser.kudu.apache.org%3E Also, there were responses at #kudu-general Slack channel at https://getkudu.slack.com/ https://getkudu.slack.com/archives/C0CPXJ3CH/p1523339652000175
03-05-2018
01:20 PM
1 Kudo
Ah, just noticed one strange thing in Case1: the port number specified for the Kudu master is 7077. Is that correct? If you didn't tweak Kudu's configuration, then Kudu masters are listening for RPC at port 7051 by default. Also, comparing Case1 with Case2, it seems Case2 is using 'localhost' for the address of the master, while in Case1 the IP address 172.17.0.43 is used. So, for Case2, if you are running that shell not from the machine where your master is running (I assume that's a machine with the 172.17.0.43 IP address configured at one of its interfaces), then you need to specify not 'localhost', but the name/IP of the machine where your Kudu master is running. And of course, in case of a multi-master Kudu cluster, you need to have the endpoints of all Kudu masters in the "kudu.master" property. Hope this helps, Alexey
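As a small illustration of building the value for the "kudu.master" property, here is a Python sketch that joins several master endpoints into a comma-separated list and appends the default RPC port 7051 where none is given. The helper name master_addresses and the host names are hypothetical, and IPv6 literals are not handled.

```python
from typing import Iterable

# Default RPC port for Kudu masters, as mentioned in the post.
DEFAULT_MASTER_RPC_PORT = 7051


def master_addresses(hosts: Iterable[str]) -> str:
    """Join master endpoints into a comma-separated list.

    Endpoints without an explicit port get the default master RPC port.
    Assumption: hosts are names or IPv4 addresses (no IPv6 literals).
    """
    endpoints = []
    for host in hosts:
        host = host.strip()
        if ":" not in host:
            host = f"{host}:{DEFAULT_MASTER_RPC_PORT}"
        endpoints.append(host)
    return ",".join(endpoints)


# Hypothetical three-master cluster.
print(master_addresses(["master-1.example.com", "master-2.example.com:7051", "172.17.0.43"]))
```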
03-05-2018
12:54 PM
Hi, How many masters do you have in your cluster? I looked at the error from Case2 a bit and it might happen that the client does not have proper list of Kudu master endpoints (if it's indeed the multi-master case). If you have just one master in your Kudu cluster, then it makes sense to look into the master's log on that single machine -- it might give you some hints why that single master could not declare itself a leader. Thanks, Alexey
11-14-2017
10:48 AM
3 Kudos
Hi, You can have multiple replicas of data stored in Kudu tables -- Kudu allows you to configure a per-table replication factor when creating a table. Replication factors of 3, 5, and 7 are available out of the box; for higher values you need to tweak the master's --max_num_replicas flag. Under the hood, every tablet (the part of the table which corresponds to a partition) is a Raft cluster, where every transaction is considered committed only when it's replicated and acknowledged back to the leader replica by the majority of replicas in the tablet. Replicas of one tablet are distributed among different tablet servers (it's not possible to run multiple replicas of one tablet on the same tablet server). Unless the replication factor is set to 1 (i.e. no replication at all) or all tablet servers are run on the same machine (which is a bad idea), for every tablet there should be at least one replica with a copy of the data if a disk on one server fails. You can get more details at https://kudu.apache.org/overview.html#distribution-and-fault-tolerance and https://github.com/apache/kudu/blob/master/docs/design-docs/consensus.md I hope this helps.
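The majority rule can be made concrete with a little arithmetic. The Python sketch below uses function names made up for illustration and computes, for a given replication factor, how many replicas form a majority and how many replicas can be unavailable while a majority still exists.

```python
def majority(replication_factor: int) -> int:
    """Smallest number of replicas that is more than half of the total."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while a majority still exists."""
    return replication_factor - majority(replication_factor)


for rf in (1, 3, 5, 7):
    print(f"replication factor {rf}: majority {majority(rf)}, "
          f"tolerated failures {tolerated_failures(rf)}")
```

With a replication factor of 3, a majority is 2 replicas, so one replica can be lost; with a factor of 1 no failure is tolerated.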
11-07-2017
10:51 AM
Hi, Impala uses Hive metastore validators, and those do not allow a dot (i.e. the '.' symbol) in the name of a table. Usually, the prefix before the dot stands for a database name. Kudu allows table names to have a dot in the name, and that's because it does not support the concept of a database yet. That may change in the future, though. As a workaround, don't add dots into the names of your Kudu tables.
09-20-2017
03:21 PM
The snippet posted shows that the tablet server is unable to verify the TLS certificate generated for it because the certificate's 'valid from' field is in the future. That's most likely because the master host's clock is at least 1 second ahead of the tablet server host's clock. Tablet server TLS certificates are generated by the master when the tablet server connects to the master for the first time after starting up. The tablet server will retry the connection with the next heartbeat to the master, sending a new certificate signing request, and the master will again generate a new certificate with the validity date in the future. I suspect that the error will continue to appear even if you restart Kudu, so restarting Kudu will not help. You need to synchronize the clocks across the machines in the cluster, at least to within 1 second. If NTP does not work for you, I would recommend at least running 'ntpdate' on every machine of your cluster prior to starting the Kudu servers.
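The effect of the clock skew can be sketched in Python. This is a simplified model, not Kudu's or OpenSSL's actual verification code: it assumes the certificate's 'valid from' time is stamped from the master's clock and that the verifier compares it against its own local clock. All names and timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_not_yet_valid(not_before: datetime, verifier_now: datetime) -> bool:
    """True if the verifier's clock is still before the certificate's start."""
    return verifier_now < not_before


master_now = datetime(2017, 9, 20, 15, 21, 5, tzinfo=timezone.utc)
# Assumption: the tablet server's clock lags the master's clock by 2 seconds.
tserver_now = master_now - timedelta(seconds=2)
# Assumption: the certificate's validity starts at the master's current time.
not_before = master_now

print(is_not_yet_valid(not_before, tserver_now))  # lagging clock rejects the cert
print(is_not_yet_valid(not_before, master_now))   # synchronized clock accepts it
```

Because every retry produces a fresh certificate stamped with the master's current time, the lagging side keeps seeing a start time in its own future until the clocks are brought together.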
05-31-2017
01:47 PM
Cloudera publishes Kudu binary packages in accordance with the internal release cycle. Per the company's policy, we don't promise any particular dates for feature/build availability. However, there are binary packages of Kudu 1.3.0 in Cloudera's repository. Even though our binary says 1.3.0, it may include backports that were included in the upstream 1.3.1. When it comes to bugfix patches, we often include backports that aren't in the exact upstream release. You can find the package repo URLs on the https://www.cloudera.com/documentation/kudu/latest/topics/kudu_installation.html page. It might be that the patch you are looking for is in Cloudera's package v1.3.0 already. For details, please take a look at the release notes: https://www.cloudera.com/documentation/kudu/latest/topics/release_notes.html#relnotes_1_3_0 Is there any specific feature/patch that you are expecting with the 1.3.1 release?