Member since: 04-08-2014
Posts: 70
Kudos received: 20
Solutions: 12

My Accepted Solutions

Views | Posted |
---|---|
6888 | 07-16-2018 04:12 PM |
7003 | 07-13-2018 03:17 PM |
7501 | 07-10-2018 03:00 PM |
7177 | 07-10-2018 02:54 PM |
7756 | 07-05-2018 03:35 PM |
04-25-2019
12:43 AM
Let's try to rule out various types of problems.

1. Are you able to read from and write to Kerberos-enabled HDFS with PySpark? Is Kudu the only Kerberos-enabled service that is not working from within PySpark?
2. Have you checked that the Spark driver is running on the host and in the shell you ran kinit from, rather than being started in a YARN container? If it's running in YARN, you have to give YARN access to the keytab to run as.
3. Have you tried connecting to Kudu with the regular Spark shell? Does it work? For examples, see https://kudu.apache.org/docs/developing.html#_kudu_integration_with_spark
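Check 2 can be made concrete with a small Python helper that assembles a `spark-submit` command line: in client mode the driver stays on the host where you ran kinit and uses that ticket cache; in cluster mode the driver runs in a YARN container, so, following the advice above, the sketch hands YARN a principal and keytab. `--master`, `--deploy-mode`, `--principal`, and `--keytab` are standard spark-submit options for YARN; the helper name, script name, principal, and keytab path are hypothetical.

```python
from typing import List, Optional


def build_spark_submit(
    script: str,
    deploy_mode: str = "client",
    principal: Optional[str] = None,
    keytab: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build a spark-submit argv for a PySpark job on YARN."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode}")
    argv = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    if deploy_mode == "cluster":
        # The driver runs in a YARN container and cannot see the local
        # kinit ticket cache, so (per the advice above) pass a keytab.
        if not (principal and keytab):
            raise ValueError("cluster mode: pass a principal and keytab")
        argv += ["--principal", principal, "--keytab", keytab]
    argv.append(script)
    return argv


if __name__ == "__main__":
    # Step 1: rule out keytab problems by keeping the driver local.
    print(" ".join(build_spark_submit("kudu_job.py")))
    # Step 2: once that works, move the driver into YARN with a keytab.
    print(" ".join(build_spark_submit(
        "kudu_job.py", "cluster", "etl@EXAMPLE.COM", "/path/to/etl.keytab")))
```

Raising an error instead of silently dropping the keytab in cluster mode makes the "forgot to give YARN the keytab" failure show up before the job is submitted.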
04-23-2019
05:45 PM
Is your cluster Kerberos-enabled? If so, did you kinit before running the job? Try a local driver before trying a distributed driver to rule out keytab-related issues.
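A script can answer "did you kinit?" before launching the job. MIT Kerberos `klist -s` prints nothing and exits with status 0 only if it finds a credentials cache with an unexpired ticket-granting ticket. A minimal sketch, assuming the MIT Kerberos client tools are installed; `has_kerberos_ticket` and its `run` parameter (a hook so the check can be tested without Kerberos) are hypothetical.

```python
import subprocess
from typing import Callable


def has_kerberos_ticket(run: Callable = subprocess.run) -> bool:
    """Return True if `klist -s` reports a usable credentials cache."""
    try:
        # -s: silent mode; the result is conveyed only by the exit status.
        result = run(["klist", "-s"], check=False)
    except FileNotFoundError:
        # klist is not installed on this host, so no ticket can be checked.
        return False
    return result.returncode == 0
```

Calling `has_kerberos_ticket()` at the top of a driver script and exiting with a "run kinit first" message when it returns `False` turns a confusing authentication error deep inside the job into an obvious one.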
01-16-2019
08:55 AM
1 Kudo
Kudu runs as a separate service that Impala talks to (just as HDFS runs as a separate service from Impala), so you have to have Kudu running somewhere for it to work. However, you don't have to run Kudu on the same servers that you run Impala on; remote reads are supported over the network.
01-16-2019
08:46 AM
EricL is correct: you don't need to worry about files with Kudu in the same way that you have to worry about them with typical Hive tables. Kudu does not use HDFS; it stores its data directly on the local file system (for example, ext4) in a distributed way. To see where Kudu is storing its data on the local file system, go into Cloudera Manager and look at how the --fs_data_dirs and --fs_wal_dir configuration options are set up across the various tablet server nodes.

Hope that helps,
Mike
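Outside Cloudera Manager, the same information can be pulled from a tablet server's flags. A minimal sketch, assuming you have the flags as gflagfile-style text (one `--name=value` per line); the function names are hypothetical, and normalizing `-` to `_` in flag names is a convenience of this sketch so that either spelling of a flag maps to one key.

```python
from typing import Dict, List


def parse_gflagfile(text: str) -> Dict[str, str]:
    """Parse gflagfile-style text into a {flag_name: value} dict."""
    flags: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("--") or "=" not in line:
            continue  # skip blank lines, comments, and value-less flags
        name, value = line[2:].split("=", 1)
        flags[name.replace("-", "_")] = value  # sketch-only normalization
    return flags


def kudu_storage_dirs(text: str) -> Dict[str, List[str]]:
    """Return the WAL directory and the comma-separated data directories."""
    flags = parse_gflagfile(text)
    data = flags.get("fs_data_dirs", "")
    return {
        "wal": [flags["fs_wal_dir"]] if "fs_wal_dir" in flags else [],
        "data": [d for d in data.split(",") if d],
    }
```

Running `kudu_storage_dirs` against each tablet server's flags shows which local directories hold that node's WAL and data files.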
11-27-2018
01:02 PM
1 Kudo
A WAL file is a Kudu tablet write-ahead log file. You can read an overview of how the Kudu write path works in this fairly technical blog post: https://blog.cloudera.com/blog/2017/04/apache-kudu-read-write-paths/
The WAL file location is controlled by the --fs_wal_dir configuration parameter, which is documented at https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_fs_wal_dir
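To illustrate the general write-ahead principle (durably append each change to a log before applying it, and rebuild state by replaying the log on restart), here is a toy key/value store. It is not Kudu's implementation: Kudu's WAL is a segmented binary log replicated through Raft, none of which is modeled here, and `ToyWalStore` is a hypothetical name.

```python
import json
import os
from typing import Dict


class ToyWalStore:
    """Toy key/value store illustrating write-ahead logging (not Kudu's format)."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.data: Dict[str, str] = {}
        self._replay()

    def _replay(self) -> None:
        # On startup, rebuild the in-memory state from the log.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        # 1. Append the change to the log and force it to disk...
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. ...and only then apply it to the in-memory state.
        self.data[key] = value
```

If the process crashes after step 1 but before step 2, the change is not lost: the next `ToyWalStore(path)` replays it from the log.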
09-28-2018
11:36 AM
FYI, I think my reply from 9/21 was wrong. As far as I can tell, the rules work as follows:

1. If the Kudu table is managed by Impala (that is, it is not an EXTERNAL table), it's not possible to change the kudu.table_name and kudu.master_addresses properties. See https://issues.apache.org/jira/browse/IMPALA-5654 for more information. I have filed an improvement request to automatically rename the Kudu table when the Impala table is renamed, to keep them in sync, but right now that's not possible. See https://issues.apache.org/jira/browse/IMPALA-7640 for more information.
2. If you have an EXTERNAL table (the Kudu table is not managed by Impala), then you can alter the kudu.table_name table property.

I tested the above on a non-secure cluster, and I would be interested to hear whether others see the same behavior on a secured cluster. However, I believe the behavior is the same in both cases.

Hope this helps,
Mike
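The two rules above can be encoded as a pre-check before issuing the statement. A minimal sketch: `set_kudu_table_name_sql` is a hypothetical helper, whether the table is EXTERNAL is something you must supply yourself, and the error text mirrors the AnalysisException Impala returned in testing. It reflects the behavior observed in this thread, not a documented Impala API.

```python
def set_kudu_table_name_sql(impala_table: str, kudu_table: str, external: bool) -> str:
    """Build ALTER TABLE ... SET TBLPROPERTIES to point a table at a Kudu table."""
    if not external:
        # Rule 1: managed (non-EXTERNAL) tables cannot change kudu.table_name.
        raise ValueError(
            "Not allowed to set 'kudu.table_name' manually for managed Kudu tables"
        )
    if "'" in kudu_table:
        # Keep the sketch simple: refuse names that would need quote escaping.
        raise ValueError("Kudu table name must not contain single quotes")
    # Rule 2: EXTERNAL tables may change the property.
    return (
        f"ALTER TABLE {impala_table} "
        f"SET TBLPROPERTIES('kudu.table_name'='{kudu_table}')"
    )
```

Failing on the client side for managed tables avoids a round trip that Impala would reject anyway.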
09-25-2018
04:08 PM
Just following up here: I tested this on Impala version 2.13 (a dev version) and cannot reproduce the ability to rename the Kudu table with ALTER TABLE ... SET TBLPROPERTIES, even after renaming the Impala table. Is anyone else able to reproduce this? I get the following error:

    > alter table mpercy_k2 set tblproperties('kudu.table_name'='impala::default.mpercy_k2');
    Query: alter table mpercy_k2 set tblproperties('kudu.table_name'='impala::default.mpercy_k2')
    ERROR: AnalysisException: Not allowed to set 'kudu.table_name' manually for managed Kudu tables .

However, this is by design, from what I have discussed with some others. I think the "bug" is that Impala's ALTER TABLE doesn't automatically rename the Kudu table internally. Allowing the Kudu table name to be altered with TBLPROPERTIES would be a security problem, because Sentry applies its security rules to the Impala table name.
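The resulting out-of-sync state can be modeled in a few lines. This sketch assumes the `impala::<database>.<table>` naming pattern visible in the error above for managed Kudu tables, and models the rename behavior reported here (the Impala table is renamed, the Kudu table name is not); the class and the table names `orders`/`orders_v2` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ManagedKuduTable:
    db: str
    impala_name: str
    kudu_name: str

    @classmethod
    def create(cls, db: str, name: str) -> "ManagedKuduTable":
        # Naming pattern seen above: impala::<database>.<table>
        return cls(db, name, f"impala::{db}.{name}")

    def rename_impala_only(self, new_name: str) -> None:
        # Reported behavior: renaming the Impala table leaves the
        # underlying Kudu table name unchanged.
        self.impala_name = new_name

    def in_sync(self) -> bool:
        return self.kudu_name == f"impala::{self.db}.{self.impala_name}"
```

After `t = ManagedKuduTable.create("default", "orders")` and `t.rename_impala_only("orders_v2")`, `t.kudu_name` is still `impala::default.orders`, so `t.in_sync()` is `False`; and because the table is managed, TBLPROPERTIES cannot be used to fix it.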
09-21-2018
06:35 PM
2 Kudos
@Ankit_Mishra's answer is the correct way to do what you want: if you create the Kudu table through Impala, Impala doesn't allow the Kudu and Impala tables to be managed separately.
09-21-2018
06:16 PM
@Andreyeff Another thing you can try is increasing the Raft heartbeat interval from 500 ms to 1500 ms or even 3000 ms (see https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_raft_heartbeat_interval_ms ). This will increase your recovery time by a few seconds if a leader fails, since by default an election doesn't happen until 3 heartbeat periods have been missed (controlled by https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_leader_failure_max_missed_heartbeat_periods ).
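The trade-off is simple arithmetic: followers wait roughly heartbeat interval × missed periods before starting an election. A minimal sketch, assuming the default of 3 missed periods mentioned above and ignoring any randomized jitter Kudu may add to election timeouts; `failure_detection_ms` is a hypothetical helper.

```python
DEFAULT_MAX_MISSED_PERIODS = 3  # default mentioned above


def failure_detection_ms(heartbeat_interval_ms: float,
                         max_missed_periods: float = DEFAULT_MAX_MISSED_PERIODS) -> float:
    """Approximate time before followers treat the leader as failed (no jitter)."""
    return heartbeat_interval_ms * max_missed_periods


if __name__ == "__main__":
    for interval in (500, 1500, 3000):
        seconds = failure_detection_ms(interval) / 1000
        print(f"--raft_heartbeat_interval_ms={interval}: "
              f"~{seconds:.1f} s before an election")
```

Run as a script, it prints about 1.5 s, 4.5 s, and 9.0 s for the three intervals, which is where the extra few seconds of recovery time come from.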
09-18-2018
09:47 AM
Frankly, it sounds like you should revisit your capacity planning. You can try increasing the Raft consensus timeouts and upgrading to the latest version of Kudu, but it may not help much.