About mpercy

mpercy · ‎04-25-2019

Let's try to rule out various types of problems. 1. Are you able to read/write to Kerberos-enabled HDFS with PySpark? Is Kudu the only Kerberos-enabled service that is not working from within PySpark? 2. Have you checked to ensure that the Spark driver is running on the host and shell you kinited from instead of being started in a YARN container? If it's running in YARN you have to give YARN access to the keytab to run as. 3. Have you tried connecting to Kudu with the regular Spark shell? Does it work? For examples see https://kudu.apache.org/docs/developing.html#_kudu_integration_with_spark

mpercy · ‎04-23-2019

Is your cluster Kerberos-enabled? If so, did you kinit before running the job? Try a local driver before trying a distributed driver to rule out keytab-related issues.

mpercy · ‎01-16-2019

Kudu runs as a separate service that Impala talks to (like HDFS runs as a separate service from Impala) so you have to have Kudu running somewhere for it to work. However you don't have to run Kudu on the same servers that you run Impala on -- remote reads are supported over the network.

mpercy · ‎01-16-2019

EricL is correct, you don't need to worry about files with Kudu in the same way that you have to worry about them with typical Hive tables. Kudu stores its data directly on ext4 in a distributed way and does not use HDFS. You can take a look at where Kudu is storing its data on the local file system if you go into Cloudera manager and take a look at how the --fs-data-dirs and --fs-wal-dir configuration options are set up across the various Tablet Server nodes. Hope that helps, Mike

mpercy · ‎11-27-2018

A WAL file is a Kudu tablet write-ahead log file. You can read an overview of how the Kudu write path works here (it's a fairly techincal blog post): https://blog.cloudera.com/blog/2017/04/apache-kudu-read-write-paths/ The WAL file location is controlled by the configuration parameter --fs_wal_dir which you can read about at https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_fs_wal_dir

mpercy · ‎09-28-2018

FYI, I think my reply from 9/21 was wrong. As far as I can tell, the rules work as follows: 1. If the Kudu table is managed by Impala, it's not possible to change the kudu.table_name and kudu.master_addresses properties. This is the case when it's not an EXTERNAL table. See https://issues.apache.org/jira/browse/IMPALA-5654 for more information on that. I have filed an improvement request to track automatically renaming the Kudu table when the Impala table is renamed to keep them in sync, but right now it's not possible. See https://issues.apache.org/jira/browse/IMPALA-7640 for more information. 2. If you have an EXTERNAL table (Kudu table not managed by Impala) then you are able to alter the kudu.table_name table property. The above was tested on a non-secure cluster, and I would be interested to hear if others' experiences are the same as mine were even on a secured cluster. However I believe the behavior is the same in both cases. Hope this helps, Mike

mpercy · ‎09-25-2018

Just following up here, I just tested this on Impala version 2.13 (dev version) and I cannot reproduce the ability to alter table set tblproperties to rename the Kudu table name even after altering the Impala table name. Is anyone else able to reproduce this? I get the following error: > alter table mpercy_k2 set tblproperties('kudu.table_name'='impala::default.mpercy_k2'); Query: alter table mpercy_k2 set tblproperties('kudu.table_name'='impala::default.mpercy_k2') ERROR: AnalysisException: Not allowed to set 'kudu.table_name' manually for managed Kudu tables . However this is by design from what I have discussed with some others. I think the "bug" is that Impala alter table doesn't automatically rename the Kudu table internally. However it would be a security problem to be able to alter the kudu table name with tblproperties because Sentry applies the security rules to the Impala table name.

mpercy · ‎09-21-2018

@Ankit_Mishra's answer is the correct way to do the procedure you want to do, Impala doesn't allow for separately managing the Kudu and Impala tables if you create the Kudu table through Impala.

mpercy · ‎09-21-2018

@Andreyeff Another thing you can try doing is increasing the raft heartbeat interval from 500ms to 1500ms or even 3000ms, see https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_raft_heartbeat_interval_ms This will affect your recovery time by a few seconds if a leader fails since by default, elections don't happen for 3 missed heartbeat periods (controlled by https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_leader_failure_max_missed_heartbeat_periods )

mpercy · ‎09-18-2018

Frankly it sounds like you should revisit your capacity planning. You can try bumping up raft consensus timeouts and upgrading to the latest version of Kudu but it may not help that much.

Online	Offline
Last Visited	‎05-09-2019 02:10 AM

Member Since	‎04-08-2014 11:43 PM
Last Visited	‎05-09-2019 02:10 AM
Posts	70
Kudos received	19

Cloudera Community

Re: Apache kudu

Re: Apache kudu

Re: Apache kudu

Re: Apache kudu

Re: Apache kudu

Re: Error while connecting to kudu via pyspark

Re: Error while connecting to kudu via pyspark

Re: Where's KUDU stores own tables?

Re: Where's KUDU stores own tables?

Re: Data replication in Kudu

Re: How to rename kudu table name on impala versio...

Re: How to rename kudu table name on impala versio...

Re: How to rename kudu table name on impala versio...

Re: Kudu backpressure and service queue is full

Re: Kudu backpressure and service queue is full