
Error while connecting to kudu via pyspark

Explorer

Hi all,

I'm trying to read a Kudu table into a DataFrame.

Step 1: pyspark2 --packages org.apache.kudu:kudu-spark2_2.11:1.4.0

Step 2: kuduDF = spark.read.format('org.apache.kudu.spark.kudu').option('kudu.master', "xxxxxx.xxxxxxx.net").option('kudu.table', "impala::erqd.dim_address").load()
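For reference, the read in step 2 can be wrapped in a small function. This is a minimal sketch, not part of the original post: `kudu_read_options` and `load_kudu_table` are hypothetical helper names, the default master port 7051 is the one shown in the error log, and joining several masters with commas in `kudu.master` assumes a multi-master cluster.

```python
# Minimal sketch: read a Kudu table into a Spark DataFrame from PySpark.
# Assumes an existing SparkSession named `spark` (as in the pyspark2 shell)
# and the kudu-spark2 package on the classpath.

KUDU_DEFAULT_MASTER_PORT = 7051  # port seen in the error log


def kudu_read_options(master_hosts, table):
    """Build options for the kudu-spark data source (hypothetical helper).

    master_hosts: list of Kudu master host names, optionally with ':port'.
    table: Kudu table name; tables created through Impala are usually
           named 'impala::<database>.<table>'.
    """
    masters = []
    for host in master_hosts:
        # Append the default master port when none is given.
        masters.append(host if ":" in host else f"{host}:{KUDU_DEFAULT_MASTER_PORT}")
    return {
        "kudu.master": ",".join(masters),  # comma-separated list of masters
        "kudu.table": table,
    }


def load_kudu_table(spark, master_hosts, table):
    """Return a DataFrame backed by the given Kudu table (hypothetical helper)."""
    reader = spark.read.format("org.apache.kudu.spark.kudu")
    for key, value in kudu_read_options(master_hosts, table).items():
        reader = reader.option(key, value)
    return reader.load()
```

In the shell this would be called as `kuduDF = load_kudu_table(spark, ["xxxxxx.xxxxxxx.net"], "impala::erqd.dim_address")`, which issues the same read as step 2.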

I get the following error:

56 ERROR client.TabletClient: [Peer master-xxxxxx.xxxxxx.net:7051] Tablet server sent error Not authorized: unauthorized access to method: ConnectToMaster

Can you please help?

Re: Error while connecting to kudu via pyspark

Expert Contributor

Is your cluster Kerberos-enabled? If so, did you kinit before running the job? Try a local driver before trying a distributed driver to rule out keytab-related issues.
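To check the kinit question before starting PySpark, you can confirm that the shell actually holds a valid ticket. A minimal sketch, assuming MIT Kerberos, where `klist -s` prints nothing and exits with status 0 only when a readable, unexpired credential cache is found; `has_valid_kerberos_ticket` is a hypothetical helper name.

```python
import subprocess


def has_valid_kerberos_ticket(klist_cmd="klist"):
    """Return True if klist reports a usable credential cache (hypothetical helper).

    Relies on MIT Kerberos 'klist -s': silent, exit status 0 when the
    credential cache can be read and its tickets have not expired.
    """
    try:
        result = subprocess.run([klist_cmd, "-s"], check=False)
    except FileNotFoundError:
        # klist is not installed or not on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if has_valid_kerberos_ticket():
        print("Valid Kerberos ticket found in this shell.")
    else:
        print("No valid ticket; run kinit in this shell first.")
```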

Re: Error while connecting to kudu via pyspark

Explorer

Yes, it is. Below are the steps I run:

Step 1: kinit -kt xxxxx.keytab xxxxxx@QAxxxxx.NET

Step 2: pyspark2 --packages org.apache.kudu:kudu-spark2_2.11:1.4.0

Step 3: kuduDF = spark.read.format('org.apache.kudu.spark.kudu').option('kudu.master', "xxxxxx.xxxxxxx.net").option('kudu.table', "impala::erqd.dim_address").load()
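Following the suggestion to try a local driver first, steps 1 and 2 can be scripted so the shell starts in local mode on the host where kinit ran. A minimal sketch: `launch_commands` is a hypothetical helper, the keytab, principal, and package values are the placeholders from this post, and `--master local[*]` is the standard Spark option for running the driver and executors in one local JVM.

```python
import shlex


def launch_commands(keytab, principal, packages, master="local[*]"):
    """Build the kinit and pyspark2 command lines (hypothetical helper)."""
    kinit = ["kinit", "-kt", keytab, principal]
    # Local master: the driver runs on this host, so it uses the ticket
    # cache that kinit just populated.
    pyspark = ["pyspark2", "--master", master, "--packages", packages]
    return kinit, pyspark


if __name__ == "__main__":
    kinit, pyspark = launch_commands(
        "xxxxx.keytab",
        "xxxxxx@QAxxxxx.NET",
        "org.apache.kudu:kudu-spark2_2.11:1.4.0",
    )
    # Print the commands as they would be typed in a shell.
    print(shlex.join(kinit))
    print(shlex.join(pyspark))
```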

I am able to create tables and query them via impala-shell, but I am unable to connect when using pyspark2 or even Scala. I get the following error:

56 ERROR client.TabletClient: [Peer master-xxxxxx.xxxxxx.net:7051] Tablet server sent error Not authorized: unauthorized access to method: ConnectToMaster

Re: Error while connecting to kudu via pyspark

Expert Contributor

Let's try to rule out various types of problems.

1. Are you able to read/write to Kerberos-enabled HDFS with PySpark? Is Kudu the only Kerberos-enabled service that is not working from within PySpark?

2. Have you checked that the Spark driver is running on the host and in the shell where you ran kinit, rather than in a YARN container? If it runs in YARN, you have to give YARN access to the keytab to run as.

3. Have you tried connecting to Kudu with the regular Spark shell? Does it work? For examples, see https://kudu.apache.org/docs/developing.html#_kudu_integration_with_spark
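For point 2, when the driver does run on YARN, the keytab can be handed over at submit time. A minimal sketch, assuming the CDH Spark 2 command name `spark2-submit` and a hypothetical application file `read_kudu.py`; `yarn_submit_command` is a hypothetical helper. `--principal` and `--keytab` are standard Spark-on-YARN options: Spark copies the keytab to the node running the ApplicationMaster and uses it to log in and renew tickets and delegation tokens.

```python
import shlex


def yarn_submit_command(app, principal, keytab, packages, deploy_mode="cluster"):
    """Build a spark2-submit command that gives YARN a keytab (hypothetical helper)."""
    return [
        "spark2-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        # Spark on YARN logs in with this principal/keytab instead of
        # relying on a ticket cache on the submitting host.
        "--principal", principal,
        "--keytab", keytab,
        "--packages", packages,
        app,
    ]


if __name__ == "__main__":
    cmd = yarn_submit_command(
        "read_kudu.py",  # hypothetical PySpark script containing the Kudu read
        "xxxxxx@QAxxxxx.NET",
        "xxxxx.keytab",
        "org.apache.kudu:kudu-spark2_2.11:1.4.0",
    )
    print(shlex.join(cmd))
```

For point 3, the Scala shell can be started the same way with `spark2-shell --packages org.apache.kudu:kudu-spark2_2.11:1.4.0`, again assuming the CDH Spark 2 command names.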