Error while connecting to kudu via pyspark
Labels: Apache Impala, Apache Kudu, Kerberos
Created on 04-23-2019 07:13 AM - edited 09-16-2022 07:19 AM
Hi all,
I am trying to read a Kudu table into a DataFrame.
Step 1: pyspark2 --packages org.apache.kudu:kudu-spark2_2.11:1.4.0
Step 2: kuduDF = spark.read.format('org.apache.kudu.spark.kudu').option('kudu.master',"xxxxxx.xxxxxxx.net").option('kudu.table',"impala::erqd.dim_address").load()
I get the following error:
56 ERROR client.TabletClient: [Peer master-xxxxxx.xxxxxx.net:7051] Tablet server sent error Not authorized: unauthorized access to method: ConnectToMaster
Can you please help?
Created 04-23-2019 05:45 PM
Is your cluster Kerberos-enabled? If so, did you kinit before running the job? Try a local driver before trying a distributed driver to rule out keytab-related issues.
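As a rough sketch of that check, the snippet below tests for a valid ticket and builds a pyspark2 command line that keeps the driver local. The helper names `has_valid_kerberos_ticket` and `local_pyspark_command` are made up for illustration, and it assumes the standard `klist -s` behavior (silent, exit status 0 only when the ticket cache is readable and unexpired) is available on your host.

```python
import shutil
import subprocess


def has_valid_kerberos_ticket(klist_cmd="klist"):
    """Return True if `klist -s` finds a readable, unexpired ticket cache."""
    if shutil.which(klist_cmd) is None:
        # No Kerberos client tools on PATH, so no ticket can be confirmed.
        return False
    # `klist -s` prints nothing and reports the result through its exit status.
    return subprocess.run([klist_cmd, "-s"]).returncode == 0


def local_pyspark_command(package="org.apache.kudu:kudu-spark2_2.11:1.4.0"):
    """Build a pyspark2 command line that forces a local master (hypothetical helper)."""
    return ["pyspark2", "--master", "local[*]", "--packages", package]


if __name__ == "__main__":
    if has_valid_kerberos_ticket():
        print("Ticket found; try:", " ".join(local_pyspark_command()))
    else:
        print("No valid ticket; run kinit first.")
```

With `--master local[*]` everything runs in the shell you launched from, so if the read works there but not on the cluster, the problem is more likely in how credentials reach the cluster than in the Kudu options themselves.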
Created 04-23-2019 06:18 PM
Yes, it is. These are the steps I run:
Step 1: kinit -kt xxxxx.keytab xxxxxx@QAxxxxx.NET
Step 2: pyspark2 --packages org.apache.kudu:kudu-spark2_2.11:1.4.0
Step 3: kuduDF = spark.read.format('org.apache.kudu.spark.kudu').option('kudu.master',"xxxxxx.xxxxxxx.net").option('kudu.table',"impala::erqd.dim_address").load()
I am able to create tables and run queries through the Impala shell, but I cannot connect when using pyspark2 or even Scala. I get the following error:
56 ERROR client.TabletClient: [Peer master-xxxxxx.xxxxxx.net:7051] Tablet server sent error Not authorized: unauthorized access to method: ConnectToMaster
Created 04-25-2019 12:43 AM
Let's try to rule out various types of problems.
1. Are you able to read/write to Kerberos-enabled HDFS with PySpark? Is Kudu the only Kerberos-enabled service that is not working from within PySpark?
2. Have you checked that the Spark driver is running on the host and in the shell you kinited from, rather than being started in a YARN container? If it is running in YARN, you have to give YARN access to the keytab for the principal the job should run as.
3. Have you tried connecting to Kudu with the regular Spark shell? Does it work? For examples see https://kudu.apache.org/docs/developing.html#_kudu_integration_with_spark
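
For point 2, here is a minimal sketch of how you might check where the driver runs and which spark-submit options pass a keytab to YARN. The function names are hypothetical, and the principal and keytab path in the usage notes are placeholders, not values from this thread. `--principal` and `--keytab` are the standard spark-submit options for YARN; whether your distribution's wrapper scripts accept them unchanged is an assumption to verify.

```python
def driver_runs_in_launching_shell(master, deploy_mode="client"):
    """True when the driver starts in the shell that launched it,
    which is where the kinit ticket cache lives."""
    if master.startswith("local"):
        return True
    # On a cluster manager, only client deploy mode keeps the driver local.
    return deploy_mode == "client"


def yarn_keytab_args(principal, keytab_path):
    """spark-submit options that hand a principal and keytab to a YARN application."""
    return ["--master", "yarn", "--principal", principal, "--keytab", keytab_path]


# Inside a running pyspark2 session you could feed in the live values:
#   sc = spark.sparkContext
#   mode = spark.conf.get("spark.submit.deployMode", "client")
#   print(driver_runs_in_launching_shell(sc.master, mode))
#
# Placeholder values, for illustration only:
#   yarn_keytab_args("user@EXAMPLE.COM", "/path/to/user.keytab")
```

If the first function returns False for your job, the driver is not using the ticket cache from your shell, and the keytab options are the usual way to supply credentials instead.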
