
I cannot write to a Kudu table using PySpark

New Contributor

Error

Exception in thread "main" org.apache.spark.SparkException: No main class set in JAR; please specify one with --class
        at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:657)
        at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:266)
        at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:251)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:120)
        at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$1.<init>(SparkSubmit.scala:913)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:913)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:81)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Kudu table

[Screenshot: Moawad_0-1647357036451.png]

 

PySpark code

import os
os.environ['PYSPARK_PYTHON']="/u01/shared/tools/envs/tensor_2_1/bin/python3.6"
os.environ['PYSPARK_DRIVER_PYTHON']="/u01/shared/tools/envs/tensor_2_1/bin/python3.6"
os.environ['PYSPARK_SUBMIT_ARGS'] = "/home/v22fingerprintbda/FPTeam/Streams/kudu-spark_2.10-1.5.0.jar pyspark-shell" 

import time 
import findspark
findspark.init('/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1580995/lib/spark')
from pyspark import SparkContext, SQLContext, StorageLevel
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import col,isnan,when,count

spark = SparkSession.builder.master("local").appName("MEDReader").getOrCreate()

sd = [("1", "Ahmed"),
      ("2", "Emad")]

sch = ["id", "name"]

kududf = spark.createDataFrame(data=sd, schema=sch)

#print("Starting KUDU .......")

# Append the rows to the existing Kudu table
kududf.write \
      .format("org.apache.kudu.spark.kudu") \
      .option('kudu.master',kuduMaster)\
      .option('kudu.table',"impala::bde.FP_KUDU_TEST") \
      .mode("append") \
      .save()

Additional info: Spark version 2.4.0-cdh6.2.1, Kudu 1.9.0-cdh6.2.1

 


Super Collaborator

Hello @Moawad 

 

Thanks for using Cloudera Community. Based on the post, your team is having issues connecting to Kudu via PySpark.

 

Kindly confirm whether a simple example, as documented in the CDH 6.2.x guide [1], works for your team.

 

Regards, Smarak

 

[1] https://docs.cloudera.com/documentation/enterprise/6/6.2/topics/kudu_development.html

[2] https://kudu.apache.org/docs/developing.html
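For context on the stack trace in the question: "No main class set in JAR" is what spark-submit reports when a bare JAR path is given as the application to run, which is how the PYSPARK_SUBMIT_ARGS value in the post is interpreted. A minimal sketch of passing the connector as a dependency with `--jars` instead is shown here. The JAR path and file name are hypothetical placeholders; the assumption is that a Spark 2.4 / Kudu 1.9 cluster needs a `kudu-spark2_2.11` build rather than the `kudu-spark_2.10-1.5.0` JAR from the post. The helper names `build_submit_args` and `append_to_kudu` are illustrative and not part of any library.

```python
import os

# Hypothetical path: substitute the kudu-spark JAR that matches your cluster.
KUDU_SPARK_JAR = "/path/to/kudu-spark2_2.11-1.9.0.jar"


def build_submit_args(jar_paths):
    """Return a PYSPARK_SUBMIT_ARGS value that passes JARs as dependencies."""
    # --jars takes a comma-separated list; "pyspark-shell" must come last.
    return "--jars {} pyspark-shell".format(",".join(jar_paths))


# Must be set before the SparkSession (and its JVM) is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args([KUDU_SPARK_JAR])


def append_to_kudu(df, kudu_master, table):
    """Append a DataFrame to an existing Kudu table (same options as the post)."""
    (df.write
       .format("org.apache.kudu.spark.kudu")
       .option("kudu.master", kudu_master)
       .option("kudu.table", table)
       .mode("append")
       .save())
```

With a session created afterwards, a call might look like `append_to_kudu(kududf, "kudu-master-host:7051", "impala::bde.FP_KUDU_TEST")`, where the host name is a placeholder. Note that the code in the post never defines `kuduMaster`, so that value has to be supplied either way.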

 

 

Super Collaborator

Hello @Moawad 

 

Hope you are doing well. Kindly let us know whether the post from 03/20, which linked a few pages from the CDH v6.x documentation, helped your team.

 

Regards, Smarak