Installation and usage of Databricks Spark-CSV with Python

I'm new to Spark and am trying to load a CSV file into a Spark DataFrame using the Databricks Spark-CSV jar. I started a Python script with:

/usr/hdp/current/spark-client/bin/spark-submit ~/main.py 

I got a java.lang.ClassNotFoundException for com.databricks.spark.csv, which makes sense because the .jar isn't on my classpath. Can anybody tell me how to add .jar files to my system for use with PySpark?

1 ACCEPTED SOLUTION

Hi @Lukas Müller, does this work for you?

./bin/spark-submit --jars /path/to/file1.jar,/path/to/file2.jar --packages com.databricks:spark-csv_2.x:x.x.x pyspark_code.py

What version of Spark are you using? If you are using Spark 2.0+, you should not need to specify this jar, since CSV support is built in (just as an FYI). Please let me know if this helps.
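For reference, here is a rough sketch of what the submitted script might look like on Spark 1.6.x once the spark-csv package is on the classpath. The file layout, the `load_csv` and `main` helper names, the app name, and the `header`/`inferSchema` options are my own assumptions for illustration, not something taken from your script.

```python
# main.py: a minimal sketch, assuming Spark 1.6.x and that spark-csv was
# supplied via --packages or --jars on the spark-submit command line.
import sys

# Data source name registered by the spark-csv package.
CSV_SOURCE = "com.databricks.spark.csv"


def load_csv(sql_context, path):
    """Read a CSV file into a DataFrame through the spark-csv data source."""
    return (
        sql_context.read
        .format(CSV_SOURCE)
        # Assumed options: first row is a header, column types are inferred.
        .options(header="true", inferSchema="true")
        .load(path)
    )


def main(path):
    # Imported here so the helper above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="csv-example")  # hypothetical app name
    sql_context = SQLContext(sc)
    df = load_csv(sql_context, path)
    df.printSchema()
    sc.stop()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: spark-submit --packages <spark-csv coordinates> main.py <csv-path>")
```

If the package is missing from the classpath, the `load` call is where the ClassNotFoundException would surface. On Spark 2.0+, the same read should work with the built-in `csv` format name instead of `com.databricks.spark.csv`.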

3 REPLIES

I'm using Spark 1.6.3

That did the trick!

@Lukas Müller

Great, happy you got it working!