Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Installation and usage of Databricks Spark-CSV with Python


I'm new to Spark and am trying to load a CSV file into a Spark DataFrame using the Databricks spark-csv JAR. I started a Python script with

/usr/hdp/current/spark-client/bin/spark-submit ~/main.py 

I got a java.lang.ClassNotFoundException for com.databricks.spark.csv, which makes sense because I don't have the JAR on my classpath. Can anybody tell me how to add .jar files to my system so they are available to PySpark?

1 ACCEPTED SOLUTION


Hi @Lukas Müller, does this work for you?

./bin/spark-submit --jars /path/to/file1.jar,/path/to/file2.jar --packages com.databricks:spark-csv_2.x:x.x.x pyspark_code.py

What version of Spark are you using? If you are on Spark 2.0+, you should not need this JAR at all, since CSV support is built into Spark (just an FYI). Please let me know if this helps.
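Once the JAR is on the classpath via --jars or --packages, a minimal sketch of reading the CSV from the submitted script might look like the following. This assumes a Spark 1.x SQLContext and a hypothetical input path (/path/to/data.csv is a placeholder):

```python
# Sketch: loading a CSV with the spark-csv package on Spark 1.x,
# assuming the JAR was supplied via --jars/--packages at submit time.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="csv-example")
sqlContext = SQLContext(sc)

# Spark 1.x: the Databricks package registers the
# "com.databricks.spark.csv" data source.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")       # first line contains column names
      .option("inferSchema", "true")  # infer column types from the data
      .load("/path/to/data.csv"))

# Spark 2.0+ equivalent: CSV support is built in, no extra JAR needed
# df = spark.read.option("header", "true").csv("/path/to/data.csv")

df.show()
```

The snippet requires a running Spark installation, so it is only illustrative here; adjust the path and options to your data.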


3 REPLIES



I'm using Spark 1.6.3

That did the trick!


@Lukas Müller

Great, happy you got it working!