Installation and usage of Databricks Spark-CSV with Python

I'm new to Spark and am trying to load a CSV file into a Spark DataFrame using the Databricks Spark-CSV jar. I started a Python script with:

/usr/hdp/current/spark-client/bin/spark-submit ~/main.py 

I got a java.lang.ClassNotFoundException for com.databricks.spark.csv, which makes sense because the .jar is not on my classpath. Can anybody tell me how to make .jar files available to PySpark?

1 ACCEPTED SOLUTION

Re: Installation and usage of Databricks Spark-CSV with Python

Hi @Lukas Müller, does this work for you?

./bin/spark-submit --jars /path/to/file1.jar,/path/to/file2.jar --packages com.databricks:spark-csv_2.x:x.x.x pyspark_code.py

What version of Spark are you using? As an FYI, on Spark 2.0+ CSV support is built in, so you should not need to specify this package at all. Please let me know if this helps.
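Once the package is on the classpath, the script itself might look roughly like the sketch below. This is only an illustration for Spark 1.x, not necessarily what your main.py needs: the helper name `load_csv`, the `main` function, the `header`/`inferSchema` options, and the HDFS path are assumptions made for the example. The format string `com.databricks.spark.csv` is the class name from the exception above.

```python
# Sketch for Spark 1.x with the spark-csv package supplied via --packages.
CSV_FORMAT = "com.databricks.spark.csv"

# Assumed options: treat the first row as a header and let Spark guess types.
CSV_OPTIONS = {"header": "true", "inferSchema": "true"}


def load_csv(sql_context, path):
    """Read a CSV file into a DataFrame through the spark-csv data source."""
    return (
        sql_context.read
        .format(CSV_FORMAT)
        .options(**CSV_OPTIONS)
        .load(path)
    )


def main():
    # Imported here so the helper above can be imported without Spark installed.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="spark-csv-example")
    sql_context = SQLContext(sc)

    # Hypothetical path; replace with the location of your own file.
    df = load_csv(sql_context, "hdfs:///path/to/file.csv")
    df.printSchema()
    df.show(5)

    sc.stop()
```

Add a `main()` call at the bottom of the file so it runs when submitted with spark-submit. On Spark 2.0+ the built-in reader, `spark.read.csv(path, header=True, inferSchema=True)`, should do the same job without the extra package.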

3 REPLIES

Re: Installation and usage of Databricks Spark-CSV with Python

I'm using Spark 1.6.3

That did the trick!

Re: Installation and usage of Databricks Spark-CSV with Python

@Lukas Müller

Great, happy you got it working!
