Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Installation and usage of Databricks Spark-CSV with Python


I'm new to Spark and am trying to load a CSV file into a Spark DataFrame using the Databricks spark-csv JAR. I started a Python script with

/usr/hdp/current/spark-client/bin/spark-submit ~/main.py 

I got a java.lang.ClassNotFoundException for com.databricks.spark.csv, which makes sense because I don't have the JAR on my classpath. Can anybody tell me how to add .jar files to my system so they are available to PySpark?

1 ACCEPTED SOLUTION


Hi @Lukas Müller, does this work for you?

./bin/spark-submit --jars /path/to/file1.jar,/path/to/file2.jar --packages com.databricks:spark-csv_2.x:x.x.x pyspark_code.py

What version of Spark are you using? If you are on Spark 2.0+, you should not need this JAR at all, since CSV support is built into Spark (just an FYI). Please let me know if this helps.
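Once the JAR is on the classpath via --jars or --packages, a minimal sketch of reading the CSV from the submitted script might look like the following. This assumes a Spark 1.x SQLContext and a hypothetical input path (/path/to/data.csv is a placeholder):

```python
# Sketch: loading a CSV with the spark-csv package on Spark 1.x,
# assuming the JAR was supplied via --jars/--packages at submit time.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="csv-example")
sqlContext = SQLContext(sc)

# Spark 1.x: the Databricks package registers the
# "com.databricks.spark.csv" data source.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")       # first line contains column names
      .option("inferSchema", "true")  # infer column types from the data
      .load("/path/to/data.csv"))

# Spark 2.0+ equivalent: CSV support is built in, no extra JAR needed
# df = spark.read.option("header", "true").csv("/path/to/data.csv")

df.show()
```

The snippet requires a running Spark installation, so it is only illustrative here; adjust the path and options to your data.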


3 REPLIES



I'm using Spark 1.6.3

That did the trick!


@Lukas Müller

Great, happy you got it working!