Created 08-11-2017 05:53 PM
I'm new to Spark and am trying to load a CSV file into a Spark DataFrame using the Databricks spark-csv jar. I started a Python script with
/usr/hdp/current/spark-client/bin/spark-submit ~/main.py
I got a java.lang.ClassNotFoundException for com.databricks.spark.csv, which makes sense, since the jar isn't on my classpath. Can anybody tell me how to add .jar files to my system so they can be used with PySpark?
Created 08-11-2017 06:04 PM
Hi @Lukas Müller, does this work for you?
./bin/spark-submit --jars /path/to/file1.jar,/path/to/file2.jar --packages com.databricks:spark-csv_2.x:x.x.x pyspark_code.py
What version of Spark are you using? If you are using Spark 2.0+, you should not need this jar at all, since CSV support is built into the DataFrame reader (just as an FYI for you). Please let me know if this helps.
Created 08-11-2017 06:22 PM
I'm using Spark 1.6.3
That did the trick!
Created 08-11-2017 06:28 PM
Great, happy you got it working!