I need to use spark-csv on Spark 1.6. Does anyone know whether I need to copy the spark-csv jar to every Spark node, and if so, where?
I have to use Zeppelin with the Livy interpreter.
Thanks Matt, but my servers cannot access the internet.
I downloaded the spark-csv jar and copied it to the Spark server. What I want to know is how Spark locates the folder that contains the spark-csv jar.
You can either compile the package into your application jar, or manually install it on every Spark/YARN worker node and include the directory in your <extraClassPath>.
Sample pom.xml on HDP 2.6.3:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.3.2.6.3.0-235</version>
  <scope>provided</scope>
</dependency>
...
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-csv_2.10</artifactId>
  <version>1.5.0</version>
  <scope>provided</scope>
</dependency>
Use the "provided" scope if you choose external installation; leave it out if you want to compile the package in. Compiling in is simpler, but if you have a large cluster or multiple Spark applications that share such external libraries, the "provided" scope may be the better choice. In that case, you would need to specify:
--conf "spark.driver.extraClassPath=...:<your ext lib path>/*" --conf "spark.executor.extraClassPath=...:<your ext lib path>/*"
on your spark-submit command line.
@Boualem SAOULA I agree with what @Miles Yao said. If you want a quick way to test, or just need to add some jars quickly, there is also the spark-submit parameter --jars, which takes a comma-separated list of full paths to jars. However, it ships the jars on every submission, so the method @Miles Yao suggested has the added benefit of saving network traffic.
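As a rough sketch of the --jars approach, the comma-separated list can be built from a local directory instead of being typed by hand. The helper names build_jars_arg and build_submit_command, the jar directory, and the application jar are hypothetical placeholders; only the --jars flag itself comes from spark-submit.

```python
import glob
import os


def build_jars_arg(jar_dir):
    """Return a comma-separated list of absolute paths to every .jar in jar_dir."""
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    return ",".join(os.path.abspath(j) for j in jars)


def build_submit_command(jar_dir, app_jar):
    """Assemble a spark-submit argument list; app_jar is a placeholder application jar."""
    cmd = ["spark-submit"]
    jars = build_jars_arg(jar_dir)
    if jars:  # only pass --jars when at least one jar was found
        cmd += ["--jars", jars]
    cmd.append(app_jar)
    return cmd
```

The resulting list could be handed to subprocess, or joined with spaces and pasted into a shell. Any other options your job needs would still have to be added.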
My solution was to add the spark.jars property to the Spark 1.6 config:
spark.jars='path-to-jar' (you can use any path)
I then copied the spark-csv jar (spark-csv_2.10-1.5.0.jar) and its dependencies (commons-csv-1.1.jar, univocity-parsers-1.5.1.jar) to path-to-jar.
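Before restarting the interpreter, it may be worth checking that all three jars actually landed in the directory. This is a minimal sketch, assuming the jars sit in one local directory. The function names are hypothetical, and the jar file names are the ones listed above. spark.jars is generally described as a comma-separated list of jars, so the sketch builds the value in that form; adjust it if your setup expects something else.

```python
import os

# Jar names taken from the post above.
REQUIRED_JARS = [
    "spark-csv_2.10-1.5.0.jar",
    "commons-csv-1.1.jar",
    "univocity-parsers-1.5.1.jar",
]


def missing_jars(jar_dir, required=REQUIRED_JARS):
    """Return the required jar names that are not present in jar_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(jar_dir, name))]


def spark_jars_value(jar_dir, required=REQUIRED_JARS):
    """Build a comma-separated value for spark.jars, failing if a jar is missing."""
    missing = missing_jars(jar_dir, required)
    if missing:
        raise FileNotFoundError("missing jars: " + ", ".join(missing))
    return ",".join(os.path.join(jar_dir, name) for name in required)
```

Running missing_jars on each node that needs the jars is a quick way to spot a node where the copy was skipped.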