Created 04-08-2018 11:38 AM
Hi,
I need to use spark-csv on Spark 1.6. Does anyone have an idea whether I need to copy the spark-csv jar to every Spark node, and where?
I have to use Zeppelin with the Livy interpreter.
Created 04-08-2018 07:28 PM
@Boualem SAOULA Here's how you can add it so you can work with it: https://github.com/databricks/spark-csv
The --packages option will also work with 1.6.
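For example (a sketch, not from the original post: com.databricks:spark-csv_2.10:1.5.0 is the standard coordinate published for Scala 2.10 builds, and this approach needs internet access or an internal Maven mirror):
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
The same --packages flag can be passed to spark-submit.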
Created 04-09-2018 12:14 PM
Thanks Matt, but my servers cannot access the internet.
I downloaded the spark-csv jar and copied it to the Spark server. I'm trying to find out how Spark locates the folder that contains the spark-csv JAR.
Created 04-09-2018 03:58 PM
You can choose to either compile the package into your application jar, or manually install it on every Spark/YARN worker node and include that directory in your extraClassPath settings.
Sample pom.xml on HDP 2.6.3:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.3.2.6.3.0-235</version>
    <scope>provided</scope>
</dependency>
...
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.5.0</version>
    <scope>provided</scope>
</dependency>
Use "
<scope>provided</scope>
" if you choose external installation. Leave out if you want to compile in. Simpler to compile in, but if you have a large cluster or multiple Spark applications that will share such external libraries, using "provided" scope may be more optimal. In this case, you would need to specify:
--conf "spark.driver.extraClassPath=...:<your ext lib path>/*" --conf "spark.executor.extraClassPath=...:<your ext lib path>/*"
on your spark-submit command line.
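As an illustration only (the /opt/ext-libs directory, class name, and application jar are hypothetical, not from this thread), a full command could look like:
spark-submit \
  --conf "spark.driver.extraClassPath=/opt/ext-libs/*" \
  --conf "spark.executor.extraClassPath=/opt/ext-libs/*" \
  --class com.example.MyApp my-app.jar
where /opt/ext-libs contains spark-csv_2.10-1.5.0.jar and its dependencies on every node.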
Created 04-10-2018 10:12 AM
@Boualem SAOULA I agree with @Miles Yao. If you want a quick way to test, or just to add some jars quickly, there is also the spark-submit parameter --jars, which takes a comma-separated list of (full paths to) jars. But it ships the jars with every submission, which is why the method @Miles Yao suggested has the extra benefit of saving network traffic.
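A quick sketch of that option, assuming the jars were downloaded to /tmp/spark-csv on the submitting host (the path, class name, and application jar are hypothetical):
spark-submit \
  --jars /tmp/spark-csv/spark-csv_2.10-1.5.0.jar,/tmp/spark-csv/commons-csv-1.1.jar,/tmp/spark-csv/univocity-parsers-1.5.1.jar \
  --class com.example.MyApp my-app.jar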
Created 04-10-2018 09:49 PM
My solution is to add the spark.jars property to the Spark 1.6 config.
spark.jars='path-to-jar' (you can use any path)
and I copied the spark-csv jar (spark-csv_2.10-1.5.0.jar) and its dependencies (commons-csv-1.1.jar, univocity-parsers-1.5.1.jar) to that path.
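Once the jars are visible to the driver and executors, reading a file goes through the spark-csv data source. A minimal Scala sketch for Spark 1.6 (the input path is only an example):
// Spark 1.6: sqlContext is already defined in spark-shell and Zeppelin notebooks
val df = sqlContext.read
  .format("com.databricks.spark.csv")   // data source provided by spark-csv
  .option("header", "true")             // first line contains column names
  .option("inferSchema", "true")        // guess column types from the data
  .load("/tmp/data/sample.csv")         // hypothetical input path
df.printSchema()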