spark-csv on spark 1.6

Explorer

Hi,

I need to use spark-csv on Spark 1.6. Does anyone have an idea whether I need to copy the spark-csv jar to all Spark nodes, and if so, where?

I have to use Zeppelin with the Livy interpreter.

5 REPLIES

Expert Contributor

@Boualem SAOULA Here's how you can add it so you can work with it: https://github.com/databricks/spark-csv

The --packages option will also work with 1.6.
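
For example, something like this should work, assuming the cluster can reach Maven Central and using the Scala 2.10 build of spark-csv that matches Spark 1.6 (the application class and jar name below are just placeholders):

    spark-shell --packages com.databricks:spark-csv_2.10:1.5.0

    spark-submit --packages com.databricks:spark-csv_2.10:1.5.0 \
      --class com.example.MyApp my-app.jar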

Explorer

Thanks Matt, but my servers cannot access the internet.

I downloaded the spark-csv jar and copied it to the Spark server. I'm trying to find out how Spark locates the folder that contains the spark-csv JAR.

Contributor

You can choose to either compile the package into your application jar, or manually install it on every Spark/YARN worker node and include that directory in your extraClassPath.

Sample pom.xml on HDP 2.6.3:

                <dependency>
                    <groupId>org.apache.spark</groupId>
                    <artifactId>spark-core_2.10</artifactId>
                    <version>1.6.3.2.6.3.0-235</version>
                    <scope>provided</scope>
                </dependency>
...
                <dependency>
                    <groupId>com.databricks</groupId>
                    <artifactId>spark-csv_2.10</artifactId>
                    <version>1.5.0</version>
                    <scope>provided</scope>
                </dependency>

Use "

<scope>provided</scope>

" if you choose external installation. Leave out if you want to compile in. Simpler to compile in, but if you have a large cluster or multiple Spark applications that will share such external libraries, using "provided" scope may be more optimal. In this case, you would need to specify:

--conf "spark.driver.extraClassPath=...:<your ext lib path>/*" --conf "spark.executor.extraClassPath=...:<your ext lib path>/*"

on your spark-submit command line.
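
For example, assuming the jars were copied to /opt/ext-libs on every node (the path, class name, and application jar are only placeholders), the full command could look something like:

    spark-submit \
      --class com.example.MyApp \
      --conf "spark.driver.extraClassPath=/opt/ext-libs/*" \
      --conf "spark.executor.extraClassPath=/opt/ext-libs/*" \
      my-app.jar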

Expert Contributor

@Boualem SAOULA I agree with @Miles Yao. If you want a quick way to test, or just need to add some jars quickly, there is also a spark-submit parameter, --jars, which takes a comma-separated list of (full paths to) jars. But it ships the jars with every submission, which is why the method @Miles Yao suggested has the extra benefit of saving on network traffic.
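
For example (the class name, application jar, and /opt/ext-libs path are placeholders; spark-csv 1.5.0 also needs commons-csv and univocity-parsers on the classpath, as noted below):

    spark-submit \
      --class com.example.MyApp \
      --jars /opt/ext-libs/spark-csv_2.10-1.5.0.jar,/opt/ext-libs/commons-csv-1.1.jar,/opt/ext-libs/univocity-parsers-1.5.1.jar \
      my-app.jar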

Explorer

My solution was to add the spark.jars property to the Spark 1.6 config.

spark.jars='path-to-jar' (you can use any path)

Then I copied the spark-csv jar (spark-csv_2.10-1.5.0.jar) and its dependencies (commons-csv-1.1.jar, univocity-parsers-1.5.1.jar) to that path.
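
For illustration, with /opt/ext-libs standing in for the path I used, the entry in spark-defaults.conf (or the interpreter settings) would look something like this; spark.jars expects a comma-separated list of jar files:

    spark.jars /opt/ext-libs/spark-csv_2.10-1.5.0.jar,/opt/ext-libs/commons-csv-1.1.jar,/opt/ext-libs/univocity-parsers-1.5.1.jar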
