Spark-csv support in HDP 2.3.2

Rising Star

Is the spark-csv package not supported by HDP 2.3.2? When I try to run spark-shell with the spark-csv package, I get the error below saying the dependency cannot be resolved.

[hdfs@sandbox root]$ spark-shell   --packages com.databricks:spark-csv_2.10:1.1.0  --master yarn-client --driver-memory 512m --executor-memory 512m
Ivy Default Cache set to: /home/hdfs/.ivy2/cache
The jars for the packages stored in: /home/hdfs/.ivy2/jars
:: loading settings :: url = jar:file:/usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
:: resolution report :: resolve 332ms :: artifacts dl 0ms
        :: modules in use:
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
        ---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
                module not found: com.databricks#spark-csv_2.10;1.1.0
        ==== local-m2-cache: tried
          file:/home/hdfs/.m2/repository/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.pom
          -- artifact com.databricks#spark-csv_2.10;1.1.0!spark-csv_2.10.jar:
          file:/home/hdfs/.m2/repository/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar
        ==== local-ivy-cache: tried
          /home/hdfs/.ivy2/local/com.databricks/spark-csv_2.10/1.1.0/ivys/ivy.xml
        ==== central: tried
          https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.pom
          -- artifact com.databricks#spark-csv_2.10;1.1.0!spark-csv_2.10.jar:
          https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar
        ==== spark-packages: tried
          http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0....
          -- artifact com.databricks#spark-csv_2.10;1.1.0!spark-csv_2.10.jar:
          http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0....
                ::::::::::::::::::::::::::::::::::::::::::::::
                ::          UNRESOLVED DEPENDENCIES         ::
                ::::::::::::::::::::::::::::::::::::::::::::::
                :: com.databricks#spark-csv_2.10;1.1.0: not found
                ::::::::::::::::::::::::::::::::::::::::::::::
:::: ERRORS
        Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.pom (java.net.ConnectException: Connection refused)
        Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar (java.net.ConnectException: Connection refused)
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.databricks#spark-csv_2.10;1.1.0: not found]
        at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:995)
        at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:263)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:145)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/12/14 01:49:39 INFO Utils: Shutdown hook called
[hdfs@sandbox root]$

Would really appreciate your help.

1 ACCEPTED SOLUTION

@Divya Gehlot

Server access error at url https://repo1.maven.org/maven2/com/databricks/spa... (java.net.ConnectException: Connection refused)

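Note these messages in your output: the connection to Maven Central was refused, so the package could not be downloaded. This is a network problem on your sandbox, not a question of HDP support.

The same command worked for me in my HDP 2.3.2 sandbox.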

Output attached.

sparkcsv.pdf

803-screen-shot-2015-12-13-at-100407-pm.png
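A quick way to confirm the diagnosis is to test, from inside the sandbox, whether the repository host answers at all. The following is a minimal sketch, assuming `curl` is installed on the sandbox; the function name `can_reach` is made up for illustration.

```shell
#!/bin/sh
# Sketch: check whether a repository URL is reachable from this machine.
# can_reach is a hypothetical helper name, not part of Spark or HDP.
can_reach() {
  # --head: request headers only; --max-time: give up after 10 seconds.
  # -sS: no progress bar, but still print the error message on failure.
  curl -sS --head --max-time 10 -o /dev/null "$1"
}
```

Calling `can_reach https://repo1.maven.org/maven2/ && echo reachable` from the sandbox should print `reachable` when the network is set up correctly. If curl reports "Connection refused" here as well, the problem lies in the VM's networking rather than in spark-shell.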

4 REPLIES

Rising Star
@Neeraj Sabharwal

Thanks a lot for the prompt response.

I am using the VMware version of the HDP 2.3.2 sandbox. Is there any workaround to make it work?

Rising Star

@Neeraj Sabharwal

I found the issue: I had enabled a bridged network connection in VMware. Because of that, the spark-csv package could not be downloaded and I was getting (java.net.ConnectException: Connection refused).

If it's a networking issue, just download the JAR file yourself and use the --jars option to add it to the classpath.

It looks like it lives under https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/
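The manual route could look roughly like the sketch below. It assumes `curl` is available and that the download runs somewhere with internet access; the function names and the `/tmp` default location are made up for illustration. The spark-shell options are copied from the original command, with `--packages` swapped for `--jars`.

```shell
#!/bin/sh
# Sketch: fetch spark-csv by hand and pass it to spark-shell with --jars.
# Function names and the JAR_DIR default are illustrative assumptions.

# Map Maven coordinates (group, artifact, version) to the jar URL on Maven Central.
maven_jar_url() {
  # Group IDs become directory paths: com.databricks -> com/databricks
  group_path=$(printf '%s' "$1" | tr '.' '/')
  printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
    "$group_path" "$2" "$3" "$2" "$3"
}

# Download the jar, then start spark-shell with it on the classpath.
launch_with_local_jar() {
  url=$(maven_jar_url com.databricks spark-csv_2.10 1.1.0)
  jar="${JAR_DIR:-/tmp}/$(basename "$url")"
  curl -fL -o "$jar" "$url" || return 1
  spark-shell --jars "$jar" --master yarn-client \
    --driver-memory 512m --executor-memory 512m
}

# Only download and launch when explicitly asked: sh fetch_spark_csv.sh run
if [ "${1:-}" = "run" ]; then
  launch_with_local_jar
fi
```

If the sandbox itself has no connectivity, the jar can be downloaded on the host and copied into the VM before running the spark-shell line. Note that `--jars` takes a comma-separated list and, unlike `--packages`, does not resolve transitive dependencies. If I recall correctly, spark-csv also relies on Apache commons-csv, so that jar may need to be added to the list as well.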