Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Spark-csv support in HDP 2.3.2

Expert Contributor

Is the spark-csv package not supported by HDP 2.3.2? I get the error below when I try to load the spark-csv package in spark-shell:

[hdfs@sandbox root]$ spark-shell   --packages com.databricks:spark-csv_2.10:1.1.0  --master yarn-client --driver-memory 512m --executor-memory 512m
Ivy Default Cache set to: /home/hdfs/.ivy2/cache
The jars for the packages stored in: /home/hdfs/.ivy2/jars
:: loading settings :: url = jar:file:/usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
:: resolution report :: resolve 332ms :: artifacts dl 0ms
        :: modules in use:
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
        ---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
                module not found: com.databricks#spark-csv_2.10;1.1.0
        ==== local-m2-cache: tried
          file:/home/hdfs/.m2/repository/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.pom
          -- artifact com.databricks#spark-csv_2.10;1.1.0!spark-csv_2.10.jar:
          file:/home/hdfs/.m2/repository/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar
        ==== local-ivy-cache: tried
          /home/hdfs/.ivy2/local/com.databricks/spark-csv_2.10/1.1.0/ivys/ivy.xml
        ==== central: tried
          https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.pom
          -- artifact com.databricks#spark-csv_2.10;1.1.0!spark-csv_2.10.jar:
          https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar
        ==== spark-packages: tried
          http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0....
          -- artifact com.databricks#spark-csv_2.10;1.1.0!spark-csv_2.10.jar:
          http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0....
                ::::::::::::::::::::::::::::::::::::::::::::::
                ::          UNRESOLVED DEPENDENCIES         ::
                ::::::::::::::::::::::::::::::::::::::::::::::
                :: com.databricks#spark-csv_2.10;1.1.0: not found
                ::::::::::::::::::::::::::::::::::::::::::::::
:::: ERRORS
        Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.pom (java.net.ConnectException: Connection refused)
        Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar (java.net.ConnectException: Connection refused)
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.databricks#spark-csv_2.10;1.1.0: not found]
        at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:995)
        at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:263)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:145)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/12/14 01:49:39 INFO Utils: Shutdown hook called
[hdfs@sandbox root]$

Would really appreciate your help.

1 ACCEPTED SOLUTION

Master Mentor

@Divya Gehlot

Server access error at url https://repo1.maven.org/maven2/com/databricks/spa... (java.net.ConnectException: Connection refused)

Please see those messages in your output: the resolver could not connect to the repository.

The same command worked for me on my HDP 2.3.2 sandbox.

Output attached.

sparkcsv.pdf

803-screen-shot-2015-12-13-at-100407-pm.png
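The resolver output makes the cause visible: every repository attempt ends in java.net.ConnectException. A quick way to confirm this from inside the sandbox (a diagnostic sketch, assuming curl is available in the VM) is:

```shell
# Probe the repositories that --packages tried, as listed in the
# Ivy output. "Connection refused" or a timeout here reproduces the
# failure outside of Spark; an HTTP status line means the network is fine.
curl -sI --max-time 10 https://repo1.maven.org/maven2/ | head -n 1
curl -sI --max-time 10 http://dl.bintray.com/spark-packages/maven/ | head -n 1
```

If both probes fail, fix the VM's network configuration first; --packages cannot work without repository access.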


4 REPLIES


Expert Contributor

@Neeraj Sabharwal

Thanks a lot for the prompt response.

I am using the HDP 2.3.2 VMware version (Link). Is there any workaround to make it work?

Expert Contributor

@Neeraj Sabharwal

I found the issue: I had enabled the bridged network connection in my VMware settings, because of which the spark-csv packages could not be downloaded and I was getting (java.net.ConnectException: Connection refused).


If it's a networking issue, just download the JAR file yourself and use the --jars option to add it to the classpath.

It looks like it lives under https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/
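To make that workaround concrete, here is a sketch. The jar URL comes from the resolver output above; my assumption is that spark-csv 1.1.0 also needs its commons-csv dependency, which --packages would normally have resolved for you:

```shell
# Download the jars once, on any machine with internet access,
# then copy them into the sandbox if needed.
wget https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.1.0/spark-csv_2.10-1.1.0.jar
# spark-csv pulls in Apache commons-csv; version 1.1 is my assumption here.
wget https://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar

# Start spark-shell with --jars instead of --packages, so no
# repository access is needed at launch time.
spark-shell --jars spark-csv_2.10-1.1.0.jar,commons-csv-1.1.jar \
  --master yarn-client --driver-memory 512m --executor-memory 512m
```

Once the shell is up, the package should work as usual, e.g. `sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/path/to/file.csv")` (the path is hypothetical).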