failure on spark_connect() for sparklyr R package use on a Cloudera CDH 5.10.0 Hadoop cluster
Labels: Apache Spark
Created on ‎03-19-2017 10:05 PM - edited ‎09-16-2022 04:16 AM
%%%%%%%%%%%%%
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_51...
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-shell
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark
[1] "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41"
- Ron Taylor
Pacific Northwest National Laboratory
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
Type 'q()' to quit R.
character(0)
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7] "base"
--- Please select a CRAN mirror for use in this session ---
trying URL 'https://cran.cnr.berkeley.edu/src/contrib/sparklyr_0.5.2.tar.gz'
Content type 'application/x-gzip' length 732806 bytes (715 KB)
==================================================
downloaded 715 KB
- installing source package ‘sparklyr’ ...
** package ‘sparklyr’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded - DONE (sparklyr)
‘/tmp/RtmpQUB4IE/downloaded_packages’
[1] "sparklyr" "stats" "graphics" "grDevices" "utils" "datasets"
[7] "methods" "base"
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Workstation release 6.4 (Santiago)
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
[1] stats graphics grDevices utils datasets methods base
[1] sparklyr_0.5.1
[1] Rcpp_0.12.9 withr_1.0.2 digest_0.6.12 dplyr_0.5.0
[5] rprojroot_1.2 assertthat_0.1 rappdirs_0.3.1 R6_2.2.0
[9] jsonlite_1.2 DBI_0.5-1 backports_1.0.5 magrittr_1.5
[13] httr_1.2.1 config_0.2 tools_3.3.2 parallel_3.3.2
[17] yaml_2.1.14 base64enc_0.1-3 tcltk_3.3.2 tibble_1.2
[1] "/usr/java/latest"
spark hadoop dir
1 1.6.2 2.6 spark-1.6.2-bin-hadoop2.6
[1] "/people/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6"
[1] "/share/apps/R/3.3.2/lib64/R"
[1] "/people/rtaylor"
[1] "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41"
[1] "/people/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6"
character(0)
[1] "config"
Error in force(code) :
Failed while connecting to sparklyr to port (8880) for sessionid (2423): Gateway in port (8880) did not respond.
Path: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-submit
Parameters: --class, sparklyr.Backend, --jars, '/share/apps/R/3.3.2/lib64/R/library/sparklyr/java/spark-csv_2.11-1.3.0.jar','/share/apps/R/3.3.2/lib64/R/library/sparklyr/java/commons-csv-1.1.jar','/share/apps/R/3.3.2/lib64/R/library/sparklyr/java/univocity-parsers-1.5.1.jar', '/share/apps/R/3.3.2/lib64/R/library/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 2423
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: No such file or directory
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: exec: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: cannot execute: No such file or directory
From: Ronald Taylor
Date: Sun, Mar 19, 2017 at 5:23 PM
Subject: Re: [rstudio/sparklyr] problem with spark_connect() using sparklyr on a Cloudera CDH 5.10.0 Hadoop cluster (#534)
To: rstudio/sparklyr <reply+00525e71db98e9f7adda4ea9e8bcc457e983afabaf2725ba92cf0000000114e3a7e692a169ce0ca76ebc@reply.git...>
Hi Aki,
Thanks for the guidance, but I still cannot get spark_connect() to work. Very disappointing.
You can see the screen output for my connect attempts below. Also, I checked out the Cloudera web page that you listed - but I don't see anything there that usefully supplements your email to me.
And so I am still stuck. Can you (or anybody else on the list) think of anything else I can try? Spark 1.6.0 is running fine on the Cloudera cluster that I am trying to use, according to the Cloudera Manager. So spark_connect() *should* work, but it does not.
Screen output:
>
> config <- spark_config()
>
> spark_home <- "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41"
>
> spark_home
[1] "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41"
>
>
> spark_version <- "1.6.0"
>
> spark_version
[1] "1.6.0"
>
> sc <- spark_connect(master = "yarn-client", config = config, version = spark_version, spark_home=spark_home)
Error in force(code) :
Failed while connecting to sparklyr to port (8880) for sessionid (8451): Gateway in port (8880) did not respond.
Path: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-submit
Parameters: --class, sparklyr.Backend, --jars, '/people/rtaylor/Rpackages/sparklyr/java/spark-csv_2.11-1.3.0.jar','/people/rtaylor/Rpackages/sparklyr/java/commons-csv-1.1.jar','/people/rtaylor/Rpackages/sparklyr/java/univocity-parsers-1.5.1.jar', '/people/rtaylor/Rpackages/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 8451
---- Output Log ----
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: No such file or directory
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: exec: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: cannot execute: No such file or directory
---- Error Log ----
>
> config$spark.driver.cores<- 4
>
> config$spark.executor.cores<- 4
>
> config$executor.memory <- "4G"
>
> config
$sparklyr.cores.local
[1] 16
$spark.sql.shuffle.partitions.local
[1] 16
$spark.env.SPARK_LOCAL_IP.local
[1] "127.0.0.1"
$sparklyr.csv.embedded
[1] "^1.*"
$`sparklyr.shell.driver-class-path`
[1] ""
$spark.driver.cores
[1] 4
$spark.executor.cores
[1] 4
$executor.memory
[1] "4G"
attr(,"config")
[1] "default"
attr(,"file")
[1] "/people/rtaylor/Rpackages/sparklyr/conf/config-template.yml"
>
>
> spark_version
[1] "1.6.0"
>
> spark_home
[1] "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41"
>
> spark_connect(master = "yarn-client", config = config, version = spark_version, spark_home=spark_home)
Error in force(code) :
Failed while connecting to sparklyr to port (8880) for sessionid (533): Gateway in port (8880) did not respond.
Path: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-submit
Parameters: --class, sparklyr.Backend, --jars, '/people/rtaylor/Rpackages/sparklyr/java/spark-csv_2.11-1.3.0.jar','/people/rtaylor/Rpackages/sparklyr/java/commons-csv-1.1.jar','/people/rtaylor/Rpackages/sparklyr/java/univocity-parsers-1.5.1.jar', '/people/rtaylor/Rpackages/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 533
---- Output Log ----
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: No such file or directory
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: exec: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: cannot execute: No such file or directory
---- Error Log ----
>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
I ran Spark 1.6.2 on CDH 5.10 as follows:
config <- spark_config()
config$spark.driver.cores <- 4
config$spark.executor.cores <- 4
config$spark.executor.memory <- "4G"
spark_home <- "/opt/cloudera/parcels/CDH/lib/spark"
spark_version <- "1.6.2"
#spark_home <- "/opt/cloudera/parcels/SPARK2/lib/spark2"
#spark_version <- "2.0.0"
sc <- spark_connect(master="yarn-client", version=spark_version, config=config, spark_home=spark_home)
Created ‎03-25-2017 06:37 AM
Hi, I'm Aki. Thanks for posting.
Judging from your output, your spark_home setting is wrong:
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/spark/bin/spark-submit: line 27: /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/spark-class: No such file or directory
`spark-class` should be found in `/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/bin/spark-class`. So, you should set
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark
for `spark_home`, not the parcel root that your output shows,
/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41
, or you can just use
/opt/cloudera/parcels/CDH/lib/spark
You can also set the environment variable as follows:
Sys.setenv(SPARK_HOME = "/opt/cloudera/parcels/CDH/lib/spark")
Or could you try copying and pasting the following code? From your output, it seems you did not try my code. I don't know whether the difference in Spark patch version affects the connection, but I guess you can set spark_version to "1.6.0".
config <- spark_config()
config$spark.driver.cores <- 4
config$spark.executor.cores <- 4
config$spark.executor.memory <- "4G"
spark_home <- "/opt/cloudera/parcels/CDH/lib/spark"
spark_version <- "1.6.2"
sc <- spark_connect(master="yarn-client", version=spark_version, config=config, spark_home=spark_home)
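
If it helps, here is a small check you could run before calling spark_connect(). It is only a sketch: `has_spark_class` is a hypothetical helper written for this thread, not a sparklyr function, and it assumes that the only thing going wrong is the missing `bin/spark-class` reported in the output log. It uses base R only and tests whether each candidate directory contains `bin/spark-class`.

```r
# Hypothetical helper (not part of sparklyr): TRUE if the directory
# contains bin/spark-class, the script that spark-submit tries to exec.
has_spark_class <- function(spark_home) {
  file.exists(file.path(spark_home, "bin", "spark-class"))
}

# Candidate directories taken from this thread.
candidates <- c(
  "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41",
  "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark",
  "/opt/cloudera/parcels/CDH/lib/spark"
)

# Named logical vector: TRUE marks a directory that can serve as spark_home.
print(vapply(candidates, has_spark_class, logical(1)))
```

Any directory reported as TRUE should be a reasonable value for `spark_home`; a FALSE for the parcel root would match the "No such file or directory" message in the log.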
