
Not able to connect to Spark Cluster with sparklyr


Hi folks,

 

I am trying to connect to a Spark cluster with the sparklyr package on a kerberized CDH cluster, but I am getting the following errors:

 

> library(sparklyr)
> library(httr)
> Sys.setenv(HADOOP_CONF_DIR = '/etc/hadoop/conf')
> Sys.setenv(YARN_CONF_DIR = '/etc/hadoop/conf')
> Sys.setenv(SPARK_HOME = '/opt/cloudera/parcels/SPARK2/lib/spark2')
> config <- spark_config()
> config$spark.driver.cores <- 4
> config$spark.executor.cores <- 4
> config$spark.executor.memory <- "4G"
> config$`spark.deploy.mode` <- "cluster"
> config$`spark.submit.deployMode` <- "cluster"
> config$sparklyr.yarn.cluster.accepted.timeout <- 180
> config$spark.yarn.principal <- "oracle/cluster@KR.ORACLE.COM"
> config$spark.yarn.keytab <- "/opt/oracle/bigdatasql/kerberos/oracle.keytab"
> with_config(
+   config = c(
+     authenticate(user = ":", password = "", type = "gssnegotiate"),
+     use_proxy("")
+   ),
+   sc <- spark_connect(master = "yarn-cluster",
+                       config = config,
+                       app_name = "sparklyr",
+                       method = "shell",
+                       version = "2.3.0")
+ )
Error in force(code) :
  Failed while connecting to sparklyr to port (8880) and address (bda1node02.kr.oracle.com) for sessionid (14909): Gateway in bda1node02.kr.oracle.com:8880 did not respond.
    Path: /opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/bin/spark-submit
    Parameters: --deploy-mode, cluster, --master, yarn, --class, sparklyr.Shell, '/usr/lib64/R/library/sparklyr/java/sparklyr-2.3-2.11.jar', 8880, 14909, --remote
    Log: /tmp/RtmpsEh3DS/file1ba7610e1272_spark.log

---- Output Log ----

---- Error Log ----
In addition: Warning messages:
1: In value[[3L]](cond) :
  Failed to open bda1node03.kr.oracle.com:8090/ws/v1/cluster/info.
  Error in curl::curl_fetch_memory(url, handle = handle): Failed connect to bda1node03.kr.oracle.com:8090; Operation now in progress
2: In value[[3L]](cond) :
  Failed to open bda1node04.kr.oracle.com:8090/ws/v1/cluster/info.
  Error in curl::curl_fetch_memory(url, handle = handle): Failed connect to bda1node04.kr.oracle.com:8090; Operation now in progress
3: In doTryCatch(return(expr), name, parentenv, handler) :
  Failed to open bda1node03.kr.oracle.com:8033/ws/v1/cluster/info with status 404.
4: In doTryCatch(return(expr), name, parentenv, handler) :
  Failed to open bda1node04.kr.oracle.com:8033/ws/v1/cluster/info with status 404.

 

Has anyone had success connecting sparklyr to a kerberized cluster like this? I am using sparklyr 0.9.1 at the moment, but the package version does not seem to matter for this issue.
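
For reference, this is the bare-bones client-mode connection I would fall back to as a sanity check (just a sketch reusing the same environment variables and Kerberos keytab settings as above, without the httr negotiate/proxy wrapper; I have not confirmed it works on this cluster):

library(sparklyr)

# Same configuration directories and Spark 2 parcel as above
Sys.setenv(HADOOP_CONF_DIR = '/etc/hadoop/conf')
Sys.setenv(YARN_CONF_DIR = '/etc/hadoop/conf')
Sys.setenv(SPARK_HOME = '/opt/cloudera/parcels/SPARK2/lib/spark2')

# Same Kerberos principal and keytab, but client deploy mode this time
config <- spark_config()
config$spark.yarn.principal <- "oracle/cluster@KR.ORACLE.COM"
config$spark.yarn.keytab <- "/opt/oracle/bigdatasql/kerberos/oracle.keytab"

sc <- spark_connect(master = "yarn-client",
                    config = config,
                    version = "2.3.0")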

 

Regards,

Sean
