Support Questions

Find answers, ask questions, and share your expertise

SparkR with HDP 2.4 deployed on AWS EC2: connection error

Contributor

I am trying to use SparkR from RStudio on an HDP 2.4 cluster deployed on AWS EC2. I installed R, RStudio, and the other required R packages, but after I start R and try to create a Spark context, I hit the error below.

# Point R at the SparkR library shipped with the HDP Spark client
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)

# Create a Spark context and a SQL context
sc <- SparkR::sparkR.init(master = "yarn-client")
sqlContext <- SparkR::sparkRSQL.init(sc)

16/09/09 11:41:32 Retrying connect to server: ip-xxx-xx-xx-xx.ec2.internal/xxx.xx.xx.xx:8050. Already tried 49 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
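Before changing any configuration, it can help to confirm whether the machine where R runs can actually reach the YARN ResourceManager address named in the error. A minimal sketch using bash's built-in /dev/tcp redirection (the hostname below is the placeholder from the error message; substitute your own):

```shell
# Check TCP reachability of the YARN ResourceManager (port 8050 on HDP)
# from the machine where R/RStudio actually runs.
# RM_HOST is the placeholder host from the error message; substitute your own.
RM_HOST="ip-xxx-xx-xx-xx.ec2.internal"
RM_PORT=8050

if timeout 5 bash -c "echo > /dev/tcp/${RM_HOST}/${RM_PORT}" 2>/dev/null; then
  RESULT="reachable"
else
  RESULT="unreachable"
fi
echo "${RM_HOST}:${RM_PORT} is ${RESULT} from this host"
```

If this prints "unreachable" from the host running R but "reachable" from a cluster node, the problem is network visibility (internal vs. public address, or a security group blocking the port), not SparkR itself.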
1 ACCEPTED SOLUTION

Super Guru

@Fish Berh

I assume your RStudio is running on your laptop, which means you are trying to reach the cluster's internal EC2 IP from outside AWS. You need to use the server's public IP or public URI instead.
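For completeness, one way to act on this from SparkR itself: sparkR.init() in Spark 1.x accepts a sparkEnvir list, and spark.hadoop.* properties override values from the Hadoop client configuration. A hedged sketch, assuming the ResourceManager is reachable at the server's public DNS name on port 8050 (the ec2-... hostname below is a placeholder; the security group must also allow that port):

```r
# Sketch: point SparkR at an explicit ResourceManager address instead of
# relying on the yarn-site.xml picked up from the client configuration.
# The hostname below is a placeholder -- use your server's public DNS name.
sc <- SparkR::sparkR.init(
  master = "yarn-client",
  sparkEnvir = list(
    "spark.hadoop.yarn.resourcemanager.address" = "ec2-yy-yyy-yyy-yy.compute-1.amazonaws.com:8050"
  )
)
```

Note that exposing the ResourceManager port to the public internet is generally discouraged; restricting the security group rule to your own IP is the safer variant.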


5 REPLIES

Super Guru

@Fish Berh

You may also want to check a few options here:

https://blog.rstudio.org/tag/sparkr/

Contributor

Actually, I logged in to RStudio using the public address of the server:

ec2-yy-yyy-yyy-yy.compute-1.amazonaws.com:8787
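One detail worth noting: logging into RStudio Server on port 8787 means R itself runs on that EC2 node, so SparkR's connection to YARN is made from inside the cluster, where the internal hostname should normally resolve. A quick check on the node where RStudio Server runs (the hostname below is the placeholder from the error; substitute your own):

```shell
# Check that the ResourceManager hostname from the error resolves
# on the node where RStudio Server (and therefore SparkR) runs.
# RM_HOST is the placeholder from the error message; substitute your own.
RM_HOST="ip-xxx-xx-xx-xx.ec2.internal"

if getent hosts "$RM_HOST" > /dev/null; then
  echo "$RM_HOST resolves on this node"
else
  echo "$RM_HOST does not resolve on this node"
fi
```

If the name does not resolve there, the node is missing the cluster's DNS or /etc/hosts entries; if it resolves but the connection still retries, the ResourceManager may not be listening on that address or the port is blocked.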

Super Guru

@Fish Berh

Could you vote for and accept my response? I suggested using the public URI of the server.

Contributor

I used the public URI of the server to log in and still got the same error.