sparkR with HDP 2.4 deployed on AWS ec2 connection error

I am trying to use SparkR from RStudio on an HDP 2.4 cluster deployed on AWS EC2. I installed R, RStudio, and the other R packages, but after logging in to R and trying to start a Spark context, I encounter the problem below.

# Point R at the HDP Spark client and make its bundled SparkR package visible
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)

# Create a Spark context against YARN in client mode
sc <- SparkR::sparkR.init(master = "yarn-client")

16/09/09 11:41:32 Retrying connect to server: ip-xxx-xx-xx-xx.ec2.internal/xxx.xx.xx.xx:8050. Already tried 49 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
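Before changing any configuration, it can help to confirm whether that ResourceManager port is reachable at all from the machine running RStudio. A minimal sketch; the `check_rm_port` helper is hypothetical, and the hostname below is a placeholder for your server's public DNS name:

```shell
# check_rm_port: hypothetical helper that tests whether a TCP port is
# reachable, using bash's /dev/tcp pseudo-device with a 5-second timeout.
check_rm_port() {
  local host="$1" port="$2"
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# 8050 is the ResourceManager port from the retry message above; replace the
# placeholder hostname with your server's public DNS name.
check_rm_port "ec2-yy-yyy-yyy-yy.compute-1.amazonaws.com" 8050
```

If this prints "unreachable" for the internal `ec2.internal` name but "reachable" for the public name, the client is simply being handed an address it cannot resolve or route to.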
1 ACCEPTED SOLUTION

@Fish Berh

I assume that your RStudio is on your laptop. It seems that you are trying to access an internal IP from your laptop. You need to reference a public IP or the public URI for your server.
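One way to act on this, assuming RStudio runs outside the cluster's network: make sure the yarn-site.xml that the Spark client reads advertises an address the client can actually reach. A hypothetical fragment; the property name is the standard YARN ResourceManager address key, and the hostname is a placeholder:

```xml
<!-- Hypothetical yarn-site.xml fragment: the address the SparkR client
     dials for YARN (port 8050 on HDP). From outside the cluster this must
     be a publicly resolvable name, not the ec2.internal one. -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>ec2-yy-yyy-yyy-yy.compute-1.amazonaws.com:8050</value>
</property>
```

Note that exposing port 8050 publicly has security implications; an SSH tunnel, or running RStudio on an edge node inside the cluster, avoids that.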


5 REPLIES

@Fish Berh

You may also want to check a few options here:

https://blog.rstudio.org/tag/sparkr/

Actually, I logged in to RStudio using the address of the server:

ec2-yy-yyy-yyy-yy.compute-1.amazonaws.com:8787

@Fish Berh

Could you vote and accept my response? I suggested using the public URI of the server.

I used the public URI of the server to log in and still got the error.