
SparkR + LLAP in HDP 3.0


I’m upgrading one of our clusters to HDP 3.0 right now, and the upgrade itself went fine. After some struggles, I managed to get Spark to work with LLAP in both Java/Scala and Python. But I can’t find any good information about how to get R to work with Spark and LLAP. Before the upgrade, R worked fine with Spark and LLAP, and we have code relying on that running in production right now, so we really need it to work in HDP 3.0 as well. According to the documentation, there isn’t even support for R. Am I missing something here, or is R no longer supported? (That would kind of ruin the day for me.)

Before the upgrade, the following code worked without any problems.

Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")
Sys.setenv(HIVE_CONF_DIR = "/etc/hive/conf")
Sys.setenv("SPARKR_SUBMIT_ARGS"="--master yarn --deploy-mode client --executor-memory 2688M --jars /usr/hdp/ --driver-class-path /usr/hdp/ --conf spark.executor.extraClassPath=/usr/hdp/ --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true sparkr-shell") 
library(SparkR, lib.loc = c(file.path(paste(Sys.getenv("SPARK_HOME"), "R", "lib", sep = "/"))))
sparkR.session(appName = "SparkR-Test")
head(sql("select * from testtable"))
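For Python and Java/Scala I got LLAP working by going through the Hive Warehouse Connector, so presumably SparkR would at least need the same submit-time wiring. This is only a sketch of what I’d expect, not something I’ve verified for R — the jar path, JDBC URL, and ZooKeeper hosts below are placeholders for your cluster’s values:

```r
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")
Sys.setenv(HIVE_CONF_DIR = "/etc/hive/conf")
# Sketch: point SparkR at the Hive Warehouse Connector assembly and the
# LLAP/HiveServer2 endpoints, as one would for pyspark/spark-shell in HDP 3.0.
# All <...> values are placeholders.
Sys.setenv("SPARKR_SUBMIT_ARGS" = paste(
  "--master yarn --deploy-mode client",
  "--jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar",
  "--conf spark.sql.hive.hiveserver2.jdbc.url=<hiveserver2-interactive-jdbc-url>",
  "--conf spark.hadoop.hive.llap.daemon.service.hosts=@llap0",
  "--conf spark.hadoop.hive.zookeeper.quorum=<zk-host1:2181,zk-host2:2181>",
  "--conf spark.datasource.hive.warehouse.load.staging.dir=/tmp",
  "sparkr-shell"))
```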

@Berry Österlund

Could you please give us some information on the error you are seeing post-upgrade?


There are no direct errors; I just can’t get it to even try to use LLAP. It tries to read the ORC files directly from HDFS, and the errors it gives me say that I don’t have HDFS permissions to access those files. That is correct: I don’t, and I shouldn’t. Those permissions aren’t needed when the LLAP integration is working (as it is with Python and Java/Scala).

What worries me is that the documentation says it only supports Python and Java/Scala. Not a word about R.
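In case it helps anyone experimenting: SparkR’s generic `read.df()` can load arbitrary Spark data sources by class name, so in theory the Hive Warehouse Connector could be invoked directly from R. This is an untested sketch — the class name is the one the Java/Scala connector uses, and whether it actually works from SparkR is exactly the open question here:

```r
# Untested sketch: go through the HWC data source via SparkR's generic
# data source API instead of sql(), so the read is routed through LLAP
# rather than hitting the ORC files on HDFS directly.
df <- read.df(source = "com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector",
              query  = "select * from testtable")
head(df)
```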
