Created 11-29-2016 08:32 AM
Hello everyone,
I want to use R in Zeppelin Notebook. This is my environment:
I followed the instructions from this site (https://zeppelin.apache.org/docs/0.6.0/interpreter/r.html) to install R-*, libcurl-devel, and openssl-devel. I also installed the recommended R packages.
The following interpreters are available in my installation of Zeppelin Notebook (http://host:9995/#/interpreter):
As far as I understand, I cannot add a new interpreter with the prefix "%r" to use R in Zeppelin Notebook. Instead I have to use Livy and the prefix "%livy.sparkr", right? One server instance of Livy is running, and I can successfully execute commands like
%livy.spark sc.version
or
%livy.pyspark print "1"
but when I try to run a simple R command like
%livy.sparkr
hello <- function(name) {
  sprintf("Hello, %s", name)
}
hello("livy")
I'm getting the following error message:
Error with 400 StatusCode: "requirement failed: sparkr.zip not found; cannot run sparkr application."
Did I miss something? Is there an error in my setup or environment? Any help would be appreciated.
Thank you in advance.
Created 11-30-2016 07:53 AM
After further investigation I found the solution.
The installation of Spark was missing the archive "sparkr.zip", which should be in the folder "SPARK_HOME/R/lib". I downloaded a pre-built version of Spark for Hadoop 2.6 from http://spark.apache.org/downloads.html. The download (spark-1.6.2-bin-hadoop2.6.tgz) contains the folder "R/lib" with the archive "sparkr.zip" inside. I copied that folder into the installation path of Spark on my HDP cluster. After restarting the Spark service I was able to use the prefix "%livy.sparkr" in Zeppelin Notebook.
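For anyone hitting the same error, the check and fix can be sketched like this. The SPARK_HOME path is an assumption for an HDP cluster; adjust it and the Spark version to match your install:

```shell
# Assumed HDP layout; adjust SPARK_HOME for your cluster.
export SPARK_HOME=${SPARK_HOME:-/usr/hdp/current/spark-client}

# Livy needs SPARK_HOME/R/lib/sparkr.zip to launch SparkR sessions.
if [ -f "$SPARK_HOME/R/lib/sparkr.zip" ]; then
  echo "sparkr.zip present"
else
  # Copy R/lib from a matching pre-built Spark distribution, e.g.
  # spark-1.6.2-bin-hadoop2.6.tgz from spark.apache.org/downloads.html:
  #   tar -xzf spark-1.6.2-bin-hadoop2.6.tgz
  #   cp -r spark-1.6.2-bin-hadoop2.6/R/lib "$SPARK_HOME/R/"
  echo "sparkr.zip missing"
fi
```

Remember to restart the Spark service afterwards so Livy picks up the change.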
Created 11-29-2016 09:21 AM
do you use yarn-cluster mode ?
Created 11-29-2016 09:29 AM
I made no specific configurations in that area. How can I check if Spark is running in yarn-cluster mode and what does it mean?
The zeppelin-env.sh file contains the line:
export MASTER=yarn-client
Created 11-29-2016 11:37 PM
It looks like you have R installed; is it on all nodes in your cluster? There is also a requirement to set JAVA_HOME.
If you have access to Spark directly you might want to try accessing R from Spark first, to help isolate the issue.
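One quick way to do that sanity check from a shell on one of the nodes (the SPARK_HOME path below is an assumption; use wherever Spark is installed on your cluster):

```shell
# Assumed install path; adjust for your environment.
export SPARK_HOME=${SPARK_HOME:-/usr/hdp/current/spark-client}

# If the launcher exists, starting it drops you into an R shell with a
# SparkContext; failures here point at Spark/R rather than Zeppelin or Livy.
if [ -x "$SPARK_HOME/bin/sparkR" ]; then
  echo "sparkR launcher found - try running: $SPARK_HOME/bin/sparkR"
else
  echo "sparkR launcher not found at $SPARK_HOME/bin/sparkR"
fi
```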