Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

HDP, Zeppelin Notebook, Livy, Spark & R: "requirement failed: sparkr.zip not found; cannot run sparkr application."

New Member

Hello everyone,

I want to use R in Zeppelin Notebook. This is my environment:

  • 1 Server (single node cluster)
  • SUSE Linux Enterprise Server 12 SP1
  • HDP 2.5.0.0-1245 (installed and managed with Ambari 2.4.1.0)
  • JDK 1.8.0 (102)
  • Python 2.7.9
  • Spark 1.6.x.2.5 (installed and managed with Ambari)
  • Zeppelin 0.6.0.2.5.0.0-1245 (installed and managed with Ambari)
  • R 3.3.0 Patched (2016-05-26 r70684)

I followed the instructions from this site (https://zeppelin.apache.org/docs/0.6.0/interpreter/r.html) to install R-*, libcurl-devel and openssl-devel, and I also installed the recommended R packages.

The following interpreters are available in my installation of Zeppelin Notebook (http://host:9995/#/interpreter):

  • AngularJS (%angular)
  • JDBC (%jdbc)
  • Livy (%livy, %livy.pyspark, %livy.sparkr, %livy.sql)
  • MarkDown (%md)
  • Shell (%sh)
  • Spark (%spark, %spark.pyspark, %spark.sql, %spark.dep)

As far as I understand, I cannot add a new interpreter with the prefix "%r" to use R in Zeppelin Notebook. Instead I have to use Livy with the prefix "%livy.sparkr", right? One server instance of Livy is running, and I can successfully execute commands like

%livy.spark
sc.version

or

%livy.pyspark
print "1"

but when I try to run a simple R command like

%livy.sparkr
hello <- function(name) {
  sprintf("Hello, %s", name)
}
hello("livy")

I'm getting the following error message:

Error with 400 StatusCode: "requirement failed: sparkr.zip not found; cannot run sparkr application."

Did I miss something? Is there an error in my setup or environment? Any help would be appreciated.

Thank you in advance.

1 ACCEPTED SOLUTION

New Member

After further investigation I found the solution.

The Spark installation was missing the archive "sparkr.zip", which should be in the folder "SPARK_HOME/R/lib". I downloaded a pre-built version of Spark for Hadoop 2.6 from http://spark.apache.org/downloads.html; the download (spark-1.6.2-bin-hadoop2.6.tgz) contains the folder "R/lib" with the archive "sparkr.zip" inside. I copied that folder into the Spark installation path on my HDP cluster. After restarting the Spark service I was able to use the prefix "%livy.sparkr" in Zeppelin Notebook.
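As a quick check, the presence of the archive can be verified from the shell. This is a minimal sketch assuming the usual Ambari/HDP 2.5 client path, which is an assumption; adjust SPARK_HOME to your layout:

```shell
# Verify that the SparkR archive Livy needs is actually present.
# /usr/hdp/current/spark-client is an assumed HDP 2.5 location.
SPARK_HOME="${SPARK_HOME:-/usr/hdp/current/spark-client}"

if [ -f "$SPARK_HOME/R/lib/sparkr.zip" ]; then
    echo "sparkr.zip found in $SPARK_HOME/R/lib"
else
    echo "sparkr.zip missing - copy R/lib from spark-1.6.2-bin-hadoop2.6.tgz"
fi
```

If the archive is missing, copying the "R/lib" folder from the pre-built download and restarting the Spark service, as described above, resolves the error.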


4 REPLIES

Super Collaborator

Do you use yarn-cluster mode?

New Member

I made no specific configuration in that area. How can I check whether Spark is running in yarn-cluster mode, and what does that mean?

The zeppelin-env.sh file contains the line:

 export MASTER=yarn-client
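For reference, yarn-client mode means the Spark driver runs on the machine that launched the job (here, inside Zeppelin/Livy), while yarn-cluster mode launches the driver inside a YARN container. One way to see what the services are configured for is to grep their config files; the paths below are typical Ambari-managed locations and are assumptions, not verified for every install:

```shell
# Assumed Ambari-managed config paths; "|| true" keeps the exit
# status at 0 even when a file is absent on this machine.
grep -h "MASTER" /etc/zeppelin/conf/zeppelin-env.sh 2>/dev/null || true
grep -h "livy.spark.master" /etc/livy/conf/livy.conf 2>/dev/null || true
```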

Super Collaborator

It looks like you have R installed; is it on all nodes in your cluster? There is also a requirement to set JAVA_HOME.

If you have access to Spark directly, you might want to try accessing R from Spark first to help isolate the issue.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/ch_spark-r....
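A hedged sketch of that isolation test, assuming the HDP client path below (adjust if your install differs): start the SparkR shell outside Zeppelin and Livy, so a failure points at the Spark/R install rather than the notebook stack.

```shell
# Launch SparkR directly if the launcher exists; the path is an
# assumption for an Ambari-managed HDP 2.5 install.
SPARKR_BIN="${SPARK_HOME:-/usr/hdp/current/spark-client}/bin/sparkR"

if [ -x "$SPARKR_BIN" ]; then
    "$SPARKR_BIN"          # interactive SparkR shell
else
    echo "sparkR launcher not found at $SPARKR_BIN"
fi
```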
