HDP, Zeppelin Notebook, Livy, Spark & R: "requirement failed: sparkr.zip not found; cannot run sparkr application."

New Contributor

Hello everyone,

I want to use R in Zeppelin Notebook. This is my environment:

  • 1 Server (single node cluster)
  • SUSE Linux Enterprise Server 12 SP1
  • HDP 2.5.0.0-1245 (installed and managed with Ambari 2.4.1.0)
  • JDK 1.8.0 (102)
  • Python 2.7.9
  • Spark 1.6.x.2.5 (installed and managed with Ambari)
  • Zeppelin 0.6.0.2.5.0.0-1245 (installed and managed with Ambari)
  • R 3.3.0 Patched (2016-05-26 r70684)

I followed the instructions on this page (https://zeppelin.apache.org/docs/0.6.0/interpreter/r.html) to install R-*, libcurl-devel, and openssl-devel. I also installed the recommended R packages.
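For example, from a shell (the package names here are illustrative; the linked page lists the full set):

# install R packages recommended for the Zeppelin R interpreter, e.g. evaluate and knitr
R -e "install.packages(c('evaluate', 'knitr'), repos = 'https://cloud.r-project.org')"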

The following interpreters are available in my installation of Zeppelin Notebook (http://host:9995/#/interpreter):

  • AngularJS (%angular)
  • JDBC (%jdbc)
  • Livy (%livy, %livy.pyspark, %livy.sparkr, %livy.sql)
  • MarkDown (%md)
  • Shell (%sh)
  • Spark (%spark, %spark.pyspark, %spark.sql, %spark.dep)

As far as I understand, I cannot add a new interpreter with the prefix "%r" to use R in Zeppelin Notebook; instead I have to use Livy and the prefix "%livy.sparkr", right? One Livy server instance is running, and I can successfully execute commands like

%livy.spark
sc.version

or

%livy.pyspark
print "1"

but when I try to run a simple R command like

%livy.sparkr
hello <- function(name) {
  sprintf("Hello, %s", name)
}
hello("livy")

I'm getting the following error message:

Error with 400 StatusCode: "requirement failed: sparkr.zip not found; cannot run sparkr application."
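A direct check for the archive (the path below assumes the HDP 2.5 default SPARK_HOME; adjust if your layout differs) also comes up empty:

# check whether the SparkR archive exists under the HDP Spark client
ls -l /usr/hdp/current/spark-client/R/lib/sparkr.zip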

Did I miss something? Is there an error in my setup or environment? Any help would be appreciated.

Thank you in advance.

1 ACCEPTED SOLUTION

avatar
New Contributor

After further investigation I found the solution.

The Spark installation was missing the archive "sparkr.zip", which should be located in the folder "SPARK_HOME/R/lib". I downloaded a pre-built version of Spark for Hadoop 2.6 from http://spark.apache.org/downloads.html. The download (spark-1.6.2-bin-hadoop2.6.tgz) contains the folder "R/lib" with the archive "sparkr.zip". I moved this folder into the Spark installation path on my HDP cluster, and after restarting the Spark service I was able to use the prefix "%livy.sparkr" in Zeppelin Notebook.
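For anyone hitting the same problem, the steps were roughly as follows (the paths assume the HDP default SPARK_HOME of /usr/hdp/current/spark-client, and your download mirror may differ):

# download and unpack a pre-built Spark matching the cluster version (1.6.2 for Hadoop 2.6)
wget http://archive.apache.org/dist/spark/spark-1.6.2/spark-1.6.2-bin-hadoop2.6.tgz
tar -xzf spark-1.6.2-bin-hadoop2.6.tgz
# copy the R library folder (which contains sparkr.zip) into the HDP Spark installation
cp -r spark-1.6.2-bin-hadoop2.6/R /usr/hdp/current/spark-client/
# verify the archive is now where it is expected
ls /usr/hdp/current/spark-client/R/lib/sparkr.zip

After that, restart the Spark service (e.g. from Ambari).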


3 REPLIES

Super Collaborator

Do you use yarn-cluster mode?

New Contributor

I made no specific configuration in that area. How can I check whether Spark is running in yarn-cluster mode, and what does that mean?

The zeppelin-env.sh file contains the line:

 export MASTER=yarn-client
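In case it helps, the sessions Livy launches can also be inspected through its REST API (8998 is Livy's default port; adjust host and port to your setup):

# list the active Livy sessions and their state
curl http://localhost:8998/sessions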

Super Collaborator

It looks like you have R installed; is it on all nodes in your cluster? There is also a requirement to set JAVA_HOME.

If you have direct access to Spark, you might want to try using R from Spark first to help isolate the issue.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/ch_spark-r....
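For example, a minimal smoke test from the shell (the path assumes the HDP default SPARK_HOME; adjust as needed):

# start the SparkR shell against YARN
/usr/hdp/current/spark-client/bin/sparkR --master yarn-client
# then, at the R prompt, try a trivial SparkR call, e.g.:
#   df <- createDataFrame(sqlContext, faithful)
#   head(df)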
