Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

HDP, Zeppelin Notebook, Livy, Spark & R: "requirement failed: sparkr.zip not found; cannot run sparkr application."

New Member

Hello everyone,

I want to use R in Zeppelin Notebook. This is my environment:

  • 1 Server (single node cluster)
  • SUSE Linux Enterprise Server 12 SP1
  • HDP 2.5.0.0-1245 (installed and managed with Ambari 2.4.1.0)
  • JDK 1.8.0 (102)
  • Python 2.7.9
  • Spark 1.6.x.2.5 (installed and managed with Ambari)
  • Zeppelin 0.6.0.2.5.0.0-1245 (installed and managed with Ambari)
  • R 3.3.0 Patched (2016-05-26 r70684)

I followed the instructions from this site (https://zeppelin.apache.org/docs/0.6.0/interpreter/r.html) to install R-*, libcurl-devel and openssl-devel, and I also installed the recommended R packages.

The following interpreters are available in my installation of Zeppelin Notebook (http://host:9995/#/interpreter):

  • AngularJS (%angular)
  • JDBC (%jdbc)
  • Livy (%livy, %livy.pyspark, %livy.sparkr, %livy.sql)
  • MarkDown (%md)
  • Shell (%sh)
  • Spark (%spark, %spark.pyspark, %spark.sql, %spark.dep)

As far as I understand, I cannot add a new interpreter with the prefix "%r" to use R in Zeppelin Notebook. Instead I have to use Livy with the prefix "%livy.sparkr", right? One server instance of Livy is running, and I can successfully execute commands like

%livy.spark
sc.version

or

%livy.pyspark
print "1"

but when I try to run a simple R command like

%livy.sparkr
hello <- function(name) {
  sprintf("Hello, %s", name)
}
hello("livy")

I'm getting the following error message:

Error with 400 StatusCode: "requirement failed: sparkr.zip not found; cannot run sparkr application."

Did I miss something? Is there an error in my setup or environment? Any help would be appreciated.

Thank you in advance.

1 ACCEPTED SOLUTION

New Member

After further investigation I found the solution.

The Spark installation was missing the archive "sparkr.zip", which should be in the folder "SPARK_HOME/R/lib". I downloaded a pre-built version of Spark for Hadoop 2.6 from http://spark.apache.org/downloads.html; the download (spark-1.6.2-bin-hadoop2.6.tgz) contains the folder "R/lib" with the archive "sparkr.zip" inside. I copied that folder into the Spark installation path on my HDP cluster. After restarting the Spark service I was able to use the prefix "%livy.sparkr" in Zeppelin Notebook.
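As a quick check, the presence of the archive can be verified from the shell. This is a minimal sketch assuming the usual Ambari/HDP 2.5 client path, which is an assumption; adjust SPARK_HOME to your layout:

```shell
# Verify that the SparkR archive Livy needs is actually present.
# /usr/hdp/current/spark-client is an assumed HDP 2.5 location.
SPARK_HOME="${SPARK_HOME:-/usr/hdp/current/spark-client}"

if [ -f "$SPARK_HOME/R/lib/sparkr.zip" ]; then
    echo "sparkr.zip found in $SPARK_HOME/R/lib"
else
    echo "sparkr.zip missing - copy R/lib from spark-1.6.2-bin-hadoop2.6.tgz"
fi
```

If the archive is missing, copying the "R/lib" folder from the pre-built download and restarting the Spark service, as described above, resolves the error.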


4 REPLIES

Super Collaborator

Do you use yarn-cluster mode?

New Member

I made no specific configuration in that area. How can I check whether Spark is running in yarn-cluster mode, and what does that mean?

The zeppelin-env.sh file contains the line:

 export MASTER=yarn-client
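For reference, yarn-client mode means the Spark driver runs on the machine that launched the job (here, inside Zeppelin/Livy), while yarn-cluster mode launches the driver inside a YARN container. One way to see what the services are configured for is to grep their config files; the paths below are typical Ambari-managed locations and are assumptions, not verified for every install:

```shell
# Assumed Ambari-managed config paths; "|| true" keeps the exit
# status at 0 even when a file is absent on this machine.
grep -h "MASTER" /etc/zeppelin/conf/zeppelin-env.sh 2>/dev/null || true
grep -h "livy.spark.master" /etc/livy/conf/livy.conf 2>/dev/null || true
```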

Super Collaborator

It looks like you have R installed; is it on all nodes in your cluster? There is also a requirement to set JAVA_HOME.

If you have access to Spark directly, you might want to try accessing R from Spark first to help isolate the issue.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/ch_spark-r....
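A hedged sketch of that isolation test, assuming the HDP client path below (adjust if your install differs): start the SparkR shell outside Zeppelin and Livy, so a failure points at the Spark/R install rather than the notebook stack.

```shell
# Launch SparkR directly if the launcher exists; the path is an
# assumption for an Ambari-managed HDP 2.5 install.
SPARKR_BIN="${SPARK_HOME:-/usr/hdp/current/spark-client}/bin/sparkR"

if [ -x "$SPARKR_BIN" ]; then
    "$SPARKR_BIN"          # interactive SparkR shell
else
    echo "sparkR launcher not found at $SPARKR_BIN"
fi
```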
