Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
To ask a new question, please post a new topic on the appropriate active board.

Not able to create SparkR dataframe using read.df

Super Collaborator

Using the Hortonworks Sandbox, I am setting up SparkR in both RStudio and Zeppelin. The code below works properly in RStudio and in the SparkR shell, but not in Zeppelin:

if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/usr/hdp/2.5.0.0-1245/spark")
}
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sc <- sparkR.init(master = "local[*]",
                  sparkEnvir = list(spark.driver.memory = "2g"),
                  sparkPackages = "com.databricks:spark-csv_2.10:1.4.0")
sqlContext <- sparkRSQL.init(sc)
train_df <- read.df(sqlContext, "/tmp/first_8.csv", "csv", header = "true", inferSchema = "true")
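(As a side note, in Spark 1.x the short source name "csv" only resolves when the spark-csv package is actually on the driver classpath; a hedged sketch of the same call using the package's fully qualified data-source name instead, assuming the same session and file path as above:)

```r
# Hypothetical variant of the read.df call above: refer to the spark-csv
# data source by its fully qualified name rather than the "csv" alias,
# which can make classpath problems easier to diagnose.
train_df <- read.df(sqlContext, "/tmp/first_8.csv",
                    source = "com.databricks.spark.csv",
                    header = "true", inferSchema = "true")
```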

But when I run the same code in Zeppelin using the livy.spark interpreter, I get a ClassNotFoundException:

java.lang.ClassNotFoundException: Failed to find data source: csv. Please find packages at http://spark-packages.org

I am also importing the dependency using the %dep interpreter:

%dep
z.reset()
z.load("com.databricks:spark-csv_2.10:1.4.0")

But this does not seem to have any effect. I have also tried manually copying spark-csv_2.10-1.4.0.jar to /usr/hdp/2.5.0.0-1245/spark/lib, but that did not work either. Has anyone experienced this before? Thanks in advance.

1 ACCEPTED SOLUTION

Super Collaborator

Got it working finally, thanks to @Robert Hryniewicz. Go to the interpreter settings page and, under the livy interpreter settings, add a new property named livy.spark.jars.packages with the value com.databricks:spark-csv_2.10:1.4.0. Restart the interpreter and retry the query.
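(For anyone following along, the property/value pair entered on the livy interpreter settings page would look like the fragment below; the coordinates match the version used in this thread, so adjust the Scala/package version for your own Spark build.)

```
livy.spark.jars.packages = com.databricks:spark-csv_2.10:1.4.0
```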


4 REPLIES

Super Collaborator

Please specify com.databricks:spark-csv_2.10:1.4.0 in the interpreter settings page.

Super Collaborator

@jzhang, should I add it in the livy interpreter?

Super Collaborator

I tried that, but it didn't work.
