Not able to create SparkR dataframe using read.df
Labels: Apache Spark, Apache Zeppelin
Created 11-28-2016 11:36 PM
Using the Hortonworks Sandbox, I am setting up SparkR in both RStudio and Zeppelin. The code below works properly in RStudio and the SparkR shell, but not in Zeppelin; please have a look:
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/usr/hdp/2.5.0.0-1245/spark")
}
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

# Start SparkR with the spark-csv package fetched from Spark Packages
sc <- sparkR.init(master = "local[*]",
                  sparkEnvir = list(spark.driver.memory = "2g"),
                  sparkPackages = "com.databricks:spark-csv_2.10:1.4.0")
sqlContext <- sparkRSQL.init(sc)

# Load the CSV through the spark-csv data source
train_df <- read.df(sqlContext, "/tmp/first_8.csv", "csv",
                    header = "true", inferSchema = "true")
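A quick sanity check that the load succeeded (a minimal sketch; the actual columns of first_8.csv are not shown in this thread):

# Inspect the inferred schema and peek at the first rows
printSchema(train_df)
head(train_df)
# Row count forces an actual scan of the file
nrow(train_df)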
But when I run the same code in Zeppelin using the livy.spark interpreter, I get a ClassNotFoundException:
java.lang.ClassNotFoundException: Failed to find data source: csv. Please find packages at http://spark-packages.org
I am also importing the dependency using the dep interpreter:

%dep
z.reset()
z.load("com.databricks:spark-csv_2.10:1.4.0")
But this seems to have no effect. I have also tried manually copying spark-csv_2.10-1.4.0.jar to /usr/hdp/2.5.0.0-1245/spark/lib, but it is not working. Has anyone experienced this before? Thanks in advance.
Created 12-05-2016 04:11 AM
Please specify com.databricks:spark-csv_2.10:1.4.0 on the interpreter settings page.
Created 12-05-2016 07:52 PM
@jzhang, should I add it to the livy interpreter settings?
Created 12-06-2016 08:17 PM
I tried that, but it didn't work.
Created 12-06-2016 10:34 PM
Got it working finally, thanks to @Robert Hryniewicz. Go to the interpreter settings page and add a new property under the livy settings: livy.spark.jars.packages, with the value com.databricks:spark-csv_2.10:1.4.0. Restart the interpreter and retry the query.
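For anyone landing here later, a minimal sketch of the fix as I applied it (the %livy.sparkr paragraph binding is my assumption based on the HDP 2.5 Zeppelin defaults; the property name and value come from this thread):

%livy.sparkr
# After adding livy.spark.jars.packages = com.databricks:spark-csv_2.10:1.4.0
# on the interpreter settings page and restarting the livy interpreter,
# the original call resolves the csv data source:
sqlContext <- sparkRSQL.init(sc)
train_df <- read.df(sqlContext, "/tmp/first_8.csv", "csv",
                    header = "true", inferSchema = "true")
head(train_df)

As far as I can tell, the %dep approach in the question failed because %dep only loads packages into Zeppelin's built-in spark interpreter group, while livy starts its own remote session, so the package has to be passed through the livy interpreter properties instead.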
