04-02-2018 03:45 AM - last edited on 04-02-2018 06:59 AM by cjervis
I am new to the Cloudera QuickStart VM 5.12.
I am trying to run a small program that creates a database and a table and loads data using Spark SQL.
I wrote some sample code, but it fails with a Hive error. To fix that, I tried copying hive-conf.xml into the spark/conf directory, and also tried creating a hard link to it, but both attempts failed with "permission denied".
Can anyone help me fix this? It would be a great help.
Since this VM ships with Spark 1.6.1, could somebody share sample code for that version? For now, I am running the program from IntelliJ.
04-11-2018 05:45 AM
Since you are using Spark 1.6, all you need is a Hive gateway to explore Hive tables from Spark SQL (there is no need to copy hive-site.xml manually).
Make sure the Hive Gateway role is added to the node from which you run spark-shell (in your case there is only one node, so it should be your QuickStart VM), using CM > Hive > Instances > Gateway Role.
As for sample code, you can start by creating a sequence or an array in the shell:
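Once the gateway role is in place, a quick way to confirm that spark-shell can reach Hive is to inspect the context and list the databases. This is a minimal sketch, assuming the shell was started on the QuickStart VM and that its predefined sqlContext is Hive-enabled, which is usually the case for Spark 1.6 builds that include Hive support.

```scala
// Enter these at the scala> prompt of spark-shell.
// On a Hive-enabled build the class name should end in "HiveContext".
println(sqlContext.getClass.getName)

// Lists the databases known to the Hive metastore; an error here usually
// means the Hive client configuration is not visible to Spark.
sqlContext.sql("show databases").collect.foreach(println)
```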
scala> val data = Seq(("Falcon", 30), ("IronMan", 40), ("BlackWidow", 10))
Next, parallelize the collection and create a DataFrame from the RDD:
scala> val df = sc.parallelize(data).toDF("Name", "Count")
After this, set the Hive warehouse path where the table data will be stored:
scala> val options = Map("path" -> "/user/hive/warehouse/avengers")
Then save the DataFrame as a Hive table named avengers.
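The save command itself does not appear in this step. A likely form, assuming the df and options values defined in the earlier steps and the table name avengers used in the queries that follow, is the Spark 1.6 DataFrameWriter call below. Treat it as a sketch rather than the exact command that was originally posted.

```scala
// Enter at the scala> prompt of the same spark-shell session.
// Assumes `df` and `options` are already defined as in the steps above.
// Writes the data under the configured path and registers the table
// in the Hive metastore under the name "avengers".
df.write.options(options).saveAsTable("avengers")
```

If the table already exists, this call fails by default; adding .mode("overwrite") before .saveAsTable replaces the existing table.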
Finally, query the table using Spark SQL and beeline
scala> sqlContext.sql("select * from avengers").collect.foreach(println)
[Falcon, 30]
[IronMan, 40]
[BlackWidow, 10]
$ beeline …
> show tables;
> select * from avengers;
Falcon 30
IronMan 40
BlackWidow 10
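Since the question mentions running from IntelliJ rather than spark-shell, the same flow can be sketched as a standalone Spark 1.6 application. The object name AvengersApp and the database name demo_db are hypothetical, chosen only for illustration. The sketch assumes the spark-core, spark-sql and spark-hive 1.6.x artifacts are on the classpath, and that a hive-site.xml pointing at the VM's metastore is on the classpath as well; without it, HiveContext falls back to a local embedded metastore and the table will not be visible from beeline.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

// Hypothetical example application; names are illustrative only.
object AvengersApp {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside the IDE's JVM using all available cores
    val conf = new SparkConf().setAppName("AvengersApp").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // HiveContext is the Spark 1.6 entry point for Hive-backed tables
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._ // enables .toDF on RDDs of tuples

    // Create the (hypothetical) database and make it current
    hiveContext.sql("CREATE DATABASE IF NOT EXISTS demo_db")
    hiveContext.sql("USE demo_db")

    // Build a small DataFrame from a few sample rows
    val data = Seq(("Falcon", 30), ("IronMan", 40), ("BlackWidow", 10))
    val df = sc.parallelize(data).toDF("Name", "Count")

    // Save as a managed Hive table, replacing it if it already exists
    df.write.mode(SaveMode.Overwrite).saveAsTable("avengers")

    // Read the table back through Spark SQL and print each row
    hiveContext.sql("SELECT * FROM avengers").collect.foreach(println)

    sc.stop()
  }
}
```

Using SaveMode.Overwrite keeps the program re-runnable during development; switch to SaveMode.Append or the default if existing data must be preserved.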
Hope this helps. Let us know if you already got past it and/or if you are still stuck.