Created on 06-28-2017 01:27 AM - edited 09-16-2022 04:50 AM
I am new to Spark. I learned how to run a Hive program from spark-shell. I tried to do the same from Eclipse, and here is the program I have written.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SaveMode

object SuperSpark {
  case class partclass(id: Int, name: String, salary: Int, dept: String, location: String)

  def main(argds: Array[String]) {
    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"

    // Build a Hive-enabled session with dynamic partitioning and the warehouse locations configured.
    val sparkSession = SparkSession.builder
      .master("local[2]")
      .appName("Saving data into HiveTable using Spark")
      .enableHiveSupport()
      .config("hive.exec.dynamic.partition", "true")
      .config("hive.exec.dynamic.partition.mode", "nonstrict")
      .config("hive.metastore.warehouse.dir", "/user/hive/warehouse")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .getOrCreate()

    import sparkSession.implicits._

    // Parse the comma-separated input file into the case class and append it to the Hive table.
    val partfile = sparkSession.read.textFile("partfile")
    val partdata = partfile.map(p => p.split(","))
    val partRDD = partdata.map(line => partclass(line(0).toInt, line(1), line(2).toInt, line(3), line(4)))
    val partDF = partRDD.toDF()
    partDF.write.mode(SaveMode.Append).insertInto("parttab")
  }
}
What I don't understand now is how to execute this program. I'm stuck on these points.
Created 06-28-2017 06:53 AM
Please refer to this link.
It should give you a kick start.
If you need any more details, let me know.
Created 06-28-2017 08:11 PM
The code you posted already inserts the data from the file into the table parttab. To change the database, run the statement on the SparkSession (a DataFrame has no sql method), e.g. sparkSession.sql("USE newdb") before the insert, as sketched below.
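A minimal sketch of that, reusing partDF and the table name parttab from the code above (the database name newdb is just a placeholder):

// Switch the session's current database, then append into the existing table.
sparkSession.sql("USE newdb")
partDF.write.mode(SaveMode.Append).insertInto("parttab")

// Or qualify the table name directly and skip the USE statement:
// partDF.write.mode(SaveMode.Append).insertInto("newdb.parttab")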
Yes, you should configure a runtime for Spark and run it in Eclipse. Once it runs without errors, build the jar, upload it to the cluster, and use spark-submit to run it on the cluster; a rough sketch of that follows.
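As a sketch of that workflow (the jar name superspark.jar below is just a placeholder, and SuperSpark is the object from the original post), you would normally drop the hard-coded master from the code so the same jar runs both locally and on the cluster:

// Avoid hard-coding master("local[2]") so that spark-submit
// (or a -Dspark.master=local[2] JVM option in Eclipse) decides where the job runs.
val sparkSession = SparkSession.builder
  .appName("Saving data into HiveTable using Spark")
  .enableHiveSupport()
  .getOrCreate()

The cluster run would then look roughly like: spark-submit --class SuperSpark --master yarn superspark.jar (adjust the class, master, and jar name to your build).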
Created 06-28-2017 10:33 PM
Regarding the connection details,
Created 06-28-2017 10:40 PM