Created 03-02-2018 02:41 PM
Hi all, I am processing Kafka data with Spark and pushing it into Hive tables. Every insert command creates a new part file in the warehouse location, and because of all these small files a single SELECT statement takes more than 30 minutes. Please share a solution to avoid this problem.
import spark.implicits._

// Every time the Kafka consumer gets new data, it is assigned to the jsonStr string.
val jsonStr = """{"b_s_isehp" : "false","event_id" : "4.0","l_bsid" : "88.0"}"""
val df = spark.read.json(Seq(jsonStr).toDS)
df.coalesce(1).write.mode("append").insertInto("tablename")
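For context, a minimal self-contained sketch of this write path is below (not my production code): the table name "tablename" and the sample JSON strings are placeholders, and it batches several messages from one consumer poll into a single DataFrame, so each insert produces one part file per batch rather than one per message.

// Hedged sketch of the write path described above; names and data are placeholders.
import org.apache.spark.sql.SparkSession

object KafkaToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Several messages collected from one consumer poll, written as a single
    // DataFrame: one insert (and therefore one part file) per batch instead of
    // one per message.
    val batch = Seq(
      """{"b_s_isehp" : "false","event_id" : "4.0","l_bsid" : "88.0"}""",
      """{"b_s_isehp" : "true","event_id" : "5.0","l_bsid" : "89.0"}"""
    )
    val df = spark.read.json(batch.toDS())
    df.coalesce(1).write.mode("append").insertInto("tablename")

    spark.stop()
  }
}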
Created 03-06-2018 01:27 AM
Hi @yogesh turkane,
As far as I have come across, there are a couple of ways to achieve this; the one I would suggest is to compact the table periodically by reading it back, repartitioning it into a fixed number of files, and rewriting it.
The code snippet would be
// Read the existing table, then rewrite it with a fixed number of larger files.
val tDf = hiveContext.table("table_name")
tDf.repartition(<num_files>).write.mode("overwrite").saveAsTable("targetDB.targetTable")
This approach will work with any type of files.
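As a rough illustration (not part of the original snippet), the same compaction can also be expressed with the Spark 2.x SparkSession API; the table names db.events / db.events_compacted and the file count are placeholders I made up for the sketch.

// Hedged sketch of periodic compaction using SparkSession; all names are placeholders.
import org.apache.spark.sql.SparkSession

object CompactTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("compact-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Rough rule of thumb: aim for output files around the HDFS block size.
    val numFiles = 4

    spark.table("db.events")              // table full of small part files
      .repartition(numFiles)              // shuffle into a few larger partitions
      .write
      .mode("overwrite")
      .saveAsTable("db.events_compacted") // compacted copy; swap or rename afterwards

    spark.stop()
  }
}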
Hope this helps !!