New Contributor
Posts: 3
Registered: ‎11-15-2018
Accepted Solution

How to insert data into Hive from SparkSQL

Below is my code

 

import sqlContext.implicits._

import org.apache.spark.sql

 

val eBayText = sc.textFile("/user/cloudera/spark/servicesDemo.csv")

val hospitalDataText = sc.textFile("/user/cloudera/spark/servicesDemo.csv")

val header = hospitalDataText.first()

val hospitalData = hospitalDataText.filter(a=>a!=header)

 

case class Services(uhid:String,locationid:String,doctorid:String)

 

val hData = hospitalData.map(_.split(",")).map(p=>Services(p(0),p(1),p(2)))

val hosService = hData.toDF()

hosService.write.format("parquet").mode(org.apache.spark.sql.SaveMode.Append).save("/user/hive/warehouse/hosdata")

 

This code created a 'hosdata' folder at the specified path, and the folder contains data in Parquet format.

But when I went to Hive to check whether the table had been created, I could not see any table named 'hosdata'.

So I ran the commands below.

 

hosService.write.mode("overwrite").saveAsTable("hosData")

sqlContext.sql("show tables").show

 

This shows the result below:

 

+--------------------+-----------+
|           tableName|isTemporary|
+--------------------+-----------+
|             hosdata|      false|
+--------------------+-----------+

 

But when I check in Hive again, I still cannot see the table 'hosdata'.

Could anyone let me know what step I am missing?

Master
Posts: 402
Registered: ‎07-01-2015

Re: How to insert data into Hive from SparkSQL

Hi, that's because your Spark program is not using the shared Hive metastore, so the table definition is gone once your application finishes.
Instead, create an external table in Hive over the directory your job writes to, and issue a refresh command after your Spark application finishes; you will then see the new data in your table.
For creating an external table, see the Cloudera docs.
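
A minimal sketch of what such an external table could look like, assuming the Parquet files sit in /user/hive/warehouse/hosdata and the columns are the three string fields of the Services case class from the question. The snippet only builds and prints the HiveQL statement; the table name hosdata and the IF NOT EXISTS clause are choices made for illustration.

```scala
// HiveQL for an external table over the directory the Spark job wrote to.
// Column names and types mirror the Services case class (all strings).
// EXTERNAL means dropping the table in Hive leaves the Parquet files in place.
val createHosdata =
  """CREATE EXTERNAL TABLE IF NOT EXISTS hosdata (
    |  uhid STRING,
    |  locationid STRING,
    |  doctorid STRING
    |)
    |STORED AS PARQUET
    |LOCATION '/user/hive/warehouse/hosdata'""".stripMargin

println(createHosdata)

// Run the printed statement in the Hive shell or beeline.
// From spark-shell it could also be submitted with sqlContext.sql(createHosdata),
// but only if that context is a HiveContext connected to the shared metastore
// (an assumption; it depends on hive-site.xml being visible to Spark).
```

Because the table is external, later Parquet files appended to the same directory by the Spark job belong to the same table location, so the job itself never has to touch the metastore.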
New Contributor
Posts: 3
Registered: ‎11-15-2018

Re: How to insert data into Hive from SparkSQL

Thank you for your reply.

 

May I know what the refresh command is?

And can I see the table in Hive only after I close the Spark application?

New Contributor
Posts: 3
Registered: ‎11-15-2018

Re: How to insert data into Hive from SparkSQL

Thank You. This works for me. :)

Master
Posts: 402
Registered: ‎07-01-2015

Re: How to insert data into Hive from SparkSQL

I am glad it works. Please accept it as a solution.
Explorer
Posts: 8
Registered: ‎04-20-2016

Re: How to insert data into Hive from SparkSQL

Another approach, which we follow in our project, is not to insert the data into Hive directly from Spark but to do the following:

1. Read the input CSV file in Spark and transform the data as required.

2. Save the data back to an output CSV file in HDFS.

3. Push the data from the output CSV into Hive using the hive -f or hive -e command from the shell.
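
The three steps above could be sketched roughly as follows. The transformation here (trimming each field) is only a placeholder for whatever the project requires, the sample rows are an in-memory stand-in so the snippet runs without a cluster, and the output path servicesOut and the table hosdata_csv are hypothetical names.

```scala
// Step 1: per-line transformation. Trimming every field is a placeholder
// for "transform the data as required".
def cleanLine(line: String): String =
  line.split(",", -1).map(_.trim).mkString(",")

// In-memory stand-in for the input CSV (illustrative rows, not real data).
val sample = Seq("uhid,locationid,doctorid", " U1 , L1 , D1 ", "U2,L2,D2")
val header = sample.head
val output = sample.filter(_ != header).map(cleanLine)
output.foreach(println)

// In spark-shell, the same logic on the real file would be along these lines:
//   val input  = sc.textFile("/user/cloudera/spark/servicesDemo.csv")
//   val header = input.first()
//   input.filter(_ != header).map(cleanLine)
//        .saveAsTextFile("/user/cloudera/spark/servicesOut")   // Step 2 (hypothetical path)
//
// Step 3, from a shell, assuming a comma-delimited text table hosdata_csv
// already exists in Hive:
//   hive -e "LOAD DATA INPATH '/user/cloudera/spark/servicesOut' INTO TABLE hosdata_csv"
```

The trade-off is an extra copy of the data in HDFS, in exchange for the load into Hive being done by Hive itself, so the Spark job does not depend on metastore access.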
