How to insert data into Hive from SparkSQL


Below is my code:

 

import sqlContext.implicits._
import org.apache.spark.sql

val eBayText = sc.textFile("/user/cloudera/spark/servicesDemo.csv")

// read the CSV and drop the header row
val hospitalDataText = sc.textFile("/user/cloudera/spark/servicesDemo.csv")
val header = hospitalDataText.first()
val hospitalData = hospitalDataText.filter(a => a != header)

case class Services(uhid: String, locationid: String, doctorid: String)

// map each row to the case class and convert to a DataFrame
val hData = hospitalData.map(_.split(",")).map(p => Services(p(0), p(1), p(2)))
val hosService = hData.toDF()

// write the DataFrame as Parquet files under the warehouse path
hosService.write.format("parquet").mode(org.apache.spark.sql.SaveMode.Append).save("/user/hive/warehouse/hosdata")

 

This code created a 'hosdata' folder at the specified path, containing the data in Parquet format.

But when I went to Hive to check whether the table had been created, I could not see any table named 'hosdata'.

 

So I ran the commands below.

 

hosService.write.mode("overwrite").saveAsTable("hosData")

sqlContext.sql("show tables").show

 

This showed me the result below:

 

+--------------------+-----------+
|           tableName|isTemporary|
+--------------------+-----------+
|             hosdata|      false|
+--------------------+-----------+

 

But again, when I checked in Hive, I could not see the table 'hosdata'.

Could anyone let me know which step I am missing?

1 ACCEPTED SOLUTION

Hi, that is because your Spark program is not using the common Hive metastore, so the definition of the table is gone when your application finishes.
You should create an external table in Hive and then issue a refresh command; after your Spark application finishes, you will see the new data in your table.
For creating an external table, see the Cloudera docs.
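
As a rough sketch, an external table over the Parquet files written above might look like this when run in the Hive shell (the column names and STRING types are assumptions taken from the Services case class, so adjust them to your actual schema):

CREATE EXTERNAL TABLE hosdata (
  uhid STRING,
  locationid STRING,
  doctorid STRING
)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/hosdata';

Because the definition then lives in the Hive metastore itself, the table stays visible in Hive after the Spark application ends.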


5 REPLIES


Thank you for your reply.

May I know what the refresh command is?

And can I see the table in Hive only after I close the Spark application?

Thank you. This works for me. 🙂

I am glad, so please accept it as a solution.

Another approach to inserting the data, which we follow in our project, is not to insert the data into Hive directly from Spark, but instead to do the following (see the sketch after the list).

 

1. Read the input CSV file in Spark and transform the data according to the requirements.

2. Save the data back into an output CSV file in HDFS.

3. Push the data from the output CSV into Hive using the hive -f or hive -e command from the shell.
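
A minimal sketch of that flow in spark-shell, assuming an illustrative output path and an existing Hive table named hosdata_csv (both names are hypothetical, not from the original post):

// steps 1 and 2: read the CSV, transform it, and write it back to HDFS
val raw = sc.textFile("/user/cloudera/spark/servicesDemo.csv")
val header = raw.first()
val transformed = raw.filter(_ != header)  // example transformation: drop the header row
transformed.saveAsTextFile("/user/cloudera/spark/servicesOut")

// step 3, from the shell: load the output into an existing Hive table, e.g.
//   hive -e "LOAD DATA INPATH '/user/cloudera/spark/servicesOut' INTO TABLE hosdata_csv"

LOAD DATA INPATH moves the files from the HDFS output directory into the table's location, so the data becomes queryable from Hive without going through the Spark application's metastore at all.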