How to insert data into Hive from SparkSQL

Solved


New Contributor

Below is my code:

import sqlContext.implicits._
import org.apache.spark.sql.SaveMode

val hospitalDataText = sc.textFile("/user/cloudera/spark/servicesDemo.csv")

// Drop the header row before parsing
val header = hospitalDataText.first()
val hospitalData = hospitalDataText.filter(a => a != header)

case class Services(uhid: String, locationid: String, doctorid: String)

val hData = hospitalData.map(_.split(",")).map(p => Services(p(0), p(1), p(2)))
val hosService = hData.toDF()
hosService.write.format("parquet").mode(SaveMode.Append).save("/user/hive/warehouse/hosdata")

This code created a 'hosdata' folder at the specified path, containing the data in Parquet format.

But when I went to Hive to check whether the table had been created, I could not see any table named 'hosdata'.

 

So I ran the commands below.

 

hosService.write.mode("overwrite").saveAsTable("hosData")
sqlContext.sql("show tables").show

 

This shows the following result:

 

+--------------------+-----------+
|           tableName|isTemporary|
+--------------------+-----------+
|             hosdata|      false|
+--------------------+-----------+

 

But again, when I check in Hive, I cannot see the table 'hosdata'.

Could anyone let me know which step I am missing?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: How to insert data into Hive from SparkSQL

Master Collaborator
Hi, that's because your Spark program is not using the common Hive metastore, so the definition of the table is gone when your application finishes.
You should create an external table in Hive and then issue a refresh command; after your Spark application finishes, you will see the new data in your table.
For creating an external table, see the Cloudera docs.
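For instance, one way to follow this is to point an external Hive table at the Parquet directory the Spark job already wrote. A minimal sketch, assuming the path and column names from the question (this is an illustration, not the Cloudera docs); note that Hive itself picks up new files on the next query, while REFRESH is the Impala-side command for the same situation:

```sql
-- External table over the existing Parquet output (schema taken from the question's case class)
CREATE EXTERNAL TABLE hosdata (
  uhid STRING,
  locationid STRING,
  doctorid STRING
)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/hosdata';

-- In Impala, tell the daemon to pick up files written outside of it:
-- REFRESH hosdata;
```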
5 REPLIES

Re: How to insert data into Hive from SparkSQL

New Contributor

Thank you for your reply.

May I know what the refresh command is?

And can I see the table in Hive only after I close the Spark application?

Re: How to insert data into Hive from SparkSQL

New Contributor

Thank you. This works for me. :)

Re: How to insert data into Hive from SparkSQL

Master Collaborator
I am glad. Please accept it as a solution.

Re: How to insert data into Hive from SparkSQL

Explorer

Another approach to inserting the data, which we follow in our project, is not to insert the data into Hive directly from Spark, but instead to do the following:

1. Read the input CSV file in Spark and transform the data according to the requirements.

2. Save the transformed data back into an output CSV file in HDFS.

3. Load the data from the output CSV into Hive using the hive -f or hive -e command from the shell.
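As a rough sketch of steps 2 and 3 (the HDFS path and table name here are hypothetical, not from the thread):

```shell
# Step 2: suppose Spark saved the transformed data as CSV under this HDFS path (hypothetical)
OUT=/user/cloudera/output/services_csv

# Step 3: load it into an existing Hive table from the shell with an inline statement ...
hive -e "LOAD DATA INPATH '${OUT}' INTO TABLE hosdata;"

# ... or keep the statement in a script file and run it with -f
# hive -f load_hosdata.hql
```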
