How to insert data into Hive from SparkSQL
Labels: Apache Spark
Created on 11-16-2018 08:48 AM - edited 09-16-2022 06:54 AM
Below is my code
import sqlContext.implicits._
import org.apache.spark.sql
val eBayText = sc.textFile("/user/cloudera/spark/servicesDemo.csv")
val hospitalDataText = sc.textFile("/user/cloudera/spark/servicesDemo.csv")
val header = hospitalDataText.first()
val hospitalData = hospitalDataText.filter(a=>a!=header)
case class Services(uhid:String,locationid:String,doctorid:String)
val hData = hospitalData.map(_.split(",")).map(p=>Services(p(0),p(1),p(2)))
val hosService = hData.toDF()
hosService.write.format("parquet").mode(org.apache.spark.sql.SaveMode.Append).save("/user/hive/warehouse/hosdata")
This code created a 'hosdata' folder at the specified path, containing the data in Parquet format.
But when I went to Hive to check whether the table had been created, I was not able to see any table named 'hosdata'.
So I ran the commands below.
hosService.write.mode("overwrite").saveAsTable("hosData")
sqlContext.sql("show tables").show
which shows the following result:
+--------------------+-----------+
| tableName|isTemporary|
+--------------------+-----------+
| hosdata| false|
+--------------------+-----------+
But again, when I check in Hive, I cannot see the table 'hosdata'.
Could anyone let me know what step I am missing?
Created 11-20-2018 09:41 AM
You should create an external table in Hive and then issue a refresh command; after your Spark application finishes, you will see the new data in your table.
For creating an external table, see the Cloudera docs.
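To illustrate the suggestion above, an external table over the Parquet path might look like the following sketch. The location and column names are taken from the original post's code; adjust them to your environment:

```sql
-- Sketch: external table over the Parquet files Spark wrote.
-- Path and schema come from the code in the question.
CREATE EXTERNAL TABLE IF NOT EXISTS hosdata (
  uhid       STRING,
  locationid STRING,
  doctorid   STRING
)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/hosdata';
```

Because the table is external, Hive only tracks metadata; the files Spark appends to that location remain in place and become queryable once Hive's metadata is refreshed.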
Created 11-20-2018 08:38 PM
Thank you for your reply.
May I know what the refresh command is?
And can I see the table in Hive only after I close the Spark application?
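For context, the exact refresh command depends on which engine is reading the table; the following is a hedged sketch using the table name from this thread:

```sql
-- In Hive, new partition directories under an external table's
-- location are picked up with:
MSCK REPAIR TABLE hosdata;

-- In Impala, metadata for newly added files is reloaded with:
REFRESH hosdata;

-- In Spark SQL, cached file listings for a table are invalidated with:
REFRESH TABLE hosdata;
```

For an unpartitioned external table queried from Hive itself, new files at the table location are generally visible on the next query without any refresh.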
Created 11-21-2018 09:48 AM
Thank You. This works for me. 🙂
Created 11-21-2018 09:51 AM
Created 01-26-2019 08:07 AM
Another approach to inserting the data, which we follow in our project, is not to insert the data into Hive directly from Spark but instead to do the following:
1. Read the input CSV file in Spark and transform the data according to the requirements.
2. Save the data back into an output CSV file in HDFS.
3. Push the data from the output CSV into Hive using the hive -f or hive -e command from the shell.
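As a sketch of step 3, a script file run with `hive -f load_services.hql` might contain something like the following. The table name, HDFS path, and file name here are illustrative assumptions, not taken from the thread:

```sql
-- load_services.hql (illustrative; names and paths are assumptions)
CREATE TABLE IF NOT EXISTS services_staging (
  uhid       STRING,
  locationid STRING,
  doctorid   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Move the Spark output file from HDFS into the staging table.
LOAD DATA INPATH '/user/cloudera/output/servicesOut.csv'
INTO TABLE services_staging;
```

Note that LOAD DATA with INPATH moves (rather than copies) the file into the table's location, so the source path will no longer contain it afterwards.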
