Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

save dataframe to a hive table

avatar
Expert Contributor

How can I save a dataframe in to a Hive table or sql table using scala.

scala> input.printSchema()
root
 |-- UID: decimal(38,10) (nullable = false)
 |-- DTIME: date (nullable = false)
 |-- TARGET: string (nullable = false)
 |-- RAVG: decimal(38,10) (nullable = true)
 |-- RMIN: decimal(38,10) (nullable = true)
 |-- RMAX: decimal(38,10) (nullable = true)
 |-- RSTD: decimal(38,10) (nullable = true)
 |-- SUCCESSES: decimal(38,10) (nullable = true)
 |-- FAILURES: decimal(38,10) (nullable = true)
 |-- LOCID: decimal(38,10) (nullable = true)
 |-- FNAME: string (nullable = true)
 |-- LD_DT: date (nullable = true)
 |-- RECDDELFLG: string (nullable = true)
 |-- DT_DIM_SEQ: decimal(38,10) (nullable = false)
9 REPLIES 9

avatar
Expert Contributor
import org.apache.spark.sql.hive.HiveContext;

HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());

df.write().mode(SaveMode.Overwrite).saveAsTable("dbName.tableName");

avatar
Expert Contributor

@Adnan Alvee


<console>:31: error: object HiveContext in package hive cannot be accessed in package org.apache.spark.sql.hive
         HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc())



avatar
Expert Contributor

Try this

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

avatar
Expert Contributor

I think you are in spark-shell. For your case, simply do the following.

input.write.mode("overwrite").saveAsTable("table1")

avatar
New Contributor

Hi,

I am using the above in saveasTable option pyspark and the file gets created under user/hive/warehouse. However the table is not reflected in hive.

df.write.mode("overwrite").saveAsTable("temp_d") leads to file creation in hdfs but no table in hive

Will hive auto infer the schema from dataframe or should we specify the schema in write?

Other option I tried, create a new table based on df=> select col1,col2 from table and then write it as a new table in hive

df.write.mode("append").saveAsTable("temp_d") leads to "No table exists error"

Is append not the correct option to save as a new table?

Kindly help!!

avatar
New Contributor

@testingsauce I am also facing this issue. Saved df in HIVE using saveAsTable but when i try to fetch results using hiveContext.sql(query), it doesn't return anything. BADLY stuck. Please help

avatar
New Contributor

Try creating a table in hive and then use df.write.mode("overwrite").saveAsTable("schema.table_name")

avatar

saveAsTable("my_table") is deprecated. Create new ampty table, insert and overwrite data into it.

avatar
New Contributor

"alisa houskova" - As per version 2.2.1, saveAsTable("") is not deprecated as shown in below screen shot, not sure which one did you refer.

63452-221.png