Inserting data from a dataframe to an existing Hive Table- append mode

New Contributor

Hi Everyone,

 

I have a basic question about inserting data from a DataFrame into an existing Hive table.

 

In PySpark I am using the call below, which always adds the new data to the table (this works fine, as required):

 

df.write.insertInto(table)

But according to the Spark docs, I should use the command like this:

df.write.mode("append").insertInto("table")

Is it necessary to use mode("append")?

3 REPLIES

Re: Inserting data from a dataframe to an existing Hive Table- append mode

Champion

@gaurav796

 

The difference is

 

 

insertInto: I believe this overwrites any existing data

 

mode() comes with additional options, such as:

 

mode("append"): Append the contents of this DataFrame to the existing data.
mode("overwrite"): Overwrite the existing data.

 

Note: I didn't get a chance to test this before replying.

Re: Inserting data from a dataframe to an existing Hive Table- append mode

New Contributor

Thank you for your response. But when I use only insertInto(table), it always inserts the new data into the table without deleting or overwriting anything, which seemed strange to me. That's why I asked.

 

Maybe insertInto on its own appends by default?
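
For anyone comparing the two behaviours discussed in this thread, here is a minimal PySpark sketch that states the intent explicitly rather than relying on a default. The helper names `append_to_table` and `overwrite_table` are made up for illustration, and `df` is assumed to be a PySpark DataFrame whose columns are in the same order as the target table's. As far as I know, `insertInto` with no mode set appends, so the explicit `mode("append")` mainly documents intent; please verify against your Spark version.

```python
def append_to_table(df, table_name):
    """Append the rows of df to an existing table (hypothetical helper)."""
    # insertInto resolves columns by position, not by name, so df's
    # column order is assumed to match the target table's.
    df.write.mode("append").insertInto(table_name)


def overwrite_table(df, table_name):
    """Replace the table's existing data with df (hypothetical helper)."""
    # PySpark's insertInto takes an `overwrite` argument; passing it
    # directly avoids depending on how a previously set mode is handled.
    df.write.insertInto(table_name, overwrite=True)
```

With a placeholder table name, `append_to_table(df, "mydb.mytable")` would correspond to the `df.write.mode("append").insertInto("table")` form quoted from the docs above.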

Re: Inserting data from a dataframe to an existing Hive Table- append mode

New Contributor

I don't think you can alter the existing table, as the database is immutable.