
Inserting data from a dataframe to an existing Hive Table- append mode

New Contributor

Hi Everyone,

 

I have a basic question about inserting data from a DataFrame into an existing Hive table.

In PySpark I am using the following, which always adds the new data to the table (this works fine and is what I need):

df.write.insertInto(table)

But according to the Spark docs, I should use this command instead:

df.write.mode("append").insertInto("table")

Is it necessary to use mode("append")?

3 REPLIES

Re: Inserting data from a dataframe to an existing Hive Table- append mode

Champion

@gaurav796

 

As I understand it, the difference is:

insertInto: overwrites any existing data

mode() comes with additional options, such as:

mode("append"): Append the contents of this DataFrame to the existing data.
mode("overwrite"): Overwrite the existing data.

Note: I did not get a chance to verify this before replying.

Re: Inserting data from a dataframe to an existing Hive Table- append mode

New Contributor

Thank you for your response. However, when I use only insertInto(table), it always inserts the new data into the table without deleting or overwriting anything, which seemed strange to me. That's why I asked.

Maybe insertInto on its own appends by default?
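
One way to check this is to insert the same DataFrame twice and count the rows. The sketch below assumes a local PySpark installation; the table name demo_events, the helper insert_rows, and the function demo are made up for illustration and are not from the thread. As far as I know, PySpark's insertInto appends unless overwrite is requested, so mode("append") would only make the intent explicit. If that holds, demo() returns 4 rows after the two inserts and 2 rows after the overwrite.

```python
def insert_rows(df, table_name, overwrite=False):
    """Insert df into an existing table, stating the save mode explicitly.

    insert_rows is a hypothetical helper, not part of Spark.
    """
    mode = "overwrite" if overwrite else "append"
    # insertInto matches columns by position, not by name, so the
    # DataFrame's column order has to match the table's.
    df.write.mode(mode).insertInto(table_name)


def demo():
    # Imported here so insert_rows can be read and reused on its own.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("insertInto-demo")
        .getOrCreate()
    )
    # demo_events is an assumed scratch table for this experiment only.
    spark.sql("DROP TABLE IF EXISTS demo_events")
    spark.sql("CREATE TABLE demo_events (id BIGINT, name STRING) USING parquet")

    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])

    df.write.insertInto("demo_events")   # no mode given, as in the question
    insert_rows(df, "demo_events")       # explicit mode("append")
    rows_after_inserts = spark.table("demo_events").count()

    insert_rows(df, "demo_events", overwrite=True)
    rows_after_overwrite = spark.table("demo_events").count()

    spark.stop()
    return rows_after_inserts, rows_after_overwrite
```

Calling demo() in a Python shell and comparing the two counts shows whether the bare insertInto call appended or replaced the existing rows.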

Re: Inserting data from a dataframe to an existing Hive Table- append mode

New Contributor

I don't think you can alter the existing table, as the database is immutable.