Inserting data from a dataframe to an existing Hive Table- append mode

Explorer

Hi Everyone,

 

I have a basic question about inserting data from a DataFrame into an existing Hive table.

In PySpark I am currently using the following, which always adds the new data to the table (this works as required):

df.write.insertInto(table)

But the Spark docs say I should use this command instead:

df.write.mode("append").insertInto("table")

Is it necessary to use mode("append")?

4 REPLIES

Champion

@gaurav796

 

The difference is:

insertInto: To overwrite any existing data

Mode comes with additional options, like:

mode("append"): Append contents of this DataFrame to existing data
mode("overwrite"): Overwrite existing data.

Note: I didn't get a chance to try this out before replying.

Explorer

Thank you for your response. But when I use only insertInto(table), it always inserts the new data into the table without deleting or overwriting anything, which seemed strange to me. That's why I asked.

Maybe insertInto on its own appends by default?
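
That would match my understanding of PySpark's DataFrameWriter.insertInto: it appends unless an overwrite is requested, either through its overwrite argument or (in recent versions, as far as I know) through a preceding mode("overwrite"). A minimal sketch, assuming df is an existing DataFrame whose columns are in the same order as the target table's, since insertInto matches columns by position rather than by name. The helper names append_rows and replace_rows are made up for illustration and are not part of Spark.

```python
def append_rows(df, table_name):
    """Append df's rows to an existing table, keeping the rows already there.

    Hypothetical helper: `df` is assumed to be a pyspark.sql.DataFrame and
    `table_name` the name of a table that already exists.
    """
    # overwrite=False states the append intent explicitly.
    df.write.insertInto(table_name, overwrite=False)


def replace_rows(df, table_name):
    """Replace the table's existing rows with df's rows.

    Hypothetical helper, same assumptions as append_rows.
    """
    # overwrite=True asks Spark to overwrite the existing data instead.
    df.write.insertInto(table_name, overwrite=True)
```

With these, append_rows(df, "table") should behave like the plain df.write.insertInto("table") call from the question, so adding mode("append") mostly makes the intent explicit rather than changing the result.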

New Contributor

I don't think you can alter the existing table, as the database is immutable.

avatar

To append data frames in R, use the rbind() function. rbind() is a built-in R function that combines several vectors, matrices, and/or data frames by rows.

When it comes to appending data frames, the rbind() and cbind() functions come to mind because they concatenate data frames vertically (by rows) and horizontally (by columns), respectively. In this example, we will see how to use the rbind() function to append data frames.