Created on 11-05-2017 07:57 AM - edited 09-16-2022 05:29 AM
Hi Everyone,
I have a basic question about inserting data from a DataFrame into an existing Hive table. In PySpark I am using the call below, which always appends new data to the table (this works fine for my requirement):
df.write.insertInto(table)
but as per the Spark docs, the command should be:
df.write.mode("append").insertInto("table")
Is it necessary to use mode("append")?
Created on 11-06-2017 07:38 AM - edited 11-06-2017 07:40 AM
The difference is:
insertInto(table, overwrite=True): overwrites any existing data
mode() comes with additional options, for example:
mode("append"): append the contents of this DataFrame to the existing data
mode("overwrite"): overwrite the existing data
Note: I didn't get a chance to explore this before replying.
Created 11-06-2017 04:05 PM
Thank you for your response, but when I use only insertInto(table) it always inserts new data into the table, without deleting or overwriting anything, which seemed strange to me. That's why I asked.
Maybe insertInto by default does an append?
Created 08-08-2018 11:35 PM
I don't think you can alter the existing table, as the underlying data is immutable.
Created 09-20-2021 04:30 AM
To append data frames in R, use the rbind() function. rbind() is a built-in R function that combines several vectors, matrices, and/or data frames by rows.
When it comes to appending data frames, the rbind() and cbind() functions come to mind: rbind() concatenates data frames vertically (by rows), while cbind() concatenates them horizontally (by columns). For appending, rbind() is the one you want.
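Since the rest of this thread is in PySpark/Python, a rough pandas analogue of rbind() may help; pd.concat performs the same row-wise append (the frame names and columns here are illustrative assumptions):

```python
# Illustrative analogue of R's rbind(): row-wise concatenation of two
# pandas DataFrames that share the same columns.
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "val": ["a", "b"]})
df2 = pd.DataFrame({"id": [3], "val": ["c"]})

combined = pd.concat([df1, df2], ignore_index=True)  # like rbind(df1, df2)
print(combined)  # three rows, ids 1, 2, 3
```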