Created on 11-05-2017 07:57 AM - edited 09-16-2022 05:29 AM
Hi Everyone,
I have a basic question about inserting data from a DataFrame into an existing Hive table. In PySpark I am using the call below, which always appends new data to the table (this works fine for my requirement):
df.write.insertInto(table)
but as per the Spark docs, the command should be:
df.write.mode("append").insertInto("table")
Is it necessary to use mode("append")?
Created on 11-06-2017 07:38 AM - edited 11-06-2017 07:40 AM
The difference is:
insertInto(table, overwrite=True): overwrites any existing data
mode() comes with additional options, for example:
mode("append"): append the contents of this DataFrame to the existing data
mode("overwrite"): overwrite the existing data
Note: I didn't get a chance to explore this before replying.
Created 11-06-2017 04:05 PM
Thank you for your response, but when I use only insertInto(table) it always inserts new data into the table, without deleting or overwriting anything, which seemed strange to me. That's why I asked.
Maybe insertInto by default does an append?
Created 08-08-2018 11:35 PM
I don't think you can alter the existing table, as the underlying data is immutable.
Created 09-20-2021 04:30 AM
To append data frames in R, use the rbind() function. rbind() is a built-in R function that combines several vectors, matrices, and/or data frames by rows.
When it comes to appending data frames, the rbind() and cbind() functions come to mind: rbind() concatenates data frames vertically (by rows), while cbind() concatenates them horizontally (by columns). For appending, rbind() is the one you want.
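Since the rest of this thread is in PySpark/Python, a rough pandas analogue of rbind() may help; pd.concat performs the same row-wise append (the frame names and columns here are illustrative assumptions):

```python
# Illustrative analogue of R's rbind(): row-wise concatenation of two
# pandas DataFrames that share the same columns.
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "val": ["a", "b"]})
df2 = pd.DataFrame({"id": [3], "val": ["c"]})

combined = pd.concat([df1, df2], ignore_index=True)  # like rbind(df1, df2)
print(combined)  # three rows, ids 1, 2, 3
```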