11-05-2017 07:57 AM
I have a basic question about inserting data from a DataFrame into an existing Hive table.
In PySpark I am calling insertInto(table) on its own, and it always adds the new data to the table (which works fine and is what I need).
But the Spark docs suggest setting a write mode as well.
Is it necessary to use mode("append")?
11-06-2017 07:38 AM - edited 11-06-2017 07:40 AM
The difference, as I understand it, is:
insertInto: overwrites any existing data.
mode comes with additional options, such as:
mode("append"): Append the contents of this DataFrame to the existing data.
mode("overwrite"): Overwrite the existing data.
Note: I didn't get a chance to verify this before replying.
11-06-2017 04:05 PM
Thank you for your response. But when I use only insertInto(table), it always inserts the new data into the table,
without deleting or overwriting anything, which was strange to me. That's why I asked.
Maybe insertInto on its own appends by default?
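For anyone landing on this thread: the behaviour described above matches PySpark's `DataFrameWriter.insertInto(tableName, overwrite=...)`, whose `overwrite` argument is off unless you pass it, so a bare `insertInto` appends. The sketch below shows the two spellings side by side. The helper names `append_rows` and `replace_rows` are made up for illustration, and `df` is assumed to be a DataFrame whose columns are in the same order as the target table, since `insertInto` matches columns by position rather than by name.

```python
def append_rows(df, table_name):
    """Append the rows of `df` to an existing table.

    Assumption: `df` is a pyspark.sql.DataFrame and `table_name` already
    exists. Setting mode("append") mainly documents the intent; a bare
    insertInto(table_name) also appends, as observed in this thread.
    """
    df.write.mode("append").insertInto(table_name)


def replace_rows(df, table_name):
    """Replace the existing contents of the table with the rows of `df`.

    In PySpark, overwriting through insertInto is requested with the
    `overwrite` argument of insertInto itself.
    """
    df.write.insertInto(table_name, overwrite=True)
```

Typical use would be `append_rows(df, "mydb.mytable")`, where the table name is only a placeholder. Writing the mode out is optional for the append case, but it makes the intent obvious to the next reader.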