Appending to a Hive table gives an error but overwriting works. Why? Error: org.apache.spark.sql.AnalysisException: Inserting into an RDD-based table is not allowed


When I do

dfTrimmed.write.mode("overwrite").saveAsTable("table")

it works, but

dfTrimmed.write.mode("append").saveAsTable("table")

gives an error:

org.apache.spark.sql.AnalysisException: Inserting into an RDD-based table is not allowed

I am not sure why this happens. I am using Spark 1.6.

I am inserting into a Hive table, and my DataFrame was created through a HiveContext.

Thank you


Hive is not like a traditional RDBMS when it comes to DML operations, because Hive stores its data as files in HDFS. Keep in mind that each partition has its own file, each bucket adds another file, and so on. When you perform a DML action against a row, you are in practice rewriting a file, not appending to one. HDFS was designed this way for good reasons.

It should work like the following.

scala> Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
scala> Seq((3, 4)).toDF("j", "i").write.mode("append").saveAsTable("t1")
scala> sql("select * from t1").show
+---+---+
|  i|  j|
+---+---+
|  1|  2|
|  4|  3|
+---+---+

According to your error message, the existing table `table` is not a Hive table. Maybe you registered a temporary table under that name earlier, for example with `registerTempTable` (Spark 1.6) or `createOrReplaceTempView` (Spark 2.x). To create the table initially, use `saveAsTable` or `sql("CREATE TABLE...")` instead.
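
To check whether that is what happened, something like the following might help. It is only a sketch for a Spark 1.6 spark-shell, where `sqlContext` is assumed to be the HiveContext the shell provides; `my_table` is a hypothetical name standing in for the table from the question, and the small DataFrame stands in for `dfTrimmed`.

```scala
// Sketch for a Spark 1.6 spark-shell; `sqlContext` is assumed to be a HiveContext.
// "my_table" is a hypothetical table name used only for illustration.
import sqlContext.implicits._

// Stand-in for dfTrimmed from the question.
val df = Seq((1, 2)).toDF("i", "j")

// sqlContext.tables() lists catalog entries with the columns
// `tableName` and `isTemporary`. A row with isTemporary = true means
// the name refers to a temporary table, not a table in the Hive metastore.
val shadowed = sqlContext.tables()
  .filter($"tableName" === "my_table" && $"isTemporary")
  .count() > 0

// Drop only the temporary registration; data stored in Hive is not touched.
if (shadowed) sqlContext.dropTempTable("my_table")

// Create the table through saveAsTable once, then append to it.
df.write.mode("overwrite").saveAsTable("my_table")
df.write.mode("append").saveAsTable("my_table")
```

If `shadowed` comes out false, the cause is probably something else and this sketch will not change the behavior.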