Support Questions

Find answers, ask questions, and share your expertise

Appending to a Hive table gives an error but overwriting works, why? Error: org.apache.spark.sql.AnalysisException: Inserting into an RDD-based table is not allowed


When I do

dfTrimmed.write.mode("overwrite").saveAsTable("table")

it works, but this

dfTrimmed.write.mode("append").saveAsTable("table")

gives an error:

org.apache.spark.sql.AnalysisException: Inserting into an RDD-based table is not allowed

I am not sure why this is. I am using Spark 1.6.

I am inserting into a Hive table, and my DataFrame was created through a HiveContext.

Thank you

3 REPLIES


@elliot gimple

Hive is not like a traditional RDBMS with regard to DML operations, because of how Hive leverages HDFS to store data in files. Keep in mind that each partition is stored in its own file, each bucket adds another file, and so on. When you perform a DML action against a row, you effectively rewrite a file rather than appending to it. This is how HDFS has been architected, for good reasons.


Expert Contributor

It should work like the following.

scala> Seq((1, 2)).toDF("i", "j").write.mode("overwrite").saveAsTable("t1")
scala> Seq((3, 4)).toDF("j", "i").write.mode("append").saveAsTable("t1")
scala> sql("select * from t1").show
+---+---+
|  i|  j|
+---+---+
|  1|  2|
|  4|  3|
+---+---+

According to your error message, the existing table `table` is not a Hive table. Maybe you registered a temporary table with that name using `registerTempView` earlier, so the name resolves to an RDD-based table instead of a Hive table. To create the table initially, use `saveAsTable` or `sql("CREATE TABLE ...")` instead.
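To illustrate, here is a hedged sketch of how that situation can arise and how to get out of it, assuming a spark-shell session on Spark 1.6 with a HiveContext-backed `sqlContext`, and using `dfTrimmed` as a stand-in for your DataFrame:

```scala
// If "table" was first registered as a temp (RDD-based) table,
// appending to that name raises the AnalysisException from the question:
scala> dfTrimmed.registerTempTable("table")
scala> dfTrimmed.write.mode("append").saveAsTable("table")
// org.apache.spark.sql.AnalysisException: Inserting into an RDD-based table is not allowed.

// Fix: drop the temp registration (or choose a different table name) so the
// name resolves to a real Hive table, create it once, then append:
scala> sqlContext.dropTempTable("table")
scala> dfTrimmed.write.mode("overwrite").saveAsTable("table")  // creates the Hive table
scala> dfTrimmed.write.mode("append").saveAsTable("table")     // subsequent appends work
```

This is only a sketch of the likely cause, not a reproduction of your exact session; the key point is that `saveAsTable` with `append` requires the target name to be a Hive-managed table, not a temporary registration.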
