Created 10-04-2022 02:49 AM
The format of the existing table schema.table is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.
I have this issue when I try to write using pyspark with the following command:
df.write.mode("append").format("parquet").saveAsTable("schema.table")
Before you say to change from parquet to hive: I know that works. But the table is partitioned and stored as parquet, and I really don't know why it's not working any more. It worked fine until now; the same command has run correctly five times over the past month. But today it no longer writes like this.
If i check the metadata it also points to everything being in parquet:
```
SerDe Library:  org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:    org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
```
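For reference, the metadata above can be pulled straight from PySpark; a minimal sketch, assuming `spark` is an existing SparkSession created with Hive support enabled:

```python
# Assumes `spark` was built with .enableHiveSupport() so it can read the
# Hive metastore. DESCRIBE FORMATTED prints the SerDe, InputFormat and
# OutputFormat rows shown above for the table.
spark.sql("DESCRIBE FORMATTED schema.table").show(100, truncate=False)
```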
Created 10-18-2022 12:56 PM
@ditmarh saveAsTable might not work in scenarios where the table schema.table was created from Hive and we are appending to it from Spark.
You may try the following command, replacing saveAsTable with insertInto.
df.write.mode("append").format("parquet").insertInto("schema.table")
Created 02-27-2023 07:29 AM
Hi @smruti
Thanks for the reply. I forgot to post this, but I had also figured out that what you mentioned above is the actual problem: the table was created in Hive, and as a result it cannot be written to with saveAsTable from Spark.