[Hive] table partitioned in parquet giving error that it stored in HiveFileFormat

ditmarh — Tue, 04 Oct 2022 09:49:43 GMT

table is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`

I have this issue when I try to write using pyspark with the following command:

df.write.mode("append").format("parquet").saveAsTable("schema.table")

Before you say to change the format from parquet to hive: I know that works. But the table is partitioned and stored as Parquet, and I really don't know why this is not working any more. It worked fine until now: the same command has run correctly 5 times over the past month. But today it refuses to write like this.

If I check the metadata, it also points to everything being in Parquet:

103	SerDe Library:	org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe	NULL
104	InputFormat:	org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat	NULL
105	OutputFormat:	org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat

Re: [Hive] table partitioned in parquet giving error that it stored in HiveFileFormat

smruti — Tue, 18 Oct 2022 19:56:32 GMT

@ditmarh saveAsTable might not work in scenarios where the table schema.table was created from Hive and we are appending to it from Spark.

You may try the following command, replacing saveAsTable with insertInto.

df.write.mode("append").format("parquet").insertInto("schema.table")
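
One thing to watch when switching: unlike saveAsTable, insertInto ignores column names and matches DataFrame columns to table columns by position, so it can help to reorder the DataFrame to the table's column order first. Below is a minimal sketch of that step. It assumes an existing SparkSession passed in as `spark` and a DataFrame whose column names match the table's exactly. `align_columns` and `append_to_hive_table` are hypothetical helper names for illustration, not part of PySpark.

```python
def align_columns(df_columns, table_columns):
    """Return the table's column order, or raise if the name sets differ.

    Assumption: names are compared exactly (case-sensitive).
    """
    missing = [c for c in table_columns if c not in df_columns]
    extra = [c for c in df_columns if c not in table_columns]
    if missing or extra:
        raise ValueError(f"column mismatch: missing={missing}, extra={extra}")
    return list(table_columns)


def append_to_hive_table(spark, df, table_name):
    """Append df to an existing table via position-based insertInto.

    Hypothetical helper: spark is a SparkSession, df a DataFrame.
    """
    # For a partitioned table, the partition columns are listed last.
    target_columns = spark.table(table_name).columns
    ordered = align_columns(df.columns, target_columns)
    df.select(*ordered).write.mode("append").insertInto(table_name)
```

With these helpers, the write would look like `append_to_hive_table(spark, df, "schema.table")`.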

Re: [Hive] table partitioned in parquet giving error that it stored in HiveFileFormat

ditmarh — Mon, 27 Feb 2023 15:29:42 GMT

Hi @smruti

Thanks for the reply. I forgot to post this, but I also figured out that what you mentioned above is the actual problem: the table was created in Hive and, as a result, cannot be appended to from Spark with saveAsTable and the parquet format.
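
For anyone hitting the same error, one way to check this diagnosis is to look at the provider Spark has recorded for the table, rather than at the SerDe and input/output formats. As far as I understand it, a table created from Hive is recorded with the provider `hive` even when its SerDe is Parquet, and that seems to be what the `HiveFileFormat` in the message refers to. A rough sketch, assuming your Spark version prints a `Provider` row in its `DESCRIBE FORMATTED` output; `table_provider` is a hypothetical helper name:

```python
def table_provider(describe_rows):
    """Return the 'Provider' value from DESCRIBE FORMATTED rows, or None.

    describe_rows: an iterable of (col_name, data_type, comment) rows,
    for example the result of .collect() on the DESCRIBE query.
    """
    for row in describe_rows:
        key = (row[0] or "").strip().rstrip(":").lower()
        if key == "provider":
            return (row[1] or "").strip().lower()
    return None


# Usage, assuming an existing SparkSession named spark:
# rows = spark.sql("DESCRIBE FORMATTED schema.table").collect()
# print(table_provider(rows))
```

If this prints `hive`, appending with `.format("parquet").saveAsTable(...)` would be expected to fail with the format mismatch, while `.format("hive")` or insertInto should still work.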