Support Questions

ditmarh · ‎10-04-2022

table is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`

I have this issue when I try to write using pyspark with the following command:

df.write.mode("append").format("parquet").saveAsTable("schema.table")

Before you suggest changing the format from parquet to hive: I know that works. But the table is partitioned and stored as Parquet, and I really don't know why this is not working any more. It worked fine until now. The same command has run correctly 5 times over the past month, but today it no longer writes this way.

If I check the metadata, it also points to everything being in Parquet:

```
SerDe Library:   org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe      NULL
InputFormat:     org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat    NULL
OutputFormat:    org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
```
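The SerDe and input/output formats describe how the files are stored, while the error compares the table's provider with the format passed to the writer. A minimal sketch for checking the provider from Spark follows. It assumes an active SparkSession is passed in and that the `DESCRIBE FORMATTED` output includes a `Provider` row; the helper name `table_provider` is hypothetical.

```python
def table_provider(spark, table_name):
    """Return the provider Spark records for a table, or None if not listed.

    Assumption: `spark` is an active SparkSession and the DESCRIBE FORMATTED
    output has a row labelled "Provider". A table created through Hive DDL
    typically reports "hive" here even when its SerDe is Parquet.
    """
    rows = spark.sql(f"DESCRIBE FORMATTED {table_name}").collect()
    for row in rows:
        # Result columns are col_name, data_type and comment.
        label = (row["col_name"] or "").strip().rstrip(":")
        if label == "Provider":
            return (row["data_type"] or "").strip()
    return None
```

If this returns `hive`, a `saveAsTable` call with `format("parquet")` would not match the existing table, which would be consistent with the error message above.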

smruti · ‎10-18-2022

@ditmarh saveAsTable might not work in scenarios where the table schema.table was created from Hive and we are appending to it from Spark.

You may try the following command, replacing saveAsTable with insertInto.

df.write.mode("append").format("parquet").insertInto("schema.table")
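One thing to watch when switching: insertInto matches columns by position, not by name, so the DataFrame's column order has to match the table's. A minimal sketch of a wrapper that reorders the columns first is shown here. The helper name `append_by_position` is hypothetical, and it assumes the DataFrame contains every table column under the same name.

```python
def append_by_position(spark, df, table_name):
    """Append df to an existing table with insertInto.

    insertInto resolves columns by position, so df is first reordered to
    the table's column order (partition columns come last in that order).
    Assumption: df has every column of the table, with identical names.
    """
    target_cols = spark.table(table_name).columns
    missing = [c for c in target_cols if c not in df.columns]
    if missing:
        raise ValueError(f"DataFrame is missing columns: {missing}")
    # Select in table order, then append into the existing table.
    df.select(*target_cols).write.mode("append").insertInto(table_name)
    return target_cols
```

Usage would look like `append_by_position(spark, df, "schema.table")`. For a partitioned Hive table, dynamic partition inserts may also require `hive.exec.dynamic.partition.mode` to be set to `nonstrict`, depending on the cluster configuration.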

ditmarh · ‎02-27-2023

Hi @smruti

Thanks for the reply. I forgot to post this, but I also figured out that what you mentioned above is the actual problem: the table was created in Hive and, as a result, could not be modified by Spark in this way.

Cloudera Community

[Hive] table partitioned in parquet giving error that it stored in HiveFileFormat