
[Hive] Table partitioned in parquet gives an error that it is stored in HiveFileFormat

New Contributor

table is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`

 

I get this error when I try to write using PySpark with the following command:

df.write.mode("append").format("parquet").saveAsTable("schema.table")

Before you say to change the format from parquet to hive: I know that works. But the table is partitioned and stored as parquet, and I really don't know why it has stopped working. It worked fine until now. The same command ran correctly 5 times over the past month, but today it no longer writes.

If I check the metadata, it also points to everything being in parquet:

```
SerDe Library:      org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe          NULL
InputFormat:        org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat        NULL
OutputFormat:       org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
```

1 ACCEPTED SOLUTION

Master Collaborator

@ditmarh saveAsTable with format("parquet") might not work when the table schema.table was created from Hive and we are appending to it from Spark.

You may try the following command, replacing saveAsTable with insertInto.

df.write.mode("append").format("parquet").insertInto("schema.table")
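One thing to keep in mind with insertInto is that it matches columns by position rather than by name, so the DataFrame columns should be in the same order as the table's. A minimal sketch of that reordering, assuming `spark` is an active SparkSession with Hive support and that `df` uses exactly the same column names as the table; the helper names `order_like_table` and `append_to_existing_table` are made up for illustration:

```python
def order_like_table(df_columns, table_columns):
    """Return the table's column order after checking df has exactly those columns."""
    missing = [c for c in table_columns if c not in df_columns]
    extra = [c for c in df_columns if c not in table_columns]
    if missing or extra:
        raise ValueError(f"column mismatch: missing={missing}, extra={extra}")
    return list(table_columns)


def append_to_existing_table(spark, df, table_name):
    """Append df to an existing table, leaving the file format to the table definition."""
    # spark.table(...).columns lists the table's columns in table order
    table_columns = spark.table(table_name).columns
    ordered = order_like_table(df.columns, table_columns)
    # insertInto resolves columns by position, so select them in table order first
    df.select(*ordered).write.mode("append").insertInto(table_name)
```

It could then be called as `append_to_existing_table(spark, df, "schema.table")`. Depending on the cluster configuration, inserting into a partitioned Hive table this way may also require `hive.exec.dynamic.partition.mode` to be set to `nonstrict`.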

New Contributor

Hi @smruti 

Thanks for the reply. I forgot to post this, but I also figured out that what you mentioned above is the actual problem: the table was created in Hive and as a result cannot be appended to from Spark with saveAsTable and format("parquet").
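For anyone who wants to check up front which case a table falls into, the table's provider can be inspected before choosing the write call. A rough sketch, assuming `spark` is an active SparkSession and that the output of `DESCRIBE FORMATTED` contains a `Provider` row (treat that as an assumption for your Spark version); `table_provider` and `append` are hypothetical helper names:

```python
def table_provider(describe_rows):
    """Extract the provider (e.g. 'hive' or 'parquet') from DESCRIBE FORMATTED rows."""
    for row in describe_rows:
        col_name, data_type = row[0], row[1]
        if (col_name or "").strip().rstrip(":").lower() == "provider":
            return (data_type or "").strip().lower()
    return None  # no Provider row found


def append(spark, df, table_name):
    """Append df, picking the write call based on how the table was created."""
    rows = spark.sql(f"DESCRIBE FORMATTED {table_name}").collect()
    provider = table_provider(rows)
    if provider == "hive":
        # Hive-created table: let the table definition decide the file format
        df.write.mode("append").insertInto(table_name)
    elif provider == "parquet":
        # Table created through Spark as a parquet data source table
        df.write.mode("append").format("parquet").saveAsTable(table_name)
    else:
        raise ValueError(f"unexpected provider for {table_name}: {provider!r}")
```

With the table from this thread, the first branch would be the one taken, which matches the accepted solution.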