
Spark Parquet Issue with Delta Updates


I'm ingesting data from a JDBC source, writing it out as Parquet files, and querying them via a Hive external table.

When I do a full table load, Hive queries work. From the next day onwards I do delta updates (a selected range), and then Hive queries fail.

I checked the Parquet schema via spark-shell. In the Parquet file holding the full table's data, the Filter field is decimal(8,0) (as in the SQL schema), but in the Parquet file holding the range data (one day's worth), the data type of the Filter field has changed to integer.
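The schema comparison described above can be sketched in spark-shell roughly as follows (the paths and the `Filter` column name are placeholders for the actual table layout):

```scala
// spark-shell sketch: compare the inferred schemas of the two Parquet outputs.
// "/warehouse/mytable/full" and "/warehouse/mytable/delta" are placeholder paths.
val full  = spark.read.parquet("/warehouse/mytable/full")
val delta = spark.read.parquet("/warehouse/mytable/delta")

// Print the data type recorded for the Filter column in each file set;
// in the failing scenario these disagree (DecimalType(8,0) vs IntegerType).
println(full.schema("Filter").dataType)
println(delta.schema("Filter").dataType)
```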

That's why Hive queries are failing: Hive sees conflicting schemas across the Parquet files, even though there was no schema change in the source database.


Spark version: 2.3.2

Any experience/pointers on why the Parquet APIs are changing the data type?
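One workaround, assuming the type drift happens when Spark infers the schema of the narrower delta result set, is to pin the column type on read. Spark 2.3+ supports a `customSchema` option on the JDBC source; alternatively the column can be cast explicitly before writing. A hedged sketch (the URL, table name, and `Filter` column are placeholders):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

// Placeholder connection details for illustration only.
val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/orcl"

val delta = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "(SELECT * FROM mytable WHERE load_date = CURRENT_DATE) t")
  // Pin the type at read time so the delta load matches the full load.
  .option("customSchema", "Filter DECIMAL(8,0)")
  .load()
  // Belt-and-braces: cast explicitly as well before writing Parquet.
  .withColumn("Filter", col("Filter").cast(DecimalType(8, 0)))

delta.write.mode("append").parquet("/warehouse/mytable")
```

With both full and delta files carrying decimal(8,0), the Hive external table should no longer see conflicting schemas.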