
Spark Parquet Issue with Delta Updates


Hello,

I'm ingesting data from a JDBC source, writing it out as Parquet files, and querying them via a Hive external table.

When I do a full table load, Hive queries work. From the next day onwards I do delta updates (a select over a range), and then Hive queries fail.
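For context, the job looks roughly like this (the connection URL, table, and paths are placeholders; I've simplified the real job):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-to-parquet").getOrCreate()

// Full load: read the whole table over JDBC and write it as Parquet.
val fullDf = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  // placeholder URL
  .option("dbtable", "MYSCHEMA.MYTABLE")
  .option("user", "user").option("password", "pass")
  .load()
fullDf.write.mode("overwrite").parquet("/warehouse/mytable")

// Delta load (next day onwards): push the range down as a subquery.
val deltaDf = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
  .option("dbtable", "(SELECT * FROM MYSCHEMA.MYTABLE WHERE LOAD_DATE = CURRENT_DATE) t")
  .option("user", "user").option("password", "pass")
  .load()
deltaDf.write.mode("append").parquet("/warehouse/mytable")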

I checked the Parquet schemas via spark-shell. The Parquet file with the full table's data has the Filter field as decimal(8,0) (as in the SQL schema), but the Parquet file with the range data (one day's worth) has the Filter field's data type changed to integer.
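This is how I compared the two files in spark-shell (paths are placeholders):

// Read each Parquet file and inspect the type of the Filter field.
val fullDf = spark.read.parquet("/warehouse/mytable/full-load.parquet")
val dayDf  = spark.read.parquet("/warehouse/mytable/delta-day1.parquet")

fullDf.schema("Filter").dataType  // DecimalType(8,0), matching the SQL schema
dayDf.schema("Filter").dataType   // IntegerType -- the unexpected change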

That's why the Hive queries are failing: Hive finds conflicting schemas across the Parquet files, even though the source schema never changed.

 

Spark version: 2.3.2

Any experience/pointers on why the Parquet APIs are changing the data type?
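As a workaround I'm considering casting the column back explicitly before the delta write, so every Parquet file carries the same schema (deltaDf being the delta DataFrame read over JDBC, as in the sketch above). Is that a reasonable approach?

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

// Force the delta frame back to the SQL type before writing,
// so the new files agree with the full-load files.
val fixedDf = deltaDf.withColumn("Filter", col("Filter").cast(DecimalType(8, 0)))
fixedDf.write.mode("append").parquet("/warehouse/mytable")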

 

//Dhiraj

 
