The error message indicates a mismatch between the table schema and the file schema for the column 'db.table.parameter_11' in the Parquet file 'hdfs:/path/table/1_data.0.parq'. Impala expects the column to be a STRING, but the Parquet file stores it as an optional int64 (64-bit integer) column.
To resolve this issue, you'll need to investigate and potentially correct the schema mismatch. Here are some steps you can take:
Verify the Expected Schema:
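- Check how the parameter_11 column of db.table is defined in the Impala metadata or Hive metastore, for example with DESCRIBE db.table. Confirm that it is declared as STRING and that STRING is the type you actually intend.
Inspect the Parquet File Schema:
- Print the schema stored in the Parquet file, for example with parquet-tools, and note the type recorded for parameter_11.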
Compare Expected vs. Actual Schema:
- Compare the expected schema for 'db.table.parameter_11' with the actual schema found in the Parquet file. Identify any differences in data types.
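The comparison can also be scripted. The following is a minimal Python sketch, assuming the schema text has the "optional int64 parameter_11;" line format that parquet-tools prints. The COMPATIBLE table, the function names, and the sample text are illustrative assumptions, not part of Impala or parquet-tools, and the table is deliberately simplified (it ignores logical types such as DECIMAL and any type widening a given Impala version may allow).

```python
import re

# Simplified, assumed view of which Parquet physical type backs each
# Impala column type. Real Impala rules are richer than this.
COMPATIBLE = {
    "STRING": {"binary"},
    "BIGINT": {"int64"},
    "INT": {"int32"},
    "DOUBLE": {"double"},
    "FLOAT": {"float"},
    "BOOLEAN": {"boolean"},
}

# Matches lines such as "  optional int64 parameter_11;"
FIELD_RE = re.compile(r"^\s*(optional|required|repeated)\s+(\w+)\s+(\w+)")


def parse_parquet_schema(schema_text):
    """Return {column_name: physical_type} for primitive fields.

    Group (nested) fields and parameterized types such as
    fixed_len_byte_array(16) are skipped in this sketch.
    """
    fields = {}
    for line in schema_text.splitlines():
        match = FIELD_RE.match(line)
        if match and match.group(2) != "group":
            fields[match.group(3)] = match.group(2)
    return fields


def find_mismatches(expected, schema_text):
    """List (column, expected_type, parquet_type) for incompatible columns."""
    actual = parse_parquet_schema(schema_text)
    mismatches = []
    for column, column_type in expected.items():
        physical = actual.get(column)
        if physical is None:
            continue  # column is not present in this file
        if physical not in COMPATIBLE.get(column_type.upper(), set()):
            mismatches.append((column, column_type.upper(), physical))
    return mismatches


# Made-up sample: the first field mirrors the error message,
# the second is a hypothetical column added for contrast.
sample_schema = """message schema {
  optional int64 parameter_11;
  optional binary some_other_column (UTF8);
}"""

expected_types = {"parameter_11": "STRING", "some_other_column": "STRING"}
print(find_mismatches(expected_types, sample_schema))
# prints [('parameter_11', 'STRING', 'int64')]
```

In practice the expected types would come from the table definition and the schema text from the inspection tool's output for each file.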
Investigate Data Inconsistencies:
- If the types differ, investigate how that happened. Typical causes are schema evolution (the column type was changed in the table definition after older files had already been written) or a writer that produced files with a different type than the table declares.
Resolve Schema Mismatch:
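- Depending on your findings, either change the column type in Impala or Hive to match what the files actually contain (for example, ALTER TABLE ... CHANGE to make the column a BIGINT), or rewrite the Parquet files so that the column is stored as a string.
Update Impala Statistics:
- After correcting the mismatch, refresh the table's metadata in Impala (REFRESH db.table) and recompute its statistics (COMPUTE STATS db.table) so that queries see the corrected schema and files.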
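If you decide to change the table definition to match the files, the statement can be derived from the Parquet type. This is a small Python sketch under stated assumptions: the PARQUET_TO_IMPALA mapping and the alter_statement helper are hypothetical names introduced here, and the mapping covers only plain physical types. Retyping the column is only appropriate if every file in the table stores the column with the same type.

```python
# Assumed mapping from Parquet physical types to Impala column types.
# It ignores logical annotations such as DECIMAL or TIMESTAMP.
PARQUET_TO_IMPALA = {
    "boolean": "BOOLEAN",
    "int32": "INT",
    "int64": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "binary": "STRING",
}


def alter_statement(table, column, parquet_type):
    """Build an ALTER TABLE ... CHANGE statement that retypes `column`."""
    # Raises KeyError for Parquet types this sketch does not map.
    impala_type = PARQUET_TO_IMPALA[parquet_type.lower()]
    # CHANGE takes the old name, the new name, and the new type;
    # repeating the name keeps the column name unchanged.
    return f"ALTER TABLE {table} CHANGE {column} {column} {impala_type};"


print(alter_statement("db.table", "parameter_11", "int64"))
# prints ALTER TABLE db.table CHANGE parameter_11 parameter_11 BIGINT;
```

Review the generated statement before running it, since it changes only the table metadata and does not rewrite any data files.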
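Here's an example of what the Parquet schema inspection might look like, assuming the file has first been copied from HDFS to the local working directory (for example with hdfs dfs -get):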
parquet-tools schema 1_data.0.parq
Look for the parameter_11 column in the output (the Parquet schema lists only the column name, without the db.table prefix) and check its data type. If it shows int64 where you expected a string (binary) type, investigate how the data was written and whether anything went wrong during that process. Once the schema mismatch is corrected and Impala's metadata and statistics are updated, the error should no longer occur.