- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
How to fix Parquet schema: optional int64 amount [i:10 d:1 r:0]?
- Labels:
-
Apache Hive
-
Apache Impala
Created ‎08-04-2021 11:32 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a table and when I am trying to query it through imapala it throws me this error:
File 'hdfs:/path/table/1_data.0.parq' has an incompatible Parquet schema for column 'db.table.parameter_11'. Column type: STRING, Parquet schema: optional int64 amount [i:10 d:1 r:0]
But when I query it through hive it returns results correctly. How can I fix it?
Created ‎11-21-2023 05:43 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The error message indicates that there is an inconsistency between the expected schema for the column 'db.table.parameter_11' and the actual schema found in the Parquet file 'hdfs:/path/table/1_data.0.parq'. The column type is expected to be a STRING, but the Parquet schema suggests that it is an optional int64 (integer) column.
To resolve this issue, you'll need to investigate and potentially correct the schema mismatch. Here are some steps you can take:
Verify the Expected Schema:
- Check the definition of the 'db.table.parameter_11' column in the Impala metadata or Hive metastore. Ensure that it is defined as a STRING type.
Inspect the Parquet File Schema:
You can use tools like parquet-tools to inspect the schema of the Parquet file directly. Run the following command in the terminal:
bash
- parquet-tools schema 1_data.0.parq
Look for the 'db.table.parameter_11' column and check its data type in the Parquet schema.
Compare Expected vs. Actual Schema:
- Compare the expected schema for 'db.table.parameter_11' with the actual schema found in the Parquet file. Identify any differences in data types.
Investigate Data Inconsistencies:
- If there are data inconsistencies, investigate how they might have occurred. It's possible that there was a schema evolution or a mismatch during the data writing process.
Resolve Schema Mismatch:
- Depending on your findings, you may need to correct the schema mismatch. This could involve updating the metadata in Impala or Hive to match the actual schema or adjusting the Parquet file schema.
Update Impala Statistics:
After resolving the schema mismatch, it's a good practice to update Impala statistics for the affected table. This can be done using the COMPUTE STATS command in Impala:
This step ensures that Impala has up-to-date statistics for query optimization.
Here's a high-level example of what the Parquet schema inspection might look like:
Look for the 'db.table.parameter_11' column and check its data type in the Parquet schema. If the data type in the Parquet schema is incorrect, you may need to investigate how the data was written and whether there were any issues during that process. Correcting the schema mismatch and updating Impala statistics should help resolve the issue.
