We are having an issue reading tables in Impala that contain a column of type DECIMAL(1,1). The error is:
column 'DECIMAL_1_1' has an invalid type length. Expecting: 1 len in file 4
We have two Cloudera clusters (CDH 6.3.3). This error only occurs in the 2nd cluster.
We write data into Parquet files through IBM Datastage.
I ran the same job in both clusters and examined the schema of the resulting Parquet files using parquet-tools schema. This is what I found:
In the 1st cluster, in the parquet file the schema of the column causing the issue is:
optional int32 DECIMAL_1_1 (DECIMAL(1,1));
However, in the 2nd cluster:
optional fixed_len_byte_array(4) DECIMAL_1_1 (DECIMAL(1,1));
We need to understand why the type is fixed_len_byte_array(4) in this Parquet file while it is int32 in the other. Apparently fixed_len_byte_array(4) is what causes the issue with Impala.
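To make the numbers in the error message concrete, here is a minimal Python sketch. It assumes, based only on the wording "Expecting: 1 len in file 4", that Impala wants a fixed_len_byte_array decimal to use the smallest signed byte width that can hold the declared precision. The function names and the regular expression are hypothetical helpers written for this illustration; they are not part of Impala or parquet-tools.

```python
import re

def min_flba_bytes(precision: int) -> int:
    """Smallest signed two's-complement byte width that can hold
    the largest unscaled value of a DECIMAL with this precision."""
    if precision < 1:
        raise ValueError("precision must be >= 1")
    max_unscaled = 10 ** precision - 1
    n = 1
    while (1 << (8 * n - 1)) - 1 < max_unscaled:
        n += 1
    return n

# Hypothetical pattern for one column line of `parquet-tools schema` output.
SCHEMA_LINE = re.compile(
    r"(?P<rep>optional|required|repeated)\s+"
    r"(?P<ptype>\w+)(?:\((?P<len>\d+)\))?\s+"
    r"(?P<name>\w+)\s+\(DECIMAL\((?P<p>\d+),\s*(?P<s>\d+)\)\);"
)

def describe(line: str) -> dict:
    """Parse a schema line and flag a fixed-length width that differs
    from the assumed minimal width for the declared precision."""
    m = SCHEMA_LINE.search(line)
    if m is None:
        raise ValueError(f"not a DECIMAL column line: {line!r}")
    ptype = m["ptype"]
    file_len = int(m["len"]) if m["len"] else None
    expected = min_flba_bytes(int(m["p"]))
    return {
        "column": m["name"],
        "physical_type": ptype,
        "file_len": file_len,
        "expected_len": expected,
        # Assumption: only fixed_len_byte_array columns are length-checked.
        "mismatch": ptype == "fixed_len_byte_array" and file_len != expected,
    }

cluster1 = "optional int32 DECIMAL_1_1 (DECIMAL(1,1));"
cluster2 = "optional fixed_len_byte_array(4) DECIMAL_1_1 (DECIMAL(1,1));"

for label, line in (("cluster 1", cluster1), ("cluster 2", cluster2)):
    print(label, describe(line))
```

Under that assumption, precision 1 needs only 1 byte, so a 4-byte fixed_len_byte_array is flagged as a mismatch, while the int32 column from the 1st cluster is not.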
@Meshal You are hitting the bug below.
Let me know if you have any concerns.
I have already checked that page while looking for a solution. What I would like to understand is why the two Parquet files have different types for DECIMAL(1,1), although both were created by the same job, just in different clusters.
@Meshal Thanks for your response.
Please check how the table was created in both clusters. Run show create table and compare the output.
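If the two outputs are long, a line-by-line diff can make the comparison easier. The following is only a sketch using Python's standard difflib module; the DDL strings, table name, and HDFS paths are made-up placeholders and should be replaced with the real show create table output from each cluster.

```python
import difflib

def diff_ddl(ddl_a: str, ddl_b: str,
             name_a: str = "cluster1", name_b: str = "cluster2") -> list:
    """Return unified-diff lines between two DDL statements,
    ignoring leading and trailing whitespace on each line."""
    a = [ln.strip() for ln in ddl_a.strip().splitlines()]
    b = [ln.strip() for ln in ddl_b.strip().splitlines()]
    return list(difflib.unified_diff(a, b, fromfile=name_a,
                                     tofile=name_b, lineterm=""))

# Hypothetical DDL; paste the actual output from each cluster here.
ddl_1 = """
CREATE EXTERNAL TABLE db.t (
  decimal_1_1 DECIMAL(1,1)
)
STORED AS PARQUET
LOCATION 'hdfs://cluster1/path/t'
"""
ddl_2 = """
CREATE EXTERNAL TABLE db.t (
  decimal_1_1 DECIMAL(1,1)
)
STORED AS PARQUET
LOCATION 'hdfs://cluster2/path/t'
"""

for line in diff_ddl(ddl_1, ddl_2):
    print(line)
```

An empty diff would suggest the table definitions match, in which case the difference is more likely in how the files are written than in how the tables are declared.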
Solution: you can get a patch for IMPALA-7087 in CDH 6.3.3.
@Meshal Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks