08-06-2018 02:02 AM
We have a table partitioned by date in which one column was originally a varchar. Two months ago the table was altered and that column was changed to int.
Now, when we try to fetch data for certain dates, the query fails with an error.
Please advise how I can get out of this situation.
Please note that the data is fine throughout the table; the problem occurs only for some specific dates.
If I run select * from the table for a date range, I get output, but when I run a count based on the column whose data type was changed, it throws the error below.
Error: java.io.IOException: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.hive.serde2.io.HiveVarcharWritable
08-06-2018 06:05 AM
08-23-2018 06:48 AM
This is because some files were written with the old schema and some with the new schema. Depending on how you query the data (a select count(*) versus a select of the column), it may or may not fail. I don't know much about ORC files, but with Parquet files you can download the files to a local machine and use parquet-tools to examine the schema of each file.
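One way to check for this kind of mismatch from inside Hive is to compare the column type recorded for the table with the type recorded for an affected partition, since Hive keeps column metadata per partition. This is only a sketch: the partition column dt, the date value, and the table name mytable are placeholders, not names from the original post.

```sql
-- Column types as currently defined at the table level
DESCRIBE mytable;

-- List the partitions so an affected date can be picked
SHOW PARTITIONS mytable;

-- Column types recorded for one specific partition
-- (dt and '2018-06-01' are hypothetical; substitute a failing date)
DESCRIBE FORMATTED mytable PARTITION (dt='2018-06-01');
```

If the partition still reports varchar while the table reports int, the column change probably did not reach the older partitions.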
If you are able to read the whole table, I suggest running
insert overwrite mytable select * from mytable;
to make sure that every file is rewritten with the correct schema.
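For a date-partitioned table in Hive, the rewrite suggested above might look roughly like the following. It is a sketch under assumptions: dt is a hypothetical name for the partition column, the date range is made up, and dynamic partitioning is assumed to be permitted on the cluster. Hive expects the TABLE keyword after INSERT OVERWRITE, and because select * returns partition columns last, the dynamic partition value is picked up from the final column.

```sql
-- Allow Hive to derive the target partition from the query result
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Rewrite only the affected partitions (dt and the dates are hypothetical);
-- partitions not returned by the SELECT are left untouched
INSERT OVERWRITE TABLE mytable PARTITION (dt)
SELECT * FROM mytable
WHERE dt BETWEEN '2018-06-01' AND '2018-06-30';
```

Trying this on a copy of one partition first would be prudent, because the overwrite replaces the existing files.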