New Contributor
Posts: 3
Registered: ‎08-06-2018

Hive data type conversion issue on partitioned table

We have a table partitioned by date. One column was originally varchar; two months ago the table was altered and that column was changed to int.
Now, when I try to fetch data for certain dates, I get an error.
Please advise how I can get out of this situation.
Note that the data is fine throughout most of the table; the problem occurs only for some specific dates.
A select * from the table over that date range returns output, but a count based on the column whose data type was changed throws the error below.

Error: java.io.IOException: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.hive.serde2.io.HiveVarcharWritable

New Contributor
Posts: 3
Registered: ‎08-06-2018

Re: Hive data type conversion issue on partitioned table

I changed the table definition from
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'

to
STORED AS ORC;
That fixed the count(*) issue, but I am still looking for an explanation, if anyone can help me out.
Master
Posts: 430
Registered: ‎07-01-2015

Re: Hive data type conversion issue on partitioned table

This is because some files were written with the old schema and some were written with the new schema. Depending on how you query the data (a select count(*) versus a select of the column), it may or may not fail. I don't know much about ORC files, but with Parquet files you can download the files to a local machine and use parquet-tools to examine the schema of each file.
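
One way to check for this kind of mismatch from inside Hive is to compare the table-level schema with the schema recorded for an individual partition, since Hive stores column metadata per partition and a column change that was not cascaded can leave older partitions on the old type. The following is only a sketch: mytable, the partition column dt, the changed column mycol, and the date value are all placeholder names, and whether stale partition metadata is really the cause here is an assumption.

```sql
-- Placeholder names throughout: mytable, dt, mycol and the date are illustrative.

-- Table-level schema (used for newly created partitions)
DESCRIBE FORMATTED mytable;

-- Schema recorded for one of the failing dates; an older partition may
-- still list the column as varchar if the change was not cascaded
DESCRIBE FORMATTED mytable PARTITION (dt='2018-06-01');

-- If the partition metadata is stale AND the files in that partition
-- really contain int data, the partition can be aligned with the table
ALTER TABLE mytable PARTITION (dt='2018-06-01') CHANGE COLUMN mycol mycol INT;
```

If the two DESCRIBE outputs agree, the mismatch is more likely in the data files themselves than in the metastore.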

 

If you are able to read the whole table, I suggest running

insert overwrite table mytable select * from mytable;

just to make sure that every file is written with the correct schema.
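
Since the table in question is partitioned, the rewrite would probably need a partition clause and dynamic partitioning enabled. A minimal sketch, assuming a single partition column named dt (a placeholder, as is the date range) and that select * returns the partition column last, which is how Hive orders columns for partitioned tables:

```sql
-- Placeholder names: mytable, dt and the date range are illustrative.
-- Allow Hive to derive the target partition from the selected rows
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Rewrite only the affected date range; partitions that do not appear
-- in the query result are left untouched
INSERT OVERWRITE TABLE mytable PARTITION (dt)
SELECT *
FROM mytable
WHERE dt BETWEEN '2018-06-01' AND '2018-06-30';
```

It may be worth trying this on a copy of one partition first, because the overwrite replaces the existing files.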