Hive data type conversion issue on partitioned table

New Contributor

We have a table partitioned by date. One of its columns was originally varchar; two months ago the table was altered and that column was changed to int.
Now, fetching data for certain dates fails with an error.
Please advise how I can get out of this situation.
Note that the data is fine throughout most of the table; the problem occurs only for some specific dates.
A select * from the table over that date range returns output, but a count based on the column whose data type was changed throws the error below.

Error: java.io.IOException: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.hive.serde2.io.HiveVarcharWritable

2 REPLIES

Re: Hive data type conversion issue on partitioned table

New Contributor
I changed the table definition from
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'

to
STORED AS ORC;
This fixed the count(*) issue, but I am still looking for an explanation, if anyone can help me out.
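
For context on what differs between the two definitions: STORED AS ORC is shorthand that sets the ORC SerDe, input format, and output format together. The following is only a sketch; the table names, the columns id and col, and the partition column dt are hypothetical placeholders, not taken from the original table.

```sql
-- Shorthand form: Hive fills in the ORC SerDe, input format and output format.
CREATE TABLE mytable_short (id INT, col INT)
PARTITIONED BY (dt STRING)
STORED AS ORC;

-- Explicit form that the shorthand expands to.
CREATE TABLE mytable_explicit (id INT, col INT)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
```

Running SHOW CREATE TABLE on the real table prints the explicit form, which makes it easier to see exactly which of the three settings changed. This comparison does not by itself explain why count(*) started working.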

Re: Hive data type conversion issue on partitioned table

Master Collaborator

This is because some files were written with the old schema and some were written with the new schema. Depending on how you query the data (a select count(*) versus a select of a specific column), the query may or may not fail. I don't know much about ORC files, but with Parquet files you can download the files to a local machine and use parquet-tools to examine the schema of each file.
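
On the Hive side, one way to look for this kind of mismatch is to compare the column type recorded for the table with the one recorded for a failing partition. A minimal sketch, assuming the table is called mytable, the partition column is dt, the changed column is col, and '2019-01-01' stands in for one of the failing dates (all four are placeholders, not names from the original post):

```sql
-- Table-level schema: shows the current (int) type of the changed column.
DESCRIBE FORMATTED mytable;

-- Partition-level schema for one failing date. A partition created before
-- the ALTER may still list the column as varchar here, because a plain
-- ALTER TABLE ... CHANGE COLUMN only updates table-level metadata.
DESCRIBE FORMATTED mytable PARTITION (dt='2019-01-01');

-- If the partition metadata still shows the old type, it can be aligned
-- for that one partition. Depending on metastore settings, Hive may
-- refuse a varchar-to-int change.
ALTER TABLE mytable PARTITION (dt='2019-01-01') CHANGE COLUMN col col INT;
```

On Hive 1.1.0 and later, adding CASCADE to ALTER TABLE ... CHANGE COLUMN applies the change to the metadata of all existing partitions. Neither form rewrites the data files. For inspecting the ORC files themselves, the counterpart of parquet-tools is the hive --orcfiledump utility.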

If you are able to read the whole table, I suggest running

insert overwrite table mytable select * from mytable;

This makes sure that every file is rewritten with the correct schema.
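
Since the table in question is partitioned, the rewrite would likely need a PARTITION clause with dynamic partitioning enabled. A hedged sketch, assuming a partition column named dt, a changed column named col, and col_a and col_b standing in for the remaining columns (all hypothetical names):

```sql
-- Allow Hive to derive the target partitions from the query output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Rewrite every partition in place. The partition column (dt) must be
-- the last column in the SELECT list. The explicit CAST writes the
-- changed column as int in every new file.
INSERT OVERWRITE TABLE mytable PARTITION (dt)
SELECT col_a, CAST(col AS INT) AS col, col_b, dt
FROM mytable;
```

Because this overwrites the existing data, it is safer to try it first on a copy of the table or restrict it to a single partition with a WHERE clause on dt. If a partition cannot be read at all, the rewrite will fail on it in the same way the query does.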