Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How to add a column in the middle of a ORC partitioned hive table and still be able to query old partitioned files with new structure ?

avatar
New Contributor

Currently I have a Partitioned ORC "Managed" (Wrongly created as Internal first) Hive table in Prod with atleast 100 days worth of data partitioned by year,month,day(~16GB of data).

This table has roughly 160 columns.Now my requirement is to Add a column in the middle of this table and still be able to query the older data(partitioned files).Its is fine if the newly added column shows null for the old data.

See below What I did so far ?

1)First convert the table to External using below to preserve data files before dropping

alter table <table_name> SET TBLPROPERTIES('EXTERNAL'='TRUE');

2)Drop and Recreate the table with new column in the middle and then Altered the table to add the old partition file

However I am unable to read the table after Recreation .I get this Error message

Any other ways to Accomplish this ?

[Simba][HiveJDBCDriver](500312) Error in fetching data rows: *org.apache.hive.service.cli.HiveSQLException:java.io.IOException: java.io.IOException: ORC does not support type conversion from file type array<string> (87) to reader type int (87):33:32;
3 REPLIES 3

avatar
New Contributor

@Neeraj Sabharwal :Any suggestions here ?

avatar
Explorer

@hippagun : Below alterr will help for adding column to a partitioned table.

ALTER TABLE default.test_table ADD columns (column1 string,column2 string) CASCADE;

avatar

Hi @hippagun

It wont work. though its ORC hive will be able to differentiate the columns based on the delimiter which you have specified during the table creation. So no matter whether you re-create it it wont work. There are two option which you can do now:

1) Create another external table with the additional columns. Write a simple query to load the records from old to the new table specifying null the newly added columns. Once it is done drop the old table. Going forward you can make use of this table. It will be suitable for ORC

2) The other way is, If the schema of the table changes frequently then its better to go with avro table as the schema changes can be handled easily. You have to follow the above step just for the first time. But whenever there is any changes in the schema in future then you need to alter the schema file and nothing else is needed.

You can refer to this Link to get the understanding of the handling the schema changes in avro file.

Hope it helps!!