Scenario: Trying to add new columns to an already partitioned Hive table.
Problem: The newly added columns will show up as null values on the data present in existing partitions.
Solution:
One of the workaround can be copying/moving the data in a temporary location,dropping the partition, adding back the data and then adding back the partition. It works and the new column picks up the values.
But for big tables this is not a viable solution.
Best approach:
We need to construct the alter statement to add columns with CASCADE option as follows:
ALTER TABLE default.test_table ADD columns (column1 string,column2 string) CASCADE;
From the Hive documentation:
“ALTER TABLE CHANGE COLUMN with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column change only to table metadata.”
I found out that this option is not at all in wide use and can help who face this situation.