Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Dynamic hive view creation

avatar
New Contributor

I have 10 hive tables, few columns might be similar in few tables, I need to union them into one table. I was writing Union all statements.

Now, these columns in these 10 tables might change . how to create the Union all query dynamically based on the tables.

------------------------------------------------

 

 

CREATE VIEW comb_view AS
SELECT col11, col12, col13,.....
From
(SELECT col11,col12,NULL col13 from tab1
UNION ALL
SELECT col11,NULL col12, col13 from tab2
.
.
.
.
) c

-------------------------------------------------------

ensuring the position and data type.

4 REPLIES 4

avatar
Master Collaborator

@lucid Do you mean, the position of the columns or the names of the columns change, or the data? As long as we are aware of the column names(data types should be consistent as well), we should be good to write our UNION statement. Please clarify if my understanding is wrong.

As you want to write the physical data into a table, you could consider using Hive Materialised view.

If your tables tab1, tab2 are ACID(transactional) tables, every time the data changes in those tables through any INSERT operation, the MV data also get refreshed.

avatar
New Contributor

What I mean by columns change is that table name will be same . New columns might be added or few existing columns might be removed. based on the columns available in the table at that point of time we need to create a VIEW with UNION ALL statement, with out manual intervention.

avatar
Master Collaborator

@lucid will the column changes be consistent across all the tables?

We could try "select * from <tbl1>", however it would require the schema of the table same in all the tables, else it will return semantic exception.

 

avatar
New Contributor

Yes, that is the issue. Schema across all the tables is not same. For that we are trying to fill in NULL while using SELECT statement on each table for those columns which are not present in that table. 

The changes for columns are not consistent accross tables. That is only few tables might update, few remain same.