Created on 11-06-2017 02:31 AM - edited 09-16-2022 05:29 AM
Impala version
Shell version: Impala Shell v2.8.0-cdh5.11.2 (f89269c) built on Fri Aug 18 14:04:44 PDT 2017
Server version: impalad version 2.8.0-cdh5.11.2 RELEASE (build f89269c4b96da14a841e94bdf6d4d48821b0d65
Deployment Details
3x data nodes
1x name node
LDAP authentication
Mysql meta store
Steps to reproduce
1) Create a table using create table command
create TABLE t1 (x INT, y STRING)
2) insert some data
INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
3) Create a view as select * from table
create view v1 as select * from t1;
4) Descirbe view v1
describe v1;
+------+--------+---------+
| name | type | comment |
+------+--------+---------+
| x | int | |
| y | string | |
+------+--------+---------+
5) select * from view
select * from v1;
+---+-------+
| x | y |
+---+-------+
| 1 | one |
| 2 | two |
| 3 | three |
+---+-------+
6) Add additional columns to table
alter table t1 add columns (c string);
7) Descibe view v1 and see that new column c is not reflected in the output.
describe v1
+------+--------+---------+
| name | type | comment |
+------+--------+---------+
| x | int | |
| y | string | |
+------+--------+---------+
😎 select * from view to see new column c.
select * from v1
+---+-------+------+
| x | y | c |
+---+-------+------+
| 1 | one | NULL |
| 2 | two | NULL |
| 3 | three | NULL |
+---+-------+------+
Actual Behaviour
If columns are added to an impala table after the view has been created, the new columns are selectable from the view, but the new columns are not presented in the output of descirbe view.
I have tried to refresh the view, but still the describe `viewname` does not reflect the newly added columns.
Expected Behaviour
Describe view is used by connecting applications to retireve the schema information, if it is not updated with the new columns, the connecting applications are not aware of the new additional columns present in the underlying tables which makes them beleive they new columns are not avaiable, althought select * does display them. The columns returned by the select * statement and descirbe views should be consistent.
User statement
As a user of impala when connecting with a clietn application all of the columns returns by the sql statment that makes up the view should be reflected in the describe statement. A select * from view should return the same columns as describe view.
Acceptance Criteria
A select * from view and the columns returned by describe should be consistent and any additional columns added to the table that should be returned by the view, should be reflected in both the select * and the describe view statement.
Created 11-08-2017 08:11 PM
Created 11-13-2017 02:02 AM
Hi Eric,
Thanks so much for your comments.
We have no fixed schema as data fields are added/removed on demand in our application, which in turn automatically adds new columns to the corresponding tables in impala.
The views have pretty complex SQL Views that are created on demand by the users of the application, so we never know the exact views and what tables they are querying.
So I assume the alter view would have to include the full complex SQL statement along the following procedure:
On adding/removing a column to an impala table
1) Run show table to get all the tables
2) Run describe view formatted over each table to see which are views and extract the corresponding complex SQL.
3) For each view, itterate over the complex SQL and determine if they it is selecting from any of the updated tables
4) For each view that needs updating run alter view for each view with the original complex SQL extracted in step 2.
We generally add/remove columns every few minutes, and it seem that performing the above steps across a large number (1000+) of user created views for each additional datafield update seems quite an overhead, is there another way that I might have missed?
Created 11-14-2017 02:33 AM
Created 11-14-2017 03:32 AM
Hi Eric,
Thanks once again, by updating the view I assume you mean running 'alter view' with the same complex SQL statement via the following process:
1) Run show table to get all the tables
2) Run describe view formatted over each table to see which are views and extract the corresponding complex SQL.
3) For each view, itterate over the complex SQL and determine if they it is selecting from any of the updated tables
4) For each view that needs updating run alter view for each view with the original complex SQL extracted in step 2.
I can code this up, but was wondering if there is another way.