Reply
Explorer
Posts: 11
Registered: ‎06-21-2017

describe view not updated with additional columns when added to underlying tables but are selectable

Impala version

Shell version: Impala Shell v2.8.0-cdh5.11.2 (f89269c) built on Fri Aug 18 14:04:44 PDT 2017

Server version: impalad version 2.8.0-cdh5.11.2 RELEASE (build f89269c4b96da14a841e94bdf6d4d48821b0d65

 

Deployment Details

3x data nodes

1x name node

LDAP authentication

Mysql meta store

 

Steps to reproduce

1) Create a table using create table command

create TABLE t1 (x INT, y STRING)

2) insert some data

INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');

3) Create a view as select * from table

create view v1 as select * from t1;

4) Descirbe view v1

describe v1;

+------+--------+---------+

| name | type   | comment |

+------+--------+---------+

| x    | int    |         |

| y    | string |         |

+------+--------+---------+

5) select * from view

select * from v1;

+---+-------+

| x | y     |

+---+-------+

| 1 | one   |

| 2 | two   |

| 3 | three |

+---+-------+

6) Add additional columns to table 

alter table t1 add columns (c string);

7) Descibe view v1 and see that new column c is not reflected in the output.

describe v1

+------+--------+---------+

| name | type   | comment |

+------+--------+---------+

| x    | int    |         |

| y    | string |         |

+------+--------+---------+

8) select * from view to see new column c.

select * from v1

+---+-------+------+

| x | y     | c    |

+---+-------+------+

| 1 | one   | NULL |

| 2 | two   | NULL |

| 3 | three | NULL |

+---+-------+------+

 

 

Actual Behaviour

If columns are added to an impala table after the view has been created, the new columns are selectable from the view, but the new columns are not presented in the output of descirbe view.

 

I have tried to refresh the view, but still the describe `viewname` does not reflect the newly added columns.

 

Expected Behaviour

Describe view is used by connecting applications to retireve the schema information, if it is not updated with the new columns, the connecting applications are not aware of the new additional columns present in the underlying tables which makes them beleive they new columns are not avaiable, althought select * does display them. The columns returned by the select * statement and descirbe views should be consistent.

 

User statement

As a user of impala when connecting with a clietn application all of the columns returns by the sql statment that makes up the view should be reflected in the describe statement. A select * from view should return the same columns as describe view.

 

Acceptance Criteria

A select * from view and the columns returned by describe should be consistent and any additional columns added to the table that should be returned by the view, should be reflected in both the select * and the describe view statement.

 

 

 

Cloudera Employee
Posts: 212
Registered: ‎03-23-2015

Re: describe view not updated with additional columns when added to underlying tables but are select

You will need to update the VIEW:

ALTER VIEW v1 AS SELECT * FROM t1;

The columns for the VIEW is stored separately as the table's columns, updating the table will not affect the VIEW. You will need to manually update it.
Explorer
Posts: 11
Registered: ‎06-21-2017

Re: describe view not updated with additional columns when added to underlying tables but are select

Hi Eric,

 

 

Thanks so much for your comments.

 

We have no fixed schema as data fields are added/removed on demand in our application, which in turn automatically adds new columns to the corresponding tables in impala.

 

The views have pretty complex SQL Views that are created on demand by the users of the application, so we never know the exact views and what tables they are querying.

 

So I assume the alter view would have to include the full complex SQL statement along the following procedure:

 

On adding/removing a column to an impala table

1) Run show table to get all the tables

2) Run describe view formatted over each table to see which are views and extract the corresponding complex SQL.

3) For each view, itterate over the complex SQL and determine if they it is selecting from any of the updated tables

4) For each view that needs updating run alter view for each view with the original complex SQL extracted in step 2.

 

We generally add/remove columns every few minutes, and it seem that performing the above steps across a large number (1000+) of user created views for each additional datafield update seems quite an overhead, is there another way that I might have missed?

 

Cloudera Employee
Posts: 212
Registered: ‎03-23-2015

Re: describe view not updated with additional columns when added to underlying tables but are select

Hi,

Are you able to update the view while you updating the table with new columns? I don't see there is other better ways as VIEWs and TABLEs are different entities in Hive and they don't relate to each other.

Let me research a bit more.
Highlighted
Explorer
Posts: 11
Registered: ‎06-21-2017

Re: describe view not updated with additional columns when added to underlying tables but are select

Hi Eric,

 

Thanks once again, by updating the view I assume you mean running 'alter view' with the same complex SQL statement via the following process:

 

1) Run show table to get all the tables

2) Run describe view formatted over each table to see which are views and extract the corresponding complex SQL.

3) For each view, itterate over the complex SQL and determine if they it is selecting from any of the updated tables

4) For each view that needs updating run alter view for each view with the original complex SQL extracted in step 2.

 

I can code this up, but was wondering if there is another way.

 

Announcements