When do I have to REFRESH or INVALIDATE METADATA a table?

Contributor

Let's assume that I have a table test_tbl that was created through impala-shell.

I have a few questions:

  1. Do I need to REFRESH the table only when I add new data through Hive or HDFS commands? That is, when I run INSERT INTO ... through impala-shell, is there no need to refresh?
  2. Do I need INVALIDATE METADATA for the table only when I change its structure (add columns, drop partitions) through Hive?
  3. When I drop partitions of a table through impala-shell (i.e. ALTER TABLE ... DROP PARTITION ... PURGE), do I have to run REFRESH or INVALIDATE METADATA?
  4. After dropping partitions of a table through impala-shell, how do I get updated stats for the partitioned table? Should I run COMPUTE INCREMENTAL STATS afterwards, or DROP INCREMENTAL STATS before dropping the partition?

 

Thanks in advance. 

1 ACCEPTED SOLUTION

Super Guru

  1. Do I need to REFRESH the table only when I add new data through Hive or HDFS commands? That is, when I run INSERT INTO ... through impala-shell, is there no need to refresh?

Correct.
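As a rough illustration of that rule, here is a minimal sketch. It assumes test_tbl is partitioned by a hypothetical column dt and has two hypothetical columns (an integer and a string); adjust names and values to your own table.

```sql
-- Data files were added outside Impala (Hive INSERT, hdfs dfs -put, ...):
-- reload the file and block metadata for this one table.
REFRESH test_tbl;

-- If only one partition received new files, a narrower refresh is possible.
-- The PARTITION clause needs a reasonably recent Impala release;
-- the column dt and its value are hypothetical.
REFRESH test_tbl PARTITION (dt='2016-01-01');

-- Rows written by Impala itself are visible without a REFRESH
-- (column layout is hypothetical).
INSERT INTO test_tbl PARTITION (dt='2016-01-01') VALUES (1, 'a');
```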

  2. Do I need INVALIDATE METADATA for the table only when I change its structure (add columns, drop partitions) through Hive?

Correct. The same applies when new tables are created through Hive.
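A minimal sketch of what that looks like in impala-shell, assuming the structural change or the new table was made through Hive (new_hive_tbl is a hypothetical table name used only for illustration):

```sql
-- The structure of test_tbl was changed through Hive
-- (e.g. ALTER TABLE ... ADD COLUMNS): discard Impala's cached metadata
-- for this table; it is reloaded the next time the table is used.
INVALIDATE METADATA test_tbl;

-- A table created through Hive is unknown to Impala until its metadata
-- is invalidated (new_hive_tbl is a hypothetical name).
INVALIDATE METADATA new_hive_tbl;

-- Without a table name, cached metadata for all tables is flushed.
-- This is expensive on large catalogs, so it is left commented out here.
-- INVALIDATE METADATA;
```

Naming the table keeps the operation much cheaper than the global form.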

  3. When I drop partitions of a table through impala-shell (i.e. ALTER TABLE ... DROP PARTITION ... PURGE), do I have to run REFRESH or INVALIDATE METADATA?

No.
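For example, a sketch assuming test_tbl is partitioned by a hypothetical column dt:

```sql
-- Drop a partition from Impala. PURGE deletes the data files immediately
-- instead of moving them to the HDFS trash.
-- The column dt and its value are hypothetical.
ALTER TABLE test_tbl DROP IF EXISTS PARTITION (dt='2016-01-01') PURGE;

-- Impala made the change itself, so it should be visible right away
-- without REFRESH or INVALIDATE METADATA.
SHOW PARTITIONS test_tbl;
```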

  4. After dropping partitions of a table through impala-shell, how do I get updated stats for the partitioned table? Should I run COMPUTE INCREMENTAL STATS afterwards, or DROP INCREMENTAL STATS before dropping the partition?

The next time you run COMPUTE INCREMENTAL STATS for a new partition, Impala will update things correctly (e.g. the global row count).
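A minimal sketch of that follow-up, again assuming a hypothetical partition column dt on test_tbl:

```sql
-- Scan only the partitions that do not have incremental stats yet
-- and update the table-level totals.
COMPUTE INCREMENTAL STATS test_tbl;

-- Or restrict the work to a single new partition (value is hypothetical).
COMPUTE INCREMENTAL STATS test_tbl PARTITION (dt='2016-01-02');

-- Inspect the per-partition and total #Rows values afterwards.
SHOW TABLE STATS test_tbl;
```

Checking SHOW TABLE STATS after the run is an easy way to confirm that the dropped partition no longer contributes to the total.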


2 REPLIES

Super Guru
For number 2: ANY change made outside of Impala requires INVALIDATE METADATA; if only new data was added, REFRESH will do.

Work is underway to improve it: https://issues.apache.org/jira/browse/IMPALA-3124

Cheers
Eric