
"invalidate metadata" does not work

Explorer

Hello there,

I'm new to Hadoop and I'm currently trying some examples similar to the problem I have to solve.

I've created a very simple table in Hive (via the Hive shell), then tried to look at it in Impala (via impala-shell).
Even though I ran "invalidate metadata" before looking at the table in Impala, I couldn't see the table created in Hive.

 

It seems that Impala and/or Hive cannot access the metastore (?).
I looked at the impalad logs but didn't find any errors.

Below are my configuration and the commands I ran in Hive and Impala.

1. Configuration

* A 6-node CentOS 6.8 cluster
* CDH 5.7.1
* impalad version 2.5.0
* Impala Shell v2.5.0
* Apache Derby 10.11.1.1
* Hive 1.1.0


2. Table creation & lookup 

hive> create database new_db_from_hive ;
OK
hive> create table new_db_from_hive.new_table_from_hive(x int) ;
OK
hive> show databases like 'new*' ;
OK
new_db_from_hive

 

Then in Impala:

[myhost.hadoop:21000] > invalidate metadata ;
Query: invalidate metadata
Fetched 0 row(s) in 3.26s
[myhost.hadoop:21000] > show databases ;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+

 

However, I can see the database on HDFS:

$ hdfs dfs -ls /user/hive/warehouse/new_db_from_hive.db
Found 1 items
drwxrwxrwt   - <myuserid> hive          0 2016-10-24 18:46 /user/hive/warehouse/new_db_from_hive.db/new_table_from_hive
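A directory on HDFS and an entry in the metastore are two separate things. Hive's default layout puts a managed table under <warehouse>/<db>.db/<table> (tables of the default database sit directly under the warehouse root), but Impala only lists what the metastore it talks to knows about. A minimal Python sketch of that distinction, assuming the CDH default warehouse root /user/hive/warehouse; the helper names are hypothetical:

```python
from posixpath import join
from typing import Iterable, List

# Assumption: the default warehouse root (hive.metastore.warehouse.dir) on CDH.
WAREHOUSE_DIR = "/user/hive/warehouse"


def default_table_location(db: str, table: str, warehouse: str = WAREHOUSE_DIR) -> str:
    """Default HDFS directory for a managed table."""
    if db == "default":
        # Tables of the default database live directly under the warehouse root.
        return join(warehouse, table)
    return join(warehouse, f"{db}.db", table)


def orphan_databases(hdfs_db_dirs: Iterable[str], metastore_dbs: Iterable[str]) -> List[str]:
    """Databases that have a <db>.db directory on HDFS but no metastore entry."""
    known = set(metastore_dbs)
    return sorted(
        d[: -len(".db")]
        for d in hdfs_db_dirs
        if d.endswith(".db") and d[: -len(".db")] not in known
    )
```

With the listings from this post, orphan_databases(["new_db_from_hive.db"], ["_impala_builtins", "default"]) returns ["new_db_from_hive"]: the data directory exists, yet the metastore Impala reads has no record of the database.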

 

I don't know what is missing. 
Thank you in advance for any help. 

 

 

2 ACCEPTED SOLUTIONS


 

Do you have Sentry enabled in the cluster?

Do you see the database when you run the "show databases" query in Beeline as the same user?

With the Hive CLI, the results of "show databases" are not filtered by user privileges.

With Beeline and impala-shell, however, the user's privileges are evaluated before the result is returned. If the user doesn't have read privilege on a database, it won't be listed in the result.
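The filtering difference described above can be written down as a toy Python model. This only illustrates the claim in this answer; it is not Sentry's API, and the client labels, the `grants` mapping and the function name are all hypothetical:

```python
from typing import Dict, Iterable, List, Set


def show_databases(all_dbs: Iterable[str], user: str,
                   grants: Dict[str, Set[str]], client: str) -> List[str]:
    """Toy model only: which databases "show databases" would list."""
    dbs = sorted(all_dbs)
    if client == "hive-cli":
        # Hive CLI: results are not filtered by the user's privileges.
        return dbs
    # beeline / impala-shell: privileges are checked before the result is
    # returned, so databases the user cannot read are left out.
    readable = grants.get(user, set())
    return [db for db in dbs if db in readable]
```

In this model, a database that shows up in the Hive CLI but not in Beeline for the same user points to missing privileges rather than to a metastore problem.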

 

 

Explorer

 

Thank you for your advice. It finally worked.
I just switched from the Hive shell to Beeline.

Using Beeline, I could see databases created in Impala.
In Impala, after running the "invalidate metadata" query, I could see databases created through Beeline.

Apparently, the Hive shell cannot access the remote metastore.
I don't know whether this is because the Hive shell is now deprecated or just a configuration issue, but I will proceed with Beeline.

show databases via Beeline:

beeline> !connect jdbc:hive2://bigfour.hadoop:10000
Connected to: Apache Hive (version 1.1.0-cdh5.7.1)
0: jdbc:hive2://bigfour.hadoop:10000> show databases ;
+----------------------+--+
|    database_name     |
+----------------------+--+
| default              |
| new_db_from_beeline  |
| new_db_from_impala   |
+----------------------+--+


show databases via impala-shell (before and after "invalidate metadata"):

[bigtwo.hadoop:21000] > show databases ;
+--------------------+----------------------------------------------+
| name               | comment                                      |
+--------------------+----------------------------------------------+
| _impala_builtins   | System database for Impala builtin functions |
| default            | Default Hive database                        |
| new_db_from_impala |                                              |
+--------------------+----------------------------------------------+

 

[bigtwo.hadoop:21000] > invalidate metadata ;
[bigtwo.hadoop:21000] > show databases ;
+---------------------+----------------------------------------------+
| name                | comment                                      |
+---------------------+----------------------------------------------+
| _impala_builtins    | System database for Impala builtin functions |
| default             | Default Hive database                        |
| new_db_from_beeline |                                              |
| new_db_from_impala  |                                              |
+---------------------+----------------------------------------------+
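Comparing the Beeline listing with the two impala-shell listings makes the result explicit: whatever the metastore (seen through Beeline) has but Impala does not list is what "invalidate metadata" still has to pick up. A small Python sketch using the database names from this post; the function name is hypothetical:

```python
from typing import Iterable, List


def missing_in_impala(metastore_dbs: Iterable[str], impala_dbs: Iterable[str]) -> List[str]:
    """Databases listed by Beeline (metastore view) but not by impala-shell."""
    # _impala_builtins exists only on the Impala side, so it never shows up here.
    return sorted(set(metastore_dbs) - set(impala_dbs))


beeline = ["default", "new_db_from_beeline", "new_db_from_impala"]
impala_before = ["_impala_builtins", "default", "new_db_from_impala"]
impala_after = ["_impala_builtins", "default", "new_db_from_beeline", "new_db_from_impala"]
```

Here missing_in_impala(beeline, impala_before) gives ["new_db_from_beeline"], and after "invalidate metadata" the same call on impala_after gives an empty list.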




 

Rising Star

What do you see if you type "use new_db_from_hive;" into your Impala shell?

Explorer

It gives me an error:

> use new_db_from_hive ;
Query: use new_db_from_hive
ERROR: AnalysisException: Database does not exist: new_db_from_hive

Contributor

Does it work if you create the database through Impala? 

Explorer

Yes. Databases and tables can be created normally in Impala.

[myhost.hadoop:21000] > create database new_db_from_impala ;
Query: create database new_db_from_impala
Fetched 0 row(s) in 0.18s
[myhost.hadoop:21000] > create table new_db_from_impala.new_table_from_impala(x int) ;
Query: create table new_db_from_impala.new_table_from_impala(x int)
[myhost.hadoop:21000] > show databases ;
Query: show databases
+--------------------+----------------------------------------------+
| name               | comment                                      |
+--------------------+----------------------------------------------+
| _impala_builtins   | System database for Impala builtin functions |
| default            | Default Hive database                        |
| new_db_from_impala |                                              |
+--------------------+----------------------------------------------+
Fetched 3 row(s) in 0.00s
[myhost.hadoop:21000] > use new_db_from_impala ;
Query: use new_db_from_impala
[myhost.hadoop:21000] > show tables ;
Query: show tables
+-----------------------+
| name                  |
+-----------------------+
| new_table_from_impala |
+-----------------------+
Fetched 1 row(s) in 0.00s

Contributor

The connection to the Hive Metastore (HMS) seems to be working fine. Here are some things to consider:

1. Create a table in Hive and run invalidate metadata in Impala. Does this work?

2. Check impalad.INFO and catalogd.INFO for any exceptions or errors.

3. Keep in mind that invalidate metadata is asynchronous, i.e. it returns immediately and the catalog objects are then sent to the impalad nodes in the background. This can take some time, so the question is whether the database shows up eventually (after a few seconds) or never.
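The "eventually or never" check in point 3 amounts to a bounded retry: list the databases repeatedly and stop once the new one appears or a deadline passes. A minimal Python sketch; `list_databases` is a hypothetical callable (for example a wrapper around impala-shell or a DB-API client, not shown), and `sleep`/`clock` are injectable so the loop can be exercised without real waiting:

```python
import time
from typing import Callable, Iterable


def wait_for_database(list_databases: Callable[[], Iterable[str]], name: str,
                      timeout_s: float = 30.0, interval_s: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> bool:
    """Return True once `name` is listed, False if the deadline passes first."""
    deadline = clock() + timeout_s
    while True:
        if name in set(list_databases()):
            return True   # showed up eventually
        if clock() >= deadline:
            return False  # never showed up within the timeout
        sleep(interval_s)
```

A False result after a generous timeout (say 30 seconds) points away from propagation delay and toward the database simply not being in the metastore Impala uses.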

 

Dimitris

Explorer

1. create a table in Hive and run invalidate metadata in Impala. Does this work?

This is what I've done several times (create a table/database in Hive, then run invalidate metadata in Impala), and no, it doesn't work.

 

2. Check the impalad.INFO and catalogd.INFO for any exceptions/errors

 

catalogd.ERROR includes exceptions concerning the missing tables, but there's no additional information about the origin of the problem:

E1024 15:55:53.764930 10217 catalog-server.cc:76] TableNotFoundException: Table not found: new_db_from_hive.new_table_from_hive
E1024 15:59:52.447249 10217 catalog-server.cc:76] TableNotFoundException: Table not found: new_db_from_hive2nd.new_table_from_hive
E1024 16:18:37.696986 14116 catalog-server.cc:76] TableNotFoundException: Table not found: new_db_from_hive.new_table_from_hive
E1024 16:30:49.233502 10217 catalog-server.cc:76] TableNotFoundException: Table not found: new_db_from_hive3rd.new_table_from_hive
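These lines use the glog prefix format (severity letter, MMDD date, time, thread id, source file:line), so they can be filtered mechanically to see which tables catalogd reports as missing and how often. A minimal Python sketch; the regular expressions and function name are illustrative assumptions, not part of any Impala tooling:

```python
import re
from collections import Counter
from typing import Iterable

# glog prefix: <severity><MMDD> <HH:MM:SS.micros> <thread id> <file:line>] <message>
GLOG_RE = re.compile(r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+) ([^\]]+)\] (.*)$")
NOT_FOUND_RE = re.compile(r"TableNotFoundException: Table not found: (\S+)")


def missing_tables(lines: Iterable[str]) -> Counter:
    """Count TableNotFoundException reports per table in ERROR-level lines."""
    counts = Counter()
    for line in lines:
        m = GLOG_RE.match(line.strip())
        if m is None or m.group(1) != "E":
            continue  # not a glog line, or not ERROR severity
        hit = NOT_FOUND_RE.search(m.group(6))
        if hit:
            counts[hit.group(1)] += 1
    return counts
```

Run over the four lines above, this reports only Hive-created tables (new_db_from_hive.new_table_from_hive twice, the 2nd and 3rd variants once each), and never a table created through Impala.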

 

In impalad.INFO, I found some lines that *may* be of interest:

I1024 18:50:41.929322 10036 Frontend.java:866] analyze query invalidate metadata
I1024 18:50:41.950764 10036 impala-server.cc:1430] Waiting for catalog version: 34 current version: 26
I1024 18:50:45.444826 10036 impala-beeswax-server.cc:307] get_results_metadata(): query_id=5243589279189211:df729a8d9a1687a4
I1024 18:50:45.445307 10036 impala-beeswax-server.cc:349] close(): query_id=5243589279189211:df729a8d9a1687a4
I1024 18:50:45.445310 10036 impala-server.cc:915] UnregisterQuery(): query_id=5243589279189211:df729a8d9a1687a4
I1024 18:50:45.445314 10036 impala-server.cc:1001] Cancel(): query_id=5243589279189211:df729a8d9a1687a4
I1024 18:50:45.446462 10036 impala-beeswax-server.cc:172] query(): query=invalidate metadata

 

I don't know whether this difference between catalog versions is normal.
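This looks expected, as far as can be told from the log: after a DDL statement or invalidate metadata, the impalad that ran it logs the catalog version it needs and waits until its local catalog cache catches up. The following lines show the query finishing about 3.5 seconds later (18:50:41.95 to 18:50:45.44), so the impalad did catch up. Extracting the two numbers makes the gap explicit; a minimal Python sketch with a hypothetical function name:

```python
import re
from typing import Optional

WAIT_RE = re.compile(r"Waiting for catalog version: (\d+) current version: (\d+)")


def catalog_lag(line: str) -> Optional[int]:
    """Catalog versions the impalad is behind, or None if not a wait line."""
    m = WAIT_RE.search(line)
    if m is None:
        return None
    target, current = int(m.group(1)), int(m.group(2))
    return target - current
```

For the line above this gives 34 - 26 = 8 versions, which the impalad closed within the same query.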

 

3. Keep in mind that invalidate metadata is asynchronous, i.e. it returns immediately and then in the background the catalog objects are sent to the impalad nodes. This could take some time, so the question is if the db shows up eventually (after a few secs) or never.

There is very little data in Hive and Impala.
And even when I look for the existing tables in Impala long after running the "invalidate metadata" query, they still do not appear.


 

Explorer

 

Could it be a misconfiguration issue in Hive?

* On the first node of my cluster, I created a database and a table via the Hive shell, say database1.table1.
* On a second node, I created database2.table2.

If I look at the warehouse directory /user/hive/warehouse/ on both nodes, I can see both databases:

database1.db and database2.db

However, if I run 'show databases' in the Hive shell, I don't see both databases: I only see database1 on the first node and database2 on the second node.

[Edited post]

I can only run one instance of the Hive shell per node.
I suppose that the metastore is configured in "embedded mode".

I'm using Cloudera Manager, but I can't find where to change the metastore mode configuration.
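The symptoms here (one Hive shell per node, each node seeing only its own databases, Apache Derby in the component list) are consistent with the Hive shell using an embedded Derby metastore. When `hive.metastore.uris` is not set in the client's hive-site.xml, the Hive CLI runs the metastore in-process; Hive's default `javax.jdo.option.ConnectionURL` (`jdbc:derby:;databaseName=metastore_db;create=true`) then creates a Derby database relative to the current working directory, which only one process can open at a time. A minimal Python sketch that classifies a hive-site.xml along those lines; the function names are hypothetical, and on CDH the client file is usually /etc/hive/conf/hive-site.xml, which Cloudera Manager fills in when the host has a Hive Gateway role and client configuration is deployed (worth verifying for your version):

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hive_site_properties(xml_text: str) -> Dict[str, str]:
    """Read <property><name>/<value> pairs from a Hadoop-style config file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def metastore_mode(xml_text: str) -> str:
    props = hive_site_properties(xml_text)
    if props.get("hive.metastore.uris"):
        # Clients talk Thrift to a shared Hive Metastore service.
        return "remote"
    url = props.get("javax.jdo.option.ConnectionURL", "")
    if not url or url.startswith("jdbc:derby:"):
        # Unset falls back to Hive's default Derby URL: in-process, per directory.
        return "embedded-derby"
    # In-process metastore code, but backed by an external database (e.g. MySQL).
    return "local"
```

If this reports embedded-derby on the nodes where you run the Hive shell, that would explain why Impala (which goes through the shared metastore) never sees those databases while Beeline (through HiveServer2) does.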

 
