
Can't see latest data from HDFS file from Impala


New Contributor

Hi All,

 

We copy data as .csv files into an HDFS folder that maps to our schema. After copying the new file "IDQ_log_202103010730.csv", we get the error below in Impala, but we can still see the data from Hive.

 

==================================================

HDFS :- 

[HDFS~]$ hdfs dfs -ls /rdz/idq/files/rdz_idq_monitor/idq_status/
Found 2 items
-rw------- 2 impala supergroup 901 2021-03-01 15:26 /rdz/idq/files/rdz_idq_monitor/idq_status/IDQ_log_202103010730.csv
-rw-r--r-- 3 impala supergroup 3596 2021-01-28 07:30 /rdz/idq/files/rdz_idq_monitor/idq_status/IDQ_test.log
==================================================

 

[HDFS:21000] default> select * from rdz_idq_monitor.idq_status;
Query: select * from rdz_idq_monitor.idq_status
Query submitted at: 2021-03-03 10:24:54 (Coordinator: HDFS:25000)
ERROR: AnalysisException: Failed to load metadata for table: 'rdz_idq_monitor.idq_status'
CAUSED BY: TableLoadingException: Could not load table rdz_idq_monitor.idq_status from catalog
CAUSED BY: TException: TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, error_msgs:[TableLoadingException: Loading file and block metadata for 1 paths for table rdz_idq_monitor.idq_status: failed to load 1 paths. Check the catalog server log for more details.]), lookup_status:OK)

 

 

 

 

 

2 REPLIES

Re: Cant see latest data from HDFS file from Impala

Cloudera Employee

As mentioned in the error message, you should check the catalog server log (catalogd.INFO) for more details. Usually it will explain why it failed to load the file metadata.
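If it helps, here is a rough sketch of how one might pull the relevant lines out of that log. The helper name `catalog_errors_for_table` is made up for illustration, and the log path in the usage comment is only an assumption; the actual location of catalogd.INFO depends on how the cluster was installed.

```shell
# Hypothetical helper: print catalogd log lines that mention a table,
# plus five lines of context after each match (stack traces usually follow).
# $1 = path to catalogd.INFO, $2 = db.table
catalog_errors_for_table() {
  grep -F -A 5 -- "$2" "$1"
}

# Usage (the log path is an assumption; adjust it to your cluster):
# catalog_errors_for_table /var/log/catalogd/catalogd.INFO rdz_idq_monitor.idq_status
```

The `-F` flag treats the table name as a literal string, so the dot in `db.table` is not interpreted as a regex wildcard.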

Re: Cant see latest data from HDFS file from Impala

Mentor

@Jay2021 

Impala and Hive share one metadata catalog, the Hive Metastore. When a database or table is created in Hive, it is immediately available to Hive users but not to Impala. Before you can query it from Impala, you need to run INVALIDATE METADATA from impala-shell.
INVALIDATE METADATA marks the table's metadata as stale. The next time an Impala node runs a query against that table, Impala reloads the metadata before the query proceeds, which is a relatively expensive operation.
In the common case where you add new data files to an existing table, you can use REFRESH instead. It reloads the metadata immediately, but only loads the block location data for the newly added data files, making it a less expensive operation overall.

INVALIDATE METADATA [[db_name.]table_name]

Example

 

$ impala-shell

> INVALIDATE METADATA new_db_from_hive.new_table_from_hive;

> SHOW TABLES IN new_db_from_hive;
+---------------------+
| new_table_from_hive |
+---------------------+
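
For the situation in the question, where a new .csv file was copied into the folder of an existing table, REFRESH is probably the better fit. A minimal sketch, assuming impala-shell is on the PATH and can reach a coordinator with its default connection settings; the helper name `refresh_statement` is made up for illustration:

```shell
# Hypothetical helper: build the lighter-weight statement for new data
# files added to an existing table. $1 = db.table
refresh_statement() {
  printf 'REFRESH %s' "$1"
}

# Table name taken from the question above.
stmt=$(refresh_statement rdz_idq_monitor.idq_status)

# Run it only where impala-shell is installed; otherwise just show the command.
if command -v impala-shell >/dev/null 2>&1; then
  impala-shell -q "$stmt"
else
  echo "would run: impala-shell -q \"$stmt\""
fi
```

If REFRESH fails with the same TableLoadingException, the catalog server log is the next place to look.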

 

That should resolve your issue.

Happy hadooping