Reply
New Contributor
Posts: 2
Registered: ‎02-01-2018

RESOLVED: Issue when renaming Large table stored in S3

[ Edited ]

As part of a testing effort I had some tables that needed to be renamed.

These tables are parquetfiles created within Impala but are stored in S3. I realize that when a table is renamed and is stored on S3, the data is moved from the original tablename to the new tablename under the covers. This seems to work just fine for tables of a certain size.

 

However, for my larger tables I receive the following when the SQL for rename is running:

 

Query: alter table comp_web_sales rename to web_sales
ERROR:
ImpalaRuntimeException: Error making 'alter_table' RPC to Hive Metastore:
CAUSED BY: InvalidOperationException: tpc10_s3.comp_web_sales table not found

Could not execute command: alter table comp_web_sales rename to web_sales

 

What I noticed is that in S3, the new folder is created and all of the files continue to move until completed and then the "old" folder is removed as one would expect.

 

At that point, all of the required files reside in the folder dedicated to the new table name that was being assigned in the alter statement. However, because that statement timed out, the table name inside HIVE does not change. For example: (most entries removed for brevity)

[**] > show tables;
Query: show tables
+---------------------------+
| name |
+---------------------------+
...| catalog_returns |
| comp_catalog_sales |
| comp_store_returns |
| comp_store_sales |
| comp_web_sales |
...
| warehouse |
| web_page |
| web_returns |
| web_site |
+---------------------------+

 

When i try to query the old table name, as expected, it cannot find the data:

[**] > select count(*) from comp_web_sales;
Query: select count(*) from comp_web_sales
Query submitted at: 2018-02-01 19:37:27 (...)
Query progress can be monitored at:...
WARNINGS:
Failed to open HDFS file s3a://<LOCATION>/comp_web_sales/4c4f7290ec687341-6d01caf800000007_460147606_data.63.parq
Error(2): No such file or directory

 

Likewise, and obviously, it cannot find the new table either:

[**] > select count(*) from web_sales;
Query: select count(*) from web_sales
Query submitted at: 2018-02-01 19:38:57 (..)
ERROR: AnalysisException: Could not resolve table reference: 'web_sales'

 

How do I get them changed within Impala so it recognizes the new location now that the files are moved???

 

Thanks,

 

-= Tara =-

 

Total table size (compressed) of any of the ones I'm having issues with is in the 2+TB range.

 

 

New Contributor
Posts: 2
Registered: ‎02-01-2018

Re: Issue when renaming Large table stored in S3

This was corrected by issuing and "invalidate metadata" within Impala once the S3 transfer was complete.

 

 

Announcements