I see today that Impala is somehow not able to read the metadata. I created a table in Hive with text format and loaded data into it, then altered the file format to Parquet using ALTER TABLE <table name> SET FILEFORMAT PARQUET. I then ran INVALIDATE METADATA <tablename> in Impala. I can see the table in Impala with SHOW TABLES, but if I run DESCRIBE it says it can't load the metadata:
E1015 06:20:11.609848 34004 Analyzer.java:1445] Failed to load metadata for table: XXXXXX
Failed to load metadata for table: XXXXX. Running 'invalidate metadata XXXXXX may resolve this problem.
CAUSED BY: NullPointerException: null
I1015 06:20:11.610817 34004 jni-util.cc:177] com.cloudera.impala.common.AnalysisException: Failed to load metadata for table: xxxx
Caused by: com.cloudera.impala.catalog.TableLoadingException: Failed to load metadata for table: xxxxxx
CAUSED BY: NullPointerException: null
Any suggestions/inputs to share?
Impala 1.4.0. An additional input: it seems Hive is also not able to see the table after the ALTER command, so this may not be an Impala problem.
But I don't understand why the ALTER command is allowed to change the file format when, after doing that, it then says "Table not found". Really interested to know what's happening here.
Thanks for following up, I'm looking into the problem since it looks like a bug.
Still, I want to make sure I address your ultimate goal: my understanding is that you created a text table in Hive, then used ALTER TABLE <tbl> SET FILEFORMAT PARQUET in Impala, and then DESCRIBE <tbl> fails in Impala.
What's the intention behind the ALTER command? The alteration will only change the table metadata, not the data itself, i.e., the data is still stored as text, but Impala will attempt to interpret the text files as Parquet and fail.
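To actually convert the data, the usual pattern is to create a second table stored as Parquet and copy the rows into it; the INSERT ... SELECT is what rewrites the files. A minimal sketch (table names t1/t2_parquet are illustrative):

```sql
-- New table with t1's schema, but stored as Parquet.
-- (CREATE TABLE ... LIKE ... STORED AS PARQUET is Impala syntax;
--  older Hive versions instead need explicit SerDe/input/output
--  format classes to define a Parquet table.)
CREATE TABLE t2_parquet LIKE t1 STORED AS PARQUET;

-- The INSERT ... SELECT rewrites the text rows as Parquet files.
INSERT OVERWRITE TABLE t2_parquet SELECT * FROM t1;
```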
I wasn't able to repro the issue on 1.4 with these steps:
1. hive> create table t (i int);
2. hive> insert into t select 1 from someexistingtable;
3. impala> invalidate metadata t;
4. impala> alter table t set fileformat parquet;
5. impala> describe t;
Could you detail the steps used to produce the issue on your end so I can investigate further? Thanks!
Thanks for your time. I finally found the point where I was making the mistake; it's solved, but I still have a lot of questions in mind.
Objective: Convert the existing text table to parquet table for performance gains
The actions performed are as below
hive> create table t2 like t1; # t1 is text formatted table
hive> insert into table t2 select * from t1;
hive> alter table t2 set fileformat parquet;
Impala> Invalidate metadata t2;
Impala> desc formatted t2;
Result: error, "failed to load metadata", with the "invalidate may help" message.
After posting this question I did some more work, and found that even Hive was not able to describe the table.
Note: here I see a problem. We are on Hive version 12; the command returned successfully, but Hive is then not able to describe the table. I really see this as a major issue: when the command is allowed and successfully changes the metadata, but the metadata can then no longer be loaded, we are lost. We cannot do anything with this table, neither alter it back to the original format nor drop it. Don't you see this as an issue?
Now I found that for Hive version 12 I need to run the following:
alter table <tablename> set serde '<parquet serde>';
alter table <tablename> set fileformat inputformat "<parquet inputformat class>" outputformat "<parquet outputformat class>";
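For what it's worth, with the parquet-hive bundle used around Hive 0.12, the concrete class names were along these lines. Treat this as an illustration rather than gospel, since the exact class names depend on the Parquet jars on your cluster, and note that Hive expects INPUTFORMAT and OUTPUTFORMAT together in a single SET FILEFORMAT statement:

```sql
-- Class names below are assumptions based on the parquet-hive
-- bundle for Hive 0.12; verify against the jars on your cluster.
ALTER TABLE t2 SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
ALTER TABLE t2 SET FILEFORMAT
    INPUTFORMAT  'parquet.hive.DeprecatedParquetInputFormat'
    OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
```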
Now the first problem is solved; of course not on the same table as t2, I did the exercise on new tables. Now I can describe the table in both Impala and Hive, but I am not able to read the data because, as you mentioned, the underlying data is CSV, not Parquet. I had forgotten to reload the data so that it would be converted to Parquet while being copied into the new table with the new format. Finally the issue is solved, and moreover it was not an Impala issue.
But I still have a doubt, maybe a fundamental one: ALTER TABLE SET FILEFORMAT only changes the table metadata, so are there really cases where a user just needs to alter the table properties to Parquet or ORC without touching the data? I am not able to think of such a scenario. Since the user has to take care of the underlying data format anyway, a warning telling the user to convert the data files to the respective <fileformat> before reading the data would really help, I think.
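As an aside, a one-step CREATE TABLE ... AS SELECT sidesteps the forgot-to-reload pitfall, since defining the Parquet table and rewriting the rows happen in a single statement (Impala syntax, table names illustrative):

```sql
-- Defines t2 as Parquet and populates it from t1 in one step,
-- so there is no window where the metadata says Parquet but the
-- files on disk are still text.
CREATE TABLE t2 STORED AS PARQUET AS SELECT * FROM t1;
```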
Thanks for following up in such great detail!
I completely agree that DDL commands should never leave a table in a state where it cannot even be dropped. As you pointed out, this seems to be a Hive issue.
I share your concern regarding the behavior of ALTER TABLE; however, there are legitimate uses for it. For example, if a user accidentally set the wrong format, she can change it later. You can also alter partitions of the same table to have different file formats. Lastly, users may have the opposite expectation from yours, and having ALTER rewrite all the data into a new format against a user's expectation is arguably worse than later realizing that ALTER only changes the table metadata. Your suggestion of issuing a warning sounds like a good compromise, thanks!
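The mixed-format-partitions case can be sketched like this; it only updates what Hive/Impala believe about the files in that one partition, so the files there must actually be Parquet (table and partition names are illustrative):

```sql
-- Older daily partitions of 'logs' stay text; this partition is
-- declared Parquet. SET FILEFORMAT on a partition changes
-- metadata only; it does not rewrite the files.
ALTER TABLE logs PARTITION (ds = '2014-10-01') SET FILEFORMAT PARQUET;
```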
select c.category_name, count(order_item_quantity) as count
from order_items oi
inner join products p on oi.order_item_product_id = p.product_id
inner join categories c on c.category_id = p.product_category_id
group by c.category_name
order by count desc
I have the same error too: AnalysisException: Could not resolve table reference: 'products'
Nad1998, that's a different error - it means your 'products' table does not exist or is not visible to Impala (try running 'invalidate metadata products', then retry query).