Impala Insert Overwrite does not update Metadata after data insert

I am facing two problems in my PROD environment, and because of them we are losing confidence in the system. Please suggest a solution.

 

Problem 1: INSERT OVERWRITE TABLE_NAME (PARTITION..) SELECT ...

The insert succeeds and I can see the data in the partition through the browser, but when I try to fetch it the query returns 0. I refresh the table every time new data is added; I also tried INVALIDATE METADATA for the table and ran the select query again, and it still returned 0 records. That suggests there is no metadata for the new data in the Hive metastore, as if the insert failed to update it. This behaviour is random.

 

Problem 2: Once the data is inserted, an invalidate is fired right after the data load. When I then run a select query it does not return the data, but after running INVALIDATE METADATA again it does. Again, this happens randomly.

 

Please note that we are inserting less than 10 MB of data, using Cloudera 5.3.1 on Red Hat 6.x.

 


Re: Impala Insert Overwrite does not update Metadata after data insert

Contributor

Can you elaborate on "i can see the data throw browser in the partition"? What do you mean by that?

 

What do you mean by "when try to fetch the data it return 0"? Do you mean that no rows are returned?

Re: Impala Insert Overwrite does not update Metadata after data insert

Sorry for the typo. I mean I can see that the data is stored in HDFS. I can browse the path and see it through the browser.

Re: Impala Insert Overwrite does not update Metadata after data insert

Contributor
Can you please answer my second question as well?

Re: Impala Insert Overwrite does not update Metadata after data insert

Sorry for the late reply. The answer to your second question is: when I run a count query on the table for the given partition, it returns 0 records.

 

Apart from the above problem, I am facing yet another one: running refresh metadata does not load the metadata on all impalad instances.

So can you tell me: after inserting data into a table on node1, if I run refresh metadata, will it update the metadata on all three of my nodes or only on node1? The answer to this question may be the solution to my problem.

 

Please note I am running Impala version 2.1.1-cdh5.

Re: Impala Insert Overwrite does not update Metadata after data insert

Contributor

Re: Impala Insert Overwrite does not update Metadata after data insert

Thanks, this was really helpful.

By the way, can you answer the following questions?

 

1. Running the refresh metadata command on one impalad node does not refresh the other impalad nodes?

2. If I insert data through Impala with an INSERT statement on node1 and fire a select query on node2 right after that, there is a chance that I will not get the result: node2 may have no information about the newly added data, because the statestore may take some time to broadcast the new metadata to all nodes and my select query runs before that happens. This would not happen always, only sometimes.

 

[Note: I am asking because in my PROD environment I currently run a job that inserts data and, right after that, runs a select query on the same data. The insert request goes to whichever data node HAProxy chooses, and the select query is forwarded by HAProxy to a different data node. There the select query does not return the data and throws an error saying that the Parquet file does not exist. Your answer to question 2 will tell me the real cause. Let me know if you think there could be some other reason as well.]

 

 

Re: Impala Insert Overwrite does not update Metadata after data insert

Contributor
1. Correct.

2. Correct.

Re: Impala Insert Overwrite does not update Metadata after data insert

I resolved the problem. Now, after inserting the data through Impala, I run the refresh command on each individual Impala node and only then run the select query on the table.

 

Thanks guys.
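
For anyone landing here later, the workaround described above could be scripted roughly as follows. This is only a sketch under assumptions, not the exact job from this thread: it assumes you already hold one Python DB-API (PEP 249) connection per impalad, opened directly against each node rather than through the load balancer (for example with impyla's `impala.dbapi.connect`). The helper names `refresh_on_all_nodes` and `count_rows`, the table name and the partition filter are hypothetical.

```python
def refresh_on_all_nodes(connections, table):
    """Run REFRESH for `table` once on every connection, in order.

    `connections` is assumed to hold one DB-API connection per impalad,
    opened directly against each node (not through the load balancer).
    """
    for conn in connections:
        cur = conn.cursor()
        try:
            cur.execute("REFRESH {}".format(table))
        finally:
            cur.close()


def count_rows(conn, table, partition_filter):
    """Return COUNT(*) for the rows of `table` matching `partition_filter`."""
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT COUNT(*) FROM {} WHERE {}".format(table, partition_filter)
        )
        return cur.fetchone()[0]
    finally:
        cur.close()
```

A job would call `refresh_on_all_nodes(conns, "mydb.mytable")` after the insert finishes and only then call `count_rows` on whichever node serves the select. The table name and filter are placed into the SQL text as-is, so they should come from trusted configuration, not from user input.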