I am facing two problems in my PROD environment, and because of them we are losing confidence. Please suggest a solution.
Problem 1: INSERT OVERWRITE TABLE_NAME (PARTITION..) SELECT ...
This statement inserts the data, and I can see the data in the partition through the browser, but when I try to fetch the data, the query returns 0. I run REFRESH on the table every time new data is added. I also tried INVALIDATE METADATA for the table and ran the SELECT query again, and it still returned 0 records. This suggests that the Hive metastore has no metadata for the new data, as if the metadata update was missed. The behaviour is random.
Problem 2: Once data is inserted, INVALIDATE METADATA is run right after the load. When I then run a SELECT query, it does not return the data, but after running INVALIDATE METADATA again, it does. Again, this happens randomly.
Please note that we are inserting less than 10 MB of data, using Cloudera (CDH) 5.3.1 on Red Hat 6.x.
Can you elaborate on "i can see the data throw browser in the partition"? What do you mean by that?
What do you mean by "when try to fetch the data it return 0"? Do you mean that no rows are returned?
Sorry for the late reply. The answer to your second question: when I run a COUNT query on the table for the given partition, it returns 0 records.
Apart from the above, I am facing another problem: running REFRESH does not load the metadata on all impalad instances.
So can you tell me: after inserting data into the table on node1, if I run REFRESH, will it update the metadata on all three of my nodes or only on node1? The answer to this may be my solution.
Please note that I am running Impala version 2.1.1-cdh5.
Thanks, this was really helpful.
BTW, can you answer the following questions?
1. Is it true that running REFRESH on one impalad node does not refresh the other impalad nodes?
2. If I insert data through Impala using an INSERT statement on node1 and run a SELECT query on node2 right after that, is there a chance that I will not get the result? Node2 may not yet know about the newly added data, because the statestore might take time to broadcast the new metadata to all nodes, and my SELECT query may run before that happens. This would not happen every time, only sometimes.
[Note: I am asking this because in my PROD environment I currently run a job that inserts data and, right after that, runs a SELECT query on the same data. The INSERT request goes to whichever data node hproxy chooses, and the SELECT query may be forwarded by hproxy to a different data node. In that case my SELECT query does not return the data and throws an error saying that the Parquet file does not exist. So I will know the real cause once you answer my question (2). Let me know if you think there could be some other reason as well.]
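To make the timing question in (2) concrete, here is a toy Python model of the suspected race. It is only an illustration of the hypothesis in the question, not Impala's actual catalog or statestore code; the class, the tick-based broadcast delay, and the file names are all hypothetical. Each node caches the partition's file list, the coordinator sees its own INSERT immediately, and other nodes only see it once a delayed broadcast arrives. Under this model, a SELECT on a node that has not caught up either sees an empty partition (count 0) or still lists Parquet files that INSERT OVERWRITE has removed, which fails with a "file does not exist" style error.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical model of the suspected race; NOT how Impala is implemented.

    Each impalad caches the file list of one partition. An INSERT updates the
    coordinator's cache at once, but other nodes only see the change after a
    made-up broadcast delay measured in abstract ticks.
    """
    nodes: list
    broadcast_delay: int = 2
    hdfs: dict = field(default_factory=dict)     # file name -> row count actually on disk
    caches: dict = field(default_factory=dict)   # node -> cached {file name: row count}
    pending: list = field(default_factory=list)  # (due tick, node, file list) not yet delivered
    tick: int = 0

    def __post_init__(self):
        for node in self.nodes:
            self.caches.setdefault(node, {})

    def insert_overwrite(self, coordinator, new_files):
        self.hdfs = dict(new_files)                 # overwrite: the old files are gone
        self.caches[coordinator] = dict(new_files)  # coordinator sees its own write
        for node in self.nodes:
            if node != coordinator:
                due = self.tick + self.broadcast_delay
                self.pending.append((due, node, dict(new_files)))

    def advance(self, ticks=1):
        """Let time pass and deliver any broadcasts that are now due."""
        self.tick += ticks
        remaining = []
        for due, node, files in self.pending:
            if due <= self.tick:
                self.caches[node] = files
            else:
                remaining.append((due, node, files))
        self.pending = remaining

    def refresh_on(self, node):
        """Model of running REFRESH on one node: reload its file list right now."""
        self.caches[node] = dict(self.hdfs)

    def select_count(self, node):
        """Count rows using this node's cached view; fail if a cached file is gone."""
        cached = self.caches[node]
        missing = sorted(set(cached) - set(self.hdfs))
        if missing:
            raise FileNotFoundError(f"{node}: file(s) no longer exist: {missing}")
        return sum(cached.values())


if __name__ == "__main__":
    c = ToyCluster(nodes=["node1", "node2", "node3"])
    c.insert_overwrite("node1", {"part-0.parq": 100})
    print(c.select_count("node1"), c.select_count("node2"))  # node2 has not caught up yet
    c.advance(2)
    c.insert_overwrite("node1", {"part-1.parq": 120})
    try:
        c.select_count("node2")  # node2 still lists the overwritten part-0.parq
    except FileNotFoundError as err:
        print(err)
    c.refresh_on("node2")
    print(c.select_count("node2"))
```

In this model the first print shows `100 0`, and the stale node recovers only after it receives the broadcast or runs its own refresh, which mirrors the two symptoms reported above.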
I resolved the problem: now, after inserting the data through Impala, I run the REFRESH command on each individual Impala node and then run the SELECT query on the table.
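The per-node workaround can be scripted. This is a minimal Python sketch, assuming `impala-shell` is on the PATH and can reach each impalad on its default port 21000; the host names and the table name are hypothetical placeholders. It uses the `impala-shell` options `-i host:port` (connect to a specific impalad) and `-q` (run one query), and by default it only prints the commands instead of executing them.

```python
import shlex
import subprocess

# Hypothetical impalad host names; replace with your own nodes.
IMPALAD_HOSTS = ["node1", "node2", "node3"]


def refresh_commands(hosts, table, port=21000):
    """Build one impala-shell command per impalad that runs REFRESH on `table`."""
    return [
        ["impala-shell", "-i", f"{host}:{port}", "-q", f"REFRESH {table}"]
        for host in hosts
    ]


def refresh_all(hosts, table, port=21000, dry_run=True):
    """Run REFRESH on every impalad in turn, stopping at the first failure.

    With dry_run=True the commands are only printed, so nothing is executed.
    """
    commands = refresh_commands(hosts, table, port)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    # Hypothetical table name; dry run only prints the three commands.
    refresh_all(IMPALAD_HOSTS, "my_db.my_table")
```

Running these REFRESH statements one after another, before the SELECT is issued, means that whichever node the load balancer forwards the query to should already have reloaded the table's file list.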