Member since
08-23-2021
5
Posts
0
Kudos Received
0
Solutions
09-19-2021
02:24 PM
This time we ran show table stats and show column stats on the table both while the issue was present (just after the restart and before running invalidate metadata) and again afterwards. We did not notice any difference in the output, but the problem recurred this time as well. The sequence of events was:
(1) The admin team patches and restarts the cluster.
(2) The application team runs a distinct year query and sees only 27 of the 31 rows; 4 of the partitions are missing.
(3) As the support team, I run show stats, which lists all 31 partitions with the correct row counts and sizes.
(4) We run invalidate metadata and refresh.
(5) The support team runs show stats again and gets the same result.
(6) The app team runs their query again and can see all 31 partitions.
We will work with the vendor to get further help.
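For anyone wanting to record exactly which partitions go missing between steps (2) and (3), here is a minimal Python sketch. It assumes the partition years from show stats and the years returned by the distinct year query have already been collected into plain lists; the year values below are hypothetical and only mirror the 31 versus 27 counts from this case.

```python
def missing_partitions(expected_years, query_years):
    """Return the years listed by show stats but absent from the query result."""
    return sorted(set(expected_years) - set(query_years))


# Hypothetical values for illustration only: 31 yearly partitions,
# of which 4 are not returned by the distinct year query.
expected = list(range(1991, 2022))  # 31 years, 1991 through 2021
returned = [y for y in expected if y not in (2015, 2016, 2018, 2019)]

print(missing_partitions(expected, returned))  # [2015, 2016, 2018, 2019]
```

Saving this list after each restart, before running invalidate metadata, would show whether the same partitions disappear every month or a different set each time.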
08-29-2021
06:32 AM
Hello @Shelton, thanks for providing the informative details on Hive and Impala, but they do not address the question I have. I understand that after a cluster restart, Impala queries run slowly until the catalogue store is fully refreshed. My question is not about slow Impala queries after a restart. It is a different question, and I will try to explain it again.

We work on a very big CDH cluster with a large number of users and applications running on it. We have a monthly maintenance schedule in which patches (RHEL, CDH, etc.) are applied to all the hosts and then the system is restarted. It is done in off-peak hours (late night through early morning). The next morning, Impala users of "certain" tables start complaining that data is missing from the table. They can see some of the data, but some is missing. The table is partitioned by year, so, for example, they can see 2021, 2020 and 2017, but 2018 and 2019 are missing.

To troubleshoot, we go to the beeline prompt and run the same query in Hive, and we can see everything. We come back to Impala and run invalidate metadata and refresh, and it starts working for the Impala users too. They think that I am a magician who can do some magic to bring back their missing data 🙂

This has been happening for the last 6 months, and rather than becoming a magician every time the cluster restarts, I need to find the root cause of why this happens only for certain tables. When a user queries a table whose metadata has not been refreshed, I expect Impala to take its time on the first hit, refresh the metadata, and then show everything accurately. Showing inaccurate or incomplete results was not part of the deal with Impala. If anyone else has faced a similar issue or knows what is going on here, please chime in.
08-28-2021
09:44 PM
Thanks for your reply. During the maintenance windows, only activities such as CDH patching or RHEL patching are done, and no script runs against the application tables. So, as you suggested, Impala should not have any cached metadata after the restart. However, the issue happens consistently every month after the restart. Any suggestions on what to check and investigate the next time the cluster is restarted and we notice the issue?
08-28-2021
09:37 PM
On our Hadoop cluster, we see intermittent errors in the Cloudera Manager diagnostic logs, and they cause jobs running on the cluster to fail. How can we investigate this further? We had this issue a few weeks back and it cleared up, but it started again two days ago and is still intermittent.

LoadBalancingKMSClientProvider >> KMS provider at threw an IOException: java.io.IOException: Exception while contacting value generator
KMSExceptionsProvider >> User:'hdfs/*******' Method:GET URL:https://************.com:16000/kms/v1/key/haasfba/_eek?num_keys=1&eek_op=generate Response:Internal Server Error-java.io.IOException: Exception while contacting value generator
java.io.IOException: java.io.IOException: Exception while contacting value generator
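As a starting point for narrowing this down, the KMSExceptionsProvider lines could be tallied per KMS host to see whether one instance accounts for most of the failures. This is only a rough Python sketch: the regular expression is an assumption based on the single log line quoted above, and the function names are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Pattern inferred from the one quoted log line; real logs may differ.
KMS_LINE = re.compile(
    r"User:'(?P<user>[^']*)'\s+Method:(?P<method>\S+)\s+"
    r"URL:(?P<url>\S+)\s+Response:(?P<response>.*)"
)


def parse_kms_error(line):
    """Return user, method, url and response as a dict, or None if no match."""
    match = KMS_LINE.search(line)
    return match.groupdict() if match else None


def count_errors_by_host(lines):
    """Count matching error lines per host:port taken from the logged URL."""
    counts = Counter()
    for line in lines:
        parsed = parse_kms_error(line)
        if parsed:
            counts[urlsplit(parsed["url"]).netloc] += 1
    return counts


sample = (
    "KMSExceptionsProvider >> User:'hdfs/*******' Method:GET "
    "URL:https://************.com:16000/kms/v1/key/haasfba/_eek"
    "?num_keys=1&eek_op=generate "
    "Response:Internal Server Error-java.io.IOException: "
    "Exception while contacting value generator"
)
print(parse_kms_error(sample)["method"])
print(count_errors_by_host([sample, "unrelated log line"]))
```

If the counts concentrate on one host, that KMS instance's own logs around the same timestamps would be the next place to look.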
Labels: Kerberos
08-25-2021
09:34 PM
We noticed that for a few Hive tables, just after the cluster restart in the maintenance window, Impala does not show the data accurately, although Hive shows it correctly. Once we run invalidate metadata and refresh on the underlying table, it works fine again. The workaround was acceptable at first, but this happens consistently every month after the cluster restart, so we need to find the root cause and a solution. Surprisingly, the issue is reported for only two tables, one of which is a view. These are external Hive tables in Parquet format. How can we investigate the issue further?
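Until the root cause is found, the workaround could be scripted so that it runs the same way after every restart. The Python sketch below only builds the statements for a list of affected tables; the table name is hypothetical, the name check is a deliberately strict assumption, and how a view should be handled is not covered here.

```python
import re

# Accept only simple db.table names so nothing unexpected ends up in a statement.
NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*$")


def metadata_reload_statements(tables):
    """Build the invalidate metadata / refresh pair for each db.table name."""
    statements = []
    for table in tables:
        if not NAME.match(table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(f"INVALIDATE METADATA {table};")
        statements.append(f"REFRESH {table};")
    return statements


# Hypothetical table name for illustration only.
for statement in metadata_reload_statements(["sales_db.orders_ext"]):
    print(statement)
```

The printed statements could be written to a file and passed to impala-shell with its -f option, ideally only after the evidence needed for the investigation has been captured.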
Labels: Apache Impala