For some time, the Hive Metastore canary duration and open connections metrics have been spiking. While debugging, we found "lock wait timeout exceeded" errors in the logs, and to a certain extent we were able to correlate these timeouts with the peaks.
As suggested in https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_hive_ki.html#tsb_2018_..., we planned to upgrade CDH to 5.14.4 and did so first in our dev environment. Dev had been running 5.14.4 for the past 10 days when a similar incident occurred in prod: both metrics spiked, but this time the logs showed no lock wait timeout errors. On prod we took thread dumps a couple of times and restarted HMS as a temporary fix. On analysing those prod thread dumps, we observed the following:
We upgraded CDH to 5.14.4 in prod a couple of weeks ago. Last week, while performing data movement activities on a table with a huge number of partitions, we hit this issue again, and the log showed lock wait timeouts on "INSERT INTO PARTITIONS..." and similar statements. Could you take this issue up with your technical team?