07-14-2017 03:05 AM
My cluster has 5 Impala daemons, each running on a different node. I am using Impala 2.5.0 (which ships with CDH 5.7.0). After I restarted the catalogd service, some of the DROP TABLE commands issued through impalad failed.
These DROP TABLE commands failed because the tables did not exist, yet SHOW TABLES still listed them. I tried refreshing the metadata with INVALIDATE METADATA, REFRESH, etc., but the same tables still appeared in the list. I repeated this on 3 of the 5 impalad nodes to check whether the stale metadata was present on every impalad, and it was: all of them showed the same tables. Assuming that clearing each impalad's own metadata cache would help, I restarted the impala daemons on all 5 nodes one by one. That cleared the cache, and I could then see the correct table list (the tables whose DROP had failed no longer appeared).
While debugging, I found that all of these DROP TABLE failures had occurred on only one node. I also came across the following line in the INFO logs of all 5 impalads:
CatalogException: Detected catalog service ID change. Aborting updateCatalog()
In addition, the following line appeared in the INFO logs of only 4 impala nodes, but not in the log of the impalad node where the problem above occurred:
There was an error processing the impalad catalog update. Requesting a full topic update to recover: CatalogException: Detected catalog service ID change. Aborting updateCatalog()
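To make the node-by-node log comparison repeatable, here is a minimal Python sketch (not an Impala tool) that flags nodes whose log contains the catalog service ID change message but not the request for a full topic update. It assumes each impalad's INFO log has already been collected into a string. The function name, node names and sample excerpts are hypothetical placeholders; only the two message fragments are copied from the log lines quoted above.

```python
# Message fragments copied from the impalad INFO log lines quoted above.
ID_CHANGE = "Detected catalog service ID change. Aborting updateCatalog()"
FULL_UPDATE = "Requesting a full topic update to recover"


def nodes_without_recovery(logs: dict[str, str]) -> list[str]:
    """Hypothetical helper: given {node name: INFO log text}, return the
    sorted names of nodes that logged the catalog service ID change but
    never logged a request for a full topic update."""
    return sorted(
        node
        for node, text in logs.items()
        if ID_CHANGE in text and FULL_UPDATE not in text
    )


if __name__ == "__main__":
    # Placeholder excerpts mirroring the observed pattern: impalad-1..4 logged
    # both messages, impalad-5 (the node with the failed DROPs) only the first.
    recovered = f"{FULL_UPDATE}: CatalogException: {ID_CHANGE}\n"
    sample_logs = {f"impalad-{i}": ID_CHANGE + "\n" + recovered for i in range(1, 5)}
    sample_logs["impalad-5"] = f"CatalogException: {ID_CHANGE}\n"
    print(nodes_without_recovery(sample_logs))  # prints ['impalad-5']
```

Plain substring checks are used because the two messages are distinctive fixed strings; if log lines wrap or get truncated, a line-by-line scan would be more robust.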
From this, it seems that metadata recovery did not happen on only one impalad node, so DROP TABLE failed there because that node assumed the table still existed. But these log statements conflict with the behaviour described above, where the stale tables showed up on every impalad I checked, not just that one. Any thoughts?