Created on 11-18-2016 01:39 AM - edited 09-16-2022 03:48 AM
Hi,
We have observed unsafe behaviour when CDH 5.3.2 and CDH 5.9 libraries are mixed. I am sharing the observations in the hope that the issue is, or will be, fixed in later releases.
We have a CDH 5.3.2 cluster that has been running fine for months. Yesterday, out of the blue, region servers started dropping like flies. There were no error messages in the logs; they simply jumped to a fresh startup entry with all the classpath info etc. It took me a good hour to narrow down the source of the problem.
Apparently, one of my colleagues tried to fetch some "fresh data" from the cluster using the newer CDH 5.9 client libraries. That's it! Whenever he connected to the CDH 5.3.2 cluster and attempted to query a table, all of the cluster's region servers crashed without an error message.
It is really worrying that an accidental connection using newer libraries (5.9) can bring the whole cluster (5.3.2) offline.
So I wonder: does the Hadoop/HBase architecture have any safety mechanism against client/server library incompatibility? Is such a mechanism planned but not yet implemented, or does it not exist at all?
Thanks,
Gin
Created 11-18-2016 02:26 AM
Created 11-18-2016 03:00 AM