I'm currently setting up a highly-available MySQL as backend for Ranger, Hive, Oozie and Ambari.
My target would be a Master-Master Replication with a Load Balancer that handles outages. Then, no manual steps had to be taken in case of server failure.
In this Community Article it is mentioned that Hive has some issues with Master-Master scenarios. Is this still the case? Or is MySQL Master-Master-Replication supported in HDP 2.3+? If not, what are the issues with Hive?
Best regards and thanks for your help!
IMO, it should work as client doesn't need to worry about what replication strategy mysql are using. But it would be of great learning, if you can try and share your experience.
At present, only Master-Slave MySQL replication is recommended in 2.3.4, and if you want help to set it up then following link may help you.
Documentation says, When 2 Hive Metasore (HM) instances are running, only incase of any issues with first uri, second one would be used. So, Ideally, request should come to anyone of the instances always. Incase of any issues with one instance, then it can go to another one. So, I recently setup a Hive Metastore with MySQL Active Active replication assuming there won't be any situation where both HM instances operates on the same record. But, this doesn't seem to be the case. I am seeing requests are coming to both instances. This can cause replication issues when there is any CRUD operation on the same record and it has happened in my case. Would like to hear your thoughts.
In my opinion it is not relevant which Metastore connects to the Database, if you have a Load-Balancer between Metastore and Database. It is more relevant, which database (or which Active Node to be more specific) is accessed by the Metastore. If the same record is altered in two MySQL nodes simultaneously, that may be a problem (I'm not too deep in the matter how MySQL handles that). Thus, if your Load-Balancer channels all requests to one MySQL host and only fails over on a failure, it should not be a problem.
Master-Master replication works great for me. I used it on HDP2.3 and 2.5. To get the solution fully automated, I used MySQL Router as server proxy.
I tried MySQL Cluster, but it needs "nbcluster" as storage engine, when Hive works with "InnoDB". Hive metastore with "nbcluster" engine gave me very strange errors.
I also tried MySQL Fabric, but it needs user interference to change repaired MySQL server status.
I tested only for Hive metastore database.
Anyway I am going to post an article about configuration of master-master replication + MySQL Router.
@Edgar Daeds could you please provide URL in case you have already published the article on master-master replication + MySQL Router setup
Hi Edger Daeda could you please share the blog url in case you are done with publishing.
I implemented M-M-A-P setup for Mysql High Availability and didn't see any issues since 2 yrs , Here is the link if you are looking for same kind of HA for MySQL.
Hi @Divakar Annapureddy I am trying to implement this with mysql Ver 14.14 Distrib 5.1.73, for redhat-linux-gnu (x86_64) using readline 5.1 and HDP 2.6.2 and starting mysql failing wiht below error 171019 9:42:29 [ERROR] /usr/libexec/mysqld: unknown variable 'innodb_flush_logs_at_trx_commit=1' 171019 9:42:29 [ERROR] Aborting 171019 9:42:29 InnoDB: Starting shutdown... 171019 9:42:34 InnoDB: Shutdown completed; log sequence number 0 44243 171019 9:42:34 [Note] /usr/libexec/mysqld: Shutdown completeI am wondering these setting still valid for MYSQL version