Hi, I'm currently using HDP 2.4.3-0.227 and testing its HA capabilities, but I'm running into a slow failover problem.
My RM is deployed on my namenode2 and namenode1 using QJM H-A thanks to Ambari.
I first run a terasort job:
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-18.104.22.168.4.3.0-227.jar terasort random-data6 sorted-data36
Then I shut down the machine hosting the active RM. The job gets stuck a few seconds later and stays blocked for around 15 minutes.
After these 15 minutes, it resumes and finishes successfully. So HA is working, but it is obviously too slow for production.
I found a similar issue, described better than I can, on the Apache JIRA: https://issues.apache.org/jira/browse/YARN-2578
But I'm not sure how to apply the patch, and I doubt it is a good idea in the first place because of its target version.
Has anybody met this problem before and solved it using this patch or something else?