Created on 02-03-2016 09:35 PM - edited 09-16-2022 03:02 AM
We have a process that pulls messages from MQ and puts them into HBase. Since the messages expire after 10 seconds, we cannot afford to have the cluster down. What do people do in such situations?
We need to enable NameNode HA on the Hortonworks cluster without taking the cluster offline.
Created 02-03-2016 10:17 PM
Unfortunately there is no way to enable HDFS HA without restarting the NameNode.
So unless you can change the process to use a buffer in between, I am not sure what you could do. (Kafka would be a very popular tool here, combining MQ-like usage with almost unlimited scalability and easy buffering of dozens to hundreds of terabytes of data.)
So if you really, absolutely cannot lose a tuple, or you want a safer architecture anyway:
A) Develop a process that reads the events and puts them into Kafka. You would also need a second process that reads them from Kafka and writes them into HBase.
B) Switch the ingest process over from writing to HBase to writing to Kafka.
C) Upgrade your cluster
D) Switch on the Kafka->HBase process. That step is not time-critical, since even a 3-node Kafka cluster can easily store 10-20 TB of data in a replicated fashion.
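The decoupling in steps A-D can be sketched as follows. This is a minimal in-memory simulation, not a real deployment: a bounded `queue.Queue` stands in for the replicated Kafka topic, and a simple list stands in for the HBase table. It just illustrates the key property of the design, namely that the ingest side keeps absorbing messages while the HBase-writing side is paused (e.g. during the HA-enable restart), so nothing hits the 10-second MQ expiry.

```python
import queue
import threading
import time

# Bounded in-memory queue standing in for the Kafka topic; in a
# real deployment this would be a replicated topic so the buffer
# survives process crashes.
buffer = queue.Queue(maxsize=10000)

stored = []                   # stands in for the HBase table
hbase_up = threading.Event()  # cleared while HBase/HDFS is restarting


def mq_to_kafka(messages):
    # Steps A/B: the ingest process writes to the buffer instead of
    # directly to HBase, so a cluster restart does not drop messages.
    for msg in messages:
        buffer.put(msg)  # blocks only if the buffer is completely full


def kafka_to_hbase(expected):
    # Step D: drain the buffer into "HBase" once the cluster is back;
    # nothing was lost while hbase_up stayed cleared.
    while len(stored) < expected:
        hbase_up.wait()
        stored.append(buffer.get())


messages = [f"msg-{i}" for i in range(100)]
writer = threading.Thread(target=kafka_to_hbase, args=(len(messages),))
writer.start()

mq_to_kafka(messages)  # ingest continues while HBase is "down"
time.sleep(0.1)        # simulate the HA-enable restart window
hbase_up.set()         # cluster back online, writer starts draining
writer.join()

assert stored == messages  # no message lost across the outage
```

In production the producer and consumer would be the standard Kafka clients, and step D's consumer can replay from whatever offset it last committed, which is why switching it on is not time-critical.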
Created 02-03-2016 09:40 PM
In this case you need true DR; DR is different from HA.
You can set up an Active-Active site, bring down Active1, and enable HA while Active2 takes all the load.
WANdisco is a good tool for true DR.
Created 02-03-2016 10:24 PM
We currently do not have a DR site or WANdisco. Are there any other alternatives?
Created 02-04-2016 12:09 AM
I have a deployment where we set up HBase DR using Kafka, as suggested above. I was under the impression that you were focused on cluster HA rather than HBase only.
Apache Falcon is one of my favorites, but it is more Active-Passive.
Created 02-03-2016 10:18 PM
I also thought of Kafka for this, @Benjamin Leonhardi
Created 02-03-2016 10:25 PM
@S Roy, try @Benjamin Leonhardi's solution.