HBase along with Phoenix is one of the most powerful NoSQL combinations. HBase/Phoenix lets users host OLTP-style workloads natively on Hadoop with all the goodness of HA, plus analytic benefits on the same platform (e.g., the Spark-HBase connector or the Phoenix Hive storage handler). A common requirement for HA implementations is a DR environment. Here I will describe a few common patterns; this is by no means an exhaustive list of HBase DR patterns. In my opinion, pattern 5 is the simplest to implement and provides operational ease and efficiency.

Here are some of the high-level replication and availability strategies with HBase/Phoenix:

  • HBase provides high availability within a cluster by managing region server failures transparently.
  • HBase provides several cross-DC asynchronous replication schemes:
    • Master/Master replication topology
      • Two clusters replicating all edits to each other, bi-directionally
    • Master/Slave replication topology
      • One cluster replicating all edits to a second cluster
    • Cyclic replication topology
      • A ring of clusters, each replicating all edits to the next, with edits prevented from looping indefinitely
    • Hub-and-spoke replication topology
      • A central cluster replicating all edits to multiple clusters, uni-directionally
  • Using the topologies described above, a cross-DC replication scheme can be set up to match the desired architecture (a minimal peer-setup sketch follows this list)
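
As an illustration of setting up one leg of these topologies, a remote cluster can be added as a replication peer either from the HBase shell (add_peer) or programmatically. Below is a minimal sketch assuming the HBase 2.x Java Admin API; the ZooKeeper quorum and the peer id "dr" are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;

public class AddReplicationPeer {
    public static void main(String[] args) throws Exception {
        // Uses the local (source) cluster's hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            // Cluster key of the remote (sink) cluster:
            //   <zookeeper quorum>:<client port>:<znode parent>
            // "drzk1,drzk2,drzk3" is a placeholder quorum for the DR cluster.
            ReplicationPeerConfig peer = ReplicationPeerConfig.newBuilder()
                    .setClusterKey("drzk1,drzk2,drzk3:2181:/hbase")
                    .build();

            // "dr" is an arbitrary peer id. Repeating the same call on the DR
            // cluster, pointing back at the primary, yields master/master
            // replication; chaining clusters in a ring yields cyclic replication.
            admin.addReplicationPeer("dr", peer);
        }
    }
}
```

Note that replication is configured per column family: only families whose REPLICATION_SCOPE is set to 1 are shipped to peers.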

Pattern 1

  • Reads & Writes served by both clusters
  • A client implementation that provides stickiness for reads/writes (based on something like a session ID) needs to be investigated; a rough sketch follows the diagram below
  • Master/Master replication between clusters
    • Bidirectional replication
  • Post-failover recovery is handled via cyclic replication

[Image: Pattern 1 architecture diagram (13768-1.png)]
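
The stickiness idea above is not something HBase or Phoenix provides out of the box, so the sketch below is only an illustration of the concept: a hypothetical wrapper that hashes a session ID to pick one of two Phoenix JDBC URLs, so that a given session always reads and writes through the same cluster. The class name and URLs are made up for the example, and the Phoenix JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical session-sticky router across two Phoenix/HBase clusters. */
public class StickyPhoenixRouter {
    // Placeholder Phoenix JDBC URLs for the two clusters.
    private static final String[] CLUSTER_URLS = {
        "jdbc:phoenix:zk-a1,zk-a2,zk-a3:2181:/hbase",
        "jdbc:phoenix:zk-b1,zk-b2,zk-b3:2181:/hbase"
    };

    /** Every call with the same session id lands on the same cluster. */
    public Connection connectionFor(String sessionId) throws SQLException {
        int idx = Math.floorMod(sessionId.hashCode(), CLUSTER_URLS.length);
        return DriverManager.getConnection(CLUSTER_URLS[idx]);
    }
}
```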

Pattern 2

  • Reads served by both clusters
  • Writes served by single cluster
  • Master/Master replication between clusters
    • Bidirectional replication
  • Clients will fail over to the secondary cluster (a failover sketch follows the diagram below)
  • Post-failover recovery is handled via cyclic replication

[Image: Pattern 2 architecture diagram (13769-2.png)]
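
Phoenix/HBase clients do not fail over between clusters on their own, so in this pattern the failover logic lives in the application (or in a load balancer in front of it). The sketch below is a simplified illustration with hypothetical primary/secondary Phoenix JDBC URLs: it tries the primary first and falls back to the secondary when the connection attempt fails.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical primary/secondary failover for Phoenix connections. */
public class FailoverPhoenixClient {
    private static final String PRIMARY   = "jdbc:phoenix:primary-zk:2181:/hbase";
    private static final String SECONDARY = "jdbc:phoenix:secondary-zk:2181:/hbase";

    public Connection connect() throws SQLException {
        try {
            return DriverManager.getConnection(PRIMARY);
        } catch (SQLException primaryDown) {
            // Primary cluster unreachable: fall back to the DR cluster.
            return DriverManager.getConnection(SECONDARY);
        }
    }
}
```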

Pattern 3

  • Reads & Writes served by single cluster
  • Master/Master replication between clusters
    • Bidirectional replication
  • Clients will fail over to the secondary cluster
  • Post-failover recovery is handled via cyclic replication

[Image: Pattern 3 architecture diagram (13770-3.png)]

Pattern 4

  • Reads & Writes served by single cluster
  • Master/Slave replication between clusters
    • Unidirectional replication
  • Clients will fail over to the secondary cluster
  • Manual resync required on the "primary" cluster due to unidirectional replication (see the peer pause/resume sketch after the diagram below)

[Image: Pattern 4 architecture diagram (13771-4.png)]
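
Because replication in this pattern is one-way, failing back typically means pausing the peer, manually resyncing the former primary out of band (for example with HBase's CopyTable or HashTable/SyncTable utilities), and then resuming replication. A minimal sketch of the pause/resume part, assuming the HBase 2.x Admin API and the placeholder peer id "dr" used earlier:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class PauseAndResumePeer {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {

            // Stop shipping edits while the clusters are being resynced.
            admin.disableReplicationPeer("dr");

            // ... run the manual resync out of band (CopyTable / SyncTable) ...

            // Resume replication once both clusters hold the same data.
            admin.enableReplicationPeer("dr");
        }
    }
}
```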

Pattern 5

  • Ingestion via the NiFi REST API
    • Supports handling secure calls and round-trip responses
  • Push data to Kafka to democratize the data set to all interested applications (a minimal producer sketch follows the diagram below)
    • Secure Kafka topics via Apache Ranger
  • NiFi dual ingest into N number of HBase/Phoenix clusters
    • Enables in-sync data stores
  • Operational ease
    • NiFi back pressure will handle any ODS downtime
    • UI-based flow orchestration
    • Data governance built in via data provenance
      • Event-level lineage

[Image: Pattern 5 architecture diagram (13772-5.jpg)]
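
In this pattern the Kafka publish step is handled by NiFi processors (e.g., PublishKafka), but the same step can be pictured as a plain Kafka producer. The sketch below is a generic Java producer rather than the author's flow; the broker addresses, topic name, and record contents are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OdsEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Any application interested in the data set can consume this topic;
            // NiFi then dual-ingests it into each HBase/Phoenix cluster.
            producer.send(new ProducerRecord<>("ods-events", "rowkey-123", "{\"col\":\"value\"}"));
        }
    }
}
```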

Additional HBase Replication Documentation

Comments

Does master/master or cyclic replication keep replicating the data back and forth? If an upsert is executed on C1 and propagated to C2, and C1 is added as a peer in C2, will the edit be replicated back to C1 and then again to C2 (going C1 to C2 to C1 to C2 to C1 ...)?