Created on 10-08-2016 04:17 PM - edited 08-17-2019 09:15 AM
All you need is some basic programming skill and a little experience with AWS to get started.
Before we proceed, let me explain what a Hybrid Cloud Environment actually is.
Hybrid cloud solutions combine the dynamic scalability of public cloud solutions with the flexibility and control of your private cloud.
Some of the benefits of Hybrid Computing include -
Kafka Mirroring
With the Kafka mirroring feature you can maintain a replica of your existing Kafka cluster.
The following diagram shows how to use the MirrorMaker tool to mirror a source Kafka cluster into a target (mirror) Kafka cluster.
The tool uses a Kafka consumer to consume messages from the source cluster, and re-publishes those messages to the local (target) cluster using an embedded Kafka producer.
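The consume-and-republish loop described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `InMemoryCluster` and `mirror_once` are hypothetical names invented for illustration, the "clusters" are plain in-memory lists, and none of this is MirrorMaker's actual code or API. It only shows the idea of reading from the source at a remembered offset and producing the same records to the target.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[bytes, bytes]  # (key, value)


class InMemoryCluster:
    """Toy stand-in for a Kafka cluster: topic name -> append-only record list."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[Record]] = defaultdict(list)

    def produce(self, topic: str, key: bytes, value: bytes) -> None:
        # Append a record to the end of the topic's log.
        self.topics[topic].append((key, value))

    def consume(self, topic: str, offset: int) -> List[Record]:
        # Return every record at or after the given offset.
        return self.topics[topic][offset:]


def mirror_once(
    source: InMemoryCluster,
    target: InMemoryCluster,
    topics: List[str],
    offsets: Dict[str, int],
) -> int:
    """Copy records not yet mirrored from source to target.

    `offsets` remembers, per topic, how far the mirror has read so far,
    so repeated calls do not copy the same record twice.
    Returns the number of records copied in this pass.
    """
    copied = 0
    for topic in topics:
        start = offsets.get(topic, 0)
        batch = source.consume(topic, start)   # consumer side (source cluster)
        for key, value in batch:
            target.produce(topic, key, value)  # producer side (target cluster)
        offsets[topic] = start + len(batch)
        copied += len(batch)
    return copied


if __name__ == "__main__":
    private_cloud = InMemoryCluster()  # hypothetical source cluster
    public_cloud = InMemoryCluster()   # hypothetical target (mirror) cluster
    private_cloud.produce("events", b"k1", b"v1")
    private_cloud.produce("events", b"k2", b"v2")

    offsets: Dict[str, int] = {}
    mirror_once(private_cloud, public_cloud, ["events"], offsets)
    print(public_cloud.topics["events"])
```

In a real deployment the consumer and producer would be network clients pointed at two different sets of brokers, and offset tracking would be handled by the consumer group rather than a local dictionary.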
Use Case
Demonstrate a hybrid cloud solution using Kafka mirroring across regions.
Environment Architecture
The architecture above represents two cluster environments, a private cloud and a public cloud. Data is replicated from the source Kafka cluster to the target Kafka cluster with the help of the MirrorMaker tool, and analysis of the data sets is performed using Spark Streaming clusters.
The internal environment stores all the data in HDFS, where it is accessible through Hive external tables. Data is stored in HDFS so that the raw data is never changed and can be used at any time to resolve discrepancies that might occur in the real-time layer (target cluster).
The external environment receives the replicated data with the help of MirrorMaker, and a Spark Streaming application is responsible for processing that data and storing it in Amazon S3. The crucial data that requires low latency, selected based on TTL, is maintained in Amazon S3. The data is then pushed to Amazon Redshift, where the user can issue low-latency queries and have the results calculated on the go.
With the combined power of a Hybrid Environment and Kafka mirroring you can perform different types of data analysis over streams of data with low latency.
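As a rough illustration of TTL-based selection, the sketch below keeps only records that are still inside a time-to-live window. Everything here is an assumption made for illustration: the `Event` record shape, the `within_ttl` helper, and the TTL value are hypothetical, and the actual project may select data differently (for example inside the Spark Streaming job itself).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    """Hypothetical record shape; the real schema is not given in the article."""
    key: str
    payload: str
    event_time: float  # seconds since the epoch


def within_ttl(events: Iterable[Event], now: float, ttl_seconds: float) -> List[Event]:
    """Return the events whose age at time `now` does not exceed the TTL."""
    return [e for e in events if now - e.event_time <= ttl_seconds]


if __name__ == "__main__":
    batch = [
        Event("a", "old reading", event_time=0.0),
        Event("b", "recent reading", event_time=50.0),
        Event("c", "latest reading", event_time=100.0),
    ]
    # Assumed TTL of 60 seconds, evaluated at time 100.
    fresh = within_ttl(batch, now=100.0, ttl_seconds=60.0)
    print([e.key for e in fresh])
```

Records that fall outside the window are not lost in this architecture, since the internal environment keeps the full raw data in HDFS.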
Technology Stack
Workflow
Development
The code base for Kafka Mirroring in Hybrid Cloud Environment has been officially uploaded on GitHub. You can download the source code from https://github.com/XavientInformationSystems/Kakfa-Mirroring-Hybrid-Cloud and follow the instructions for setting up the project.
Stay tuned for more amazing stuff and help the open-source community to grow further by actively participating in the work we do to expand the project.
Created on 04-07-2018 07:35 PM
Does this architecture support Spark structured streaming that was introduced in 2.0+? If not, would you have set it up the same way if you had it?