How to stream data to Cloud hadoop environment using Kafka

We have a continuously updated data file at an on-premise location and would like to stream it to a cloud Hadoop platform. Could you please help us design a solution with Kafka, including architectural details such as where the Kafka cluster and the consumers should be installed and run? Any real-world use cases for similar scenarios would be appreciated.

Re: How to stream data to Cloud hadoop environment using Kafka

@Sanjib Behera

I'll give you a high-level overview. There are two ways to do this. The first lets you do everything without writing a single line of code, and it's the one I personally recommend: it's easier to implement, maintain, and extend later if you need more. The second involves writing code to read from your source file and write into your local (on-premise) Kafka. The rest of the pipeline is the same in both approaches.

1. Data being generated --> on-premise NiFi --> on-premise Kafka (say you retain data there for one week to prevent data loss in case of failures on your on-premise side). An on-premise NiFi then reads from Kafka and uses the Site-to-Site protocol to send the data to a NiFi instance in the cloud. The cloud NiFi writes to Kafka in your cloud, and from there the data goes to HDFS.
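As a concrete illustration of the one-week buffer, the on-premise Kafka topic can carry a topic-level `retention.ms` override. The Python sketch below only computes that value and assembles a `kafka-topics.sh --create` command. The topic name `onprem-buffer`, the broker address `onprem-kafka:9092`, and the partition and replication counts are hypothetical placeholders. It assumes Kafka 2.2 or later, where `kafka-topics.sh` accepts `--bootstrap-server`; older releases take `--zookeeper` instead.

```python
from typing import List


def retention_ms(days: int) -> int:
    """Topic-level retention.ms value for the given number of days."""
    return days * 24 * 60 * 60 * 1000


def create_topic_cmd(topic: str, bootstrap: str, days: int = 7,
                     partitions: int = 3, replication: int = 3) -> List[str]:
    """Build the argument list for creating the on-premise buffer topic.

    All argument values are placeholders; replication=3 assumes the
    on-premise cluster has at least three brokers.
    """
    return [
        "kafka-topics.sh", "--create",
        "--bootstrap-server", bootstrap,
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication),
        "--config", f"retention.ms={retention_ms(days)}",
    ]


if __name__ == "__main__":
    # Hypothetical topic and broker names; prints the command, does not run it.
    print(" ".join(create_topic_cmd("onprem-buffer", "onprem-kafka:9092")))
```

Seven days works out to 7 * 24 * 3600 * 1000 = 604,800,000 ms. That also matches Kafka's broker-wide default (`log.retention.hours=168`), so the explicit override mainly documents intent and guards against a cluster whose default has been lowered.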

2. Data being generated --> your own code reads the file and writes to on-premise Kafka (again, say you retain data there for one week to prevent data loss in case of failures on your on-premise side). From there the flow is identical: an on-premise NiFi reads from Kafka and uses Site-to-Site to send the data to the cloud NiFi, which writes to Kafka in your cloud and then to HDFS. The only change is that you have replaced one NiFi instance with your own code. I don't see much point in doing this, but I wanted to show it in case someone argues that they already have code that pushes to Kafka.
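For the second approach, the "write code" step amounts to following the file as it grows and publishing each new line. The Python sketch below shows that under stated assumptions: the file is newline-delimited, it is only ever appended to (no rotation or truncation), and the read position is not persisted, so a restart re-sends the file from the beginning. The Kafka sink is passed in as a plain callable so the logic can be exercised without a broker. The names `follow` and `ship` are hypothetical.

```python
import os
import tempfile
import time
from typing import Callable, Iterator, Optional


def follow(path: str, poll_interval: float = 1.0,
           max_idle_polls: Optional[int] = None) -> Iterator[str]:
    """Yield complete lines appended to `path`, similar to `tail -f`.

    Reads from the start of the file, then keeps polling for new data.
    A trailing line without a newline is held back until it is completed.
    Stops after `max_idle_polls` consecutive empty polls (None = never stop).
    """
    idle_polls = 0
    pending = ""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.readline()
            if chunk:
                idle_polls = 0
                pending += chunk
                if pending.endswith("\n"):
                    yield pending.rstrip("\n")
                    pending = ""
                continue
            idle_polls += 1
            if max_idle_polls is not None and idle_polls >= max_idle_polls:
                return
            time.sleep(poll_interval)


def ship(lines: Iterator[str], send: Callable[[bytes], None]) -> int:
    """Encode each line as UTF-8, hand it to `send`, and return the count."""
    sent = 0
    for line in lines:
        send(line.encode("utf-8"))
        sent += 1
    return sent


if __name__ == "__main__":
    # Demo: a temporary file and an in-memory list stand in for the
    # real data file and the Kafka producer.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("event-1\nevent-2\nevent-3 (still being written)")
    received = []
    count = ship(follow(tmp.name, poll_interval=0.1, max_idle_polls=3),
                 received.append)
    os.remove(tmp.name)
    print(count, received)  # prints: 2 [b'event-1', b'event-2']
```

To point this at a real cluster with the kafka-python client, one option is `producer = KafkaProducer(bootstrap_servers="onprem-kafka:9092", acks="all")` followed by `ship(follow("/path/to/file"), lambda v: producer.send("onprem-buffer", value=v))`. The broker address, path, and topic there are hypothetical placeholders. Handling file rotation and checkpointing the read position is the kind of work NiFi's `TailFile` processor already does, which is part of why the first approach is less effort.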

Assuming your infrastructure is ready and the installations are complete, the beauty of NiFi is that something like this can literally be implemented in one day.