Streaming data from Kafka to HDFS: All relevant solutions
Labels: Apache Kafka
Created ‎02-24-2021 07:04 AM
To understand what it takes to work with the various streaming tools, I have created this question as an umbrella that gives an overview of the ways to stream data.
For consistency I picked a simple reference use case: messages arrive from Kafka and need to be put on HDFS.
Source topic name: input
Output folder name on HDFS: output
The core use case is picking up a bit of data from Kafka and putting it on HDFS.
The bonus use case is adding a new field C, defined as field A divided by field B (both occur in the data), ideally using the schema to do so.
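To make the bonus use case concrete, here is a minimal, tool-independent sketch of the transformation in Python. It assumes the messages are JSON objects with numeric fields A and B, which the question does not specify; the function name `add_field_c` and the choice to return `None` for records that cannot be enriched are illustrative only. Each streaming tool in the subquestions would express the same logic in its own way.

```python
import json
from typing import Optional


def add_field_c(raw_message: bytes) -> Optional[dict]:
    """Parse one JSON-encoded message and add C = A / B.

    Assumption: messages are JSON objects with numeric fields A and B.
    Returns None when the record cannot be enriched (invalid JSON,
    missing field, non-numeric value, or B == 0), so the caller can
    decide whether to drop it or route it elsewhere.
    """
    try:
        record = json.loads(raw_message)
        a = float(record["A"])
        b = float(record["B"])
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, missing field, or a value that is not a number.
        return None
    if b == 0:
        # Division by zero is undefined; leave the decision to the caller.
        return None
    record["C"] = a / b
    return record


print(add_field_c(b'{"A": 6, "B": 3}'))  # {'A': 6, 'B': 3, 'C': 2.0}
```

Handling the B = 0 case and malformed records explicitly is worth deciding up front, since each tool has a different default for failed records.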
Subquestions:
Streaming data from Kafka to HDFS with NiFi
Streaming data from Kafka to HDFS with Flink
Streaming data from Kafka to HDFS with Flink SQL
Streaming data from Kafka to HDFS with Spark Interactive
Streaming data from Kafka to HDFS with a Spark Jar
Streaming data from Kafka to HDFS with Kafka Connect
If a substep is well documented, do not hesitate to refer to it, but please ensure the end-to-end process is documented, including building and deployment.
If you notice this question is not specified well, or if something is blocking one of the subquestions from being answered, please post a comment.
- Dennis Jaheruddin
If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'.
Created ‎02-24-2021 07:19 AM
The subquestions can be found here; please note that they may or may not have been answered yet:
Streaming data from Kafka to HDFS with NiFi
Streaming data from Kafka to HDFS with Flink Jar
Streaming data from Kafka to HDFS with Flink SQL
Streaming data from Kafka to HDFS with Spark Interactive
Streaming data from Kafka to HDFS with a Spark Jar
- Dennis Jaheruddin
