Support Questions

DennisJaheruddi · 02-24-2021

To understand what it takes to work with various streaming tools, I have defined this question as an umbrella for an overview of the ways to stream data.

For consistency, I picked a simple reference use case: messages arrive from Kafka and need to be written to HDFS.

Source topic name: input

Output folder name on HDFS: output

The core use case is reading data from Kafka and writing it to HDFS.
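Whichever tool is used, the core use case has the same basic shape: consume records from the `input` topic, group them into small batches, and write each batch as a new file under `output`. The following is a minimal, tool-agnostic sketch of that shape only. It assumes a plain Python iterable as a stand-in for a Kafka consumer and a local directory as a stand-in for HDFS. The names `batches` and `run_pipeline`, the batch size and the JSON Lines file format are illustrative choices, not part of any of the tools listed below.

```python
import json
import os
from typing import Iterable, Iterator, List


def batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group a stream of records into fixed-size micro-batches."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch when the stream ends
        yield batch


def run_pipeline(source: Iterable[dict], output_dir: str, batch_size: int = 100) -> List[str]:
    """Write each micro-batch to its own JSON Lines file under output_dir.

    Stand-ins (assumptions): `source` replaces a Kafka consumer subscribed to
    topic "input", and `output_dir` is a local folder replacing HDFS "output".
    """
    os.makedirs(output_dir, exist_ok=True)
    written: List[str] = []
    for i, batch in enumerate(batches(source, batch_size)):
        # One new file per batch rather than appends to one growing file
        # (an illustrative choice for this sketch).
        path = os.path.join(output_dir, f"part-{i:05d}.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for record in batch:
                f.write(json.dumps(record) + "\n")
        written.append(path)
    return written
```

For example, five records with `batch_size=2` produce three files, `part-00000.jsonl` to `part-00002.jsonl`, and the last file holds a single record. In a real deployment, each tool in the subquestions below replaces the iterable with its Kafka source and the local folder with its HDFS sink.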

The bonus use case is to add a new field C, defined as field A divided by field B (both A and B occur in the data). Ideally, the schema would be used for this.
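The transformation in the bonus use case is small; the interesting part is using the schema. The following is a minimal sketch that assumes records arrive as dictionaries and that the schema is a simple mapping from field name to Python type. This mapping stands in for an Avro or Schema Registry schema, which is not modelled here. The output schema is derived once from the input schema, and each record is checked against the input schema before C is computed. The function names are hypothetical. Returning `None` for C when B is zero is also an assumption, because the question does not say how division by zero should be handled.

```python
from typing import Dict, Optional

# Hypothetical schema representation: field name -> Python type
# (a stand-in for a real Avro / Schema Registry schema).
Schema = Dict[str, type]


def derive_output_schema(input_schema: Schema) -> Schema:
    """Return the output schema: all input fields plus C, typed as float."""
    for field in ("A", "B"):
        if field not in input_schema:
            raise ValueError(f"input schema lacks required field {field!r}")
    if "C" in input_schema:
        raise ValueError("input schema already defines field 'C'")
    return {**input_schema, "C": float}


def add_ratio(record: dict, input_schema: Schema) -> dict:
    """Validate a record against the input schema and add C = A / B."""
    for name, expected in input_schema.items():
        if not isinstance(record.get(name), expected):
            raise TypeError(f"field {name!r} is missing or not of type {expected.__name__}")
    b = record["B"]
    # Assumption: C is None when B is zero instead of raising an error.
    c: Optional[float] = None if b == 0 else record["A"] / b
    return {**record, "C": c}
```

For example, with the schema `{"A": int, "B": int}`, the record `{"A": 7, "B": 2}` becomes `{"A": 7, "B": 2, "C": 3.5}`. Deriving the output schema up front means a missing A or B is caught before any data is processed, not on the first bad record.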

Subquestions:

Streaming data from Kafka to HDFS with NiFi

Streaming data from Kafka to HDFS with Flink

Streaming data from Kafka to HDFS with Flink SQL

Streaming data from Kafka to HDFS with Spark Interactive

Streaming data from Kafka to HDFS with a Spark Jar

Streaming data from Kafka to HDFS with Kafka Connect

If a substep is already well documented, feel free to refer to that documentation, but please ensure that the end-to-end process is documented, including building and deployment.

If you notice that this question is not well specified, or that something prevents one of the subquestions from being answered, please post a comment.

- Dennis Jaheruddin

If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'.

DennisJaheruddi · 02-24-2021

The subquestions can be found here; please note that some of them may not have been answered yet:

Subquestions:

Streaming data from Kafka to HDFS with NiFi

Streaming data from Kafka to HDFS with Flink Jar

Streaming data from Kafka to HDFS with Flink SQL

Streaming data from Kafka to HDFS with Spark Interactive

Streaming data from Kafka to HDFS with a Spark Jar

Streaming data from Kafka to HDFS with Kafka Connect

Also note that each question asks for an example, although there may be multiple choices of language and other decisions to make.


Cloudera Community

Streaming data from Kafka to HDFS: All relevant solutions
