Hi, I want to use NiFi to simulate a stream from a large dataset whose lines contain the items A, B, C, D, and E. Each line should be split off and fed continuously to Spark through Kafka, so that I can use Structured Streaming to analyse the data. Sample lines:
A, B, C, D
A, C, E
B, C, D, E
I am currently using a .txt file with the flow ListFile->FetchFile->SplitText->PutKafka, but when I run spark-submit, it reports errors on the topic.
What type of file should I create for this sample dataset, what executor should I use, and what would the Spark code (Python) look like?
I recently built a presentation around NiFi->Kafka->Spark to showcase image analysis from Twitter feeds. Please take a look at this GitHub repository, as I think it could help in your case:
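Independently of that repository, here is a minimal sketch of what the Spark side could look like in Python. It is not taken from the presentation. It assumes that each Kafka message holds one comma-separated line (for example, SplitText emitting one line per flow file from a plain .txt file), that the broker is at `localhost:9092`, and that the topic is called `items`. Both of those values are placeholders, and `split_items`, `build_item_counts`, and `main` are illustrative names only.

```python
def split_items(line):
    """Pure-Python version of the per-line parsing, handy for local checks.

    'A, C, E' -> ['A', 'C', 'E']
    """
    return [item.strip() for item in line.split(",") if item.strip()]


def build_item_counts(spark, bootstrap_servers="localhost:9092", topic="items"):
    """Return a streaming DataFrame holding a running count per item.

    bootstrap_servers and topic are assumed placeholder values.
    """
    # Imported here so the parsing helper above works without Spark installed.
    from pyspark.sql.functions import col, explode, split, trim

    # Kafka delivers the payload as bytes in the `value` column.
    lines = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .load()
        .select(col("value").cast("string").alias("line"))
    )

    # One row per item: split on commas, trim whitespace, drop empty strings.
    items = (
        lines.select(explode(split(col("line"), ",")).alias("item"))
        .select(trim(col("item")).alias("item"))
        .where(col("item") != "")
    )
    return items.groupBy("item").count()


def main():
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ItemCounts").getOrCreate()
    query = (
        build_item_counts(spark)
        .writeStream.outputMode("complete")  # aggregations need complete/update
        .format("console")
        .start()
    )
    query.awaitTermination()
```

To run it, call `main()` at the bottom of the script and submit it with spark-submit, passing the `spark-sql-kafka-0-10` connector that matches your Spark and Scala versions via `--packages`. A missing connector or a topic name that differs from the one NiFi publishes to are two possible causes of topic-related errors, so those are worth checking first.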