
How to extract the Records processed in a Spark streaming Batch

Expert Contributor

Hi,

I am using NiFi to stream CSV files to Spark Streaming. Within Spark I register a custom StreamingListener (overriding the Spark Streaming Listener callbacks) to get batch-related information, which I write to a file. So for each batch I can get the start time, end time, scheduling delay, processing time, number of records, and so on. What I want to know is exactly which files were processed in each batch, so I would like to output the batch info mentioned above together with an array of UUIDs for all files processed in that batch (the UUID can be a file attribute or, if need be, can be embedded in the content of the file as well). I don't think I can pass the DStream's RDD to the listener. Any suggestions?

Thanks
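One common pattern for this is to skip the listener for the UUID part entirely: collect the UUID field of each record inside `foreachRDD` (which receives the batch time), keep a driver-side map of batch time to UUIDs, and then join that map with the batch stats the listener already reports. A minimal, Spark-free sketch of that join, assuming the UUID is the first CSV column and all names (`record_batch_uuids`, `merge_with_listener_info`) are illustrative:

```python
# Sketch only: in Spark Streaming you would register this per batch via
#   stream.foreachRDD(lambda time, rdd: record_batch_uuids(time, rdd.collect()))
# Here plain lists stand in for the RDD contents so the pattern is runnable
# without a cluster.
from collections import defaultdict

batch_uuids = defaultdict(list)  # batch_time -> [uuid, ...]

def record_batch_uuids(batch_time, rows):
    """Collect the UUID field of every record in one micro-batch."""
    for row in rows:
        uuid = row.split(",")[0]  # assumption: UUID is CSV column 0
        batch_uuids[batch_time].append(uuid)

def merge_with_listener_info(listener_info):
    """Join per-batch UUID lists with the stats the StreamingListener
    reports (start/end time, delays, record counts), keyed by batch time."""
    return [
        {**info, "uuids": batch_uuids.get(info["batch_time"], [])}
        for info in listener_info
    ]

# Simulate two batches arriving.
record_batch_uuids(1000, ["a1,foo", "b2,bar"])
record_batch_uuids(2000, ["c3,baz"])

report = merge_with_listener_info([
    {"batch_time": 1000, "num_records": 2},
    {"batch_time": 2000, "num_records": 1},
])
```

Because `onBatchCompleted` in the listener and `foreachRDD` both see the same batch time, that timestamp works as the join key between the two sides.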

1 ACCEPTED SOLUTION

This problem has been solved!
