Created 01-18-2017 02:49 PM
Hi,
I am new to NiFi. I am wondering when I should use NiFi to write directly to HDFS via PutHDFS, and when I should use NiFi + Kafka + Storm. What's the difference? Could I do data manipulation in NiFi instead of Storm?
Thanks
Andy
Created 01-18-2017 02:52 PM
Hi @Andy Liang,
You will generally use Kafka and Storm when you need to perform complex operations on your data before pushing it into your HDP cluster (operations that cannot be performed by NiFi), for example window aggregations or complex joins.
If you don't need to perform such operations before your data lands in the HDP cluster, then you can use NiFi + PutHDFS.
Hope this helps.
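To make "window aggregation" concrete, here is a minimal sketch in plain Python (not Storm's API) of a tumbling-window sum. The event shape `(timestamp_seconds, key, value)` and the function name `tumbling_window_sums` are hypothetical. The point is that each result depends on state accumulated across many events, unlike per-event routing. A real Storm topology would also have to handle unbounded input, late events, and emitting results as each window closes; this sketch just processes a finite list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp_seconds, key, value)
Event = Tuple[int, str, float]


def tumbling_window_sums(
    events: Iterable[Event], window_size: int
) -> Dict[Tuple[int, str], float]:
    """Sum values per key within fixed, non-overlapping time windows."""
    # State shared across events: running total per (window_start, key)
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for ts, key, value in events:
        window_start = ts - (ts % window_size)  # align timestamp to its window
        totals[(window_start, key)] += value
    return dict(totals)


events = [
    (0, "sensor-a", 1.0),
    (30, "sensor-a", 2.0),
    (59, "sensor-b", 5.0),
    (61, "sensor-a", 4.0),
]
print(tumbling_window_sums(events, window_size=60))
# {(0, 'sensor-a'): 3.0, (0, 'sensor-b'): 5.0, (60, 'sensor-a'): 4.0}
```

The event at t=61 falls into a new window, so "sensor-a" gets two separate totals. Producing that split needs memory of earlier events, which is the kind of work Pierre attributes to Storm.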
Created 01-18-2017 03:50 PM
Thank you very much for your quick response, Pierre.
Thanks for the tutorial on your blog too. I am reading your NiFi & Dropbox example now.
Andy
Created 01-18-2017 03:57 PM
Hi @Andy Liang,
In addition to @Pierre Villard's answer, there are three aspects of data processing involved here:
1. Per-event processing: this is what NiFi is very good at. All the information needed to do the processing is contained in the event itself. For example:
2. Processing across events in a stream (window aggregations, complex joins, etc.): this is what Storm is good at, as covered by Pierre.
3. Batch processing of data at rest: this is where MapReduce/Hive/Spark (not Spark Streaming) come in. The data lands on HDFS and can then be processed and/or explored.
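As a contrast to windowed processing, here is a minimal sketch of per-event processing, where all the information needed is contained in the event. It is plain Python, not a NiFi processor. The JSON fields `temp_c`/`temp_f`, the threshold of 30, and the route names `alerts`/`normal` are all hypothetical. Each record is transformed and routed on its own, with no state carried over from earlier records.

```python
import json
from typing import Tuple


def process_event(raw: str) -> Tuple[str, str]:
    """Transform and route one event using only the data inside that event.

    Returns (route, transformed_json). Nothing from other events is needed.
    """
    record = json.loads(raw)
    # Derive a new field purely from this record's own contents
    record["temp_f"] = record["temp_c"] * 9 / 5 + 32
    # Route by a value carried in the event itself (hypothetical threshold)
    route = "alerts" if record["temp_c"] > 30 else "normal"
    return route, json.dumps(record)


for line in ['{"id": 1, "temp_c": 35}', '{"id": 2, "temp_c": 20}']:
    print(process_event(line))
```

Because each call depends only on its input, events can be handled in any order or in parallel, which is what makes this style of work a good fit for a flow tool rather than a stream processor.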
Created 01-19-2017 05:26 PM
Thank you @Sebastian Carroll for the detailed explanation.