When to use NiFi PutHDFS and when to use NiFi + Kafka + Storm?

Hi,

I am new to NiFi. When should I have NiFi write directly to HDFS via PutHDFS, and when should I use NiFi + Kafka + Storm? What is the difference? Could I do the data manipulation in NiFi instead of Storm?

Thanks

Andy

1 ACCEPTED SOLUTION

Hi @Andy Liang,

Kafka and Storm are generally used when you need to perform complex operations on your data before pushing it into your HDP cluster (operations that cannot be performed by NiFi). Examples of such operations are window aggregations and complex joins.

If you don't need to perform such operations before your data lands in the HDP cluster, then you can use NiFi + PutHDFS.

Hope this helps.

REPLIES

Thank you very much for your quick response, Pierre.

Thanks for the tutorial on your blog too. I am reading your NiFi & Dropbox example now.

Andy

Hi @Andy Liang,

In addition to @Pierre Villard's answer, there are three aspects of data processing joined up here:

Streaming - Simple Event Processing

This is what NiFi is very good at. All the information needed to do the processing is contained in the event. For example:

  • Log processing: if a log entry contains an error, route it out of the main flow and send an email alert
  • Transformation: our legacy system uses XML but we want to use Avro, so convert each XML event to Avro
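To make the "everything needed is in the event" idea concrete, here is a minimal sketch in plain Python. It is not NiFi code and does not use any NiFi API; the function names and the "alert" / "pass_through" route names are hypothetical, chosen only for illustration. The point is that each log line is routed by looking at that line alone, with no state kept between events.

```python
from typing import Dict, Iterable, List


def route_log_event(line: str) -> str:
    """Decide where a single log line goes, using only the line itself.

    The route names are hypothetical; in a real flow they would map to
    whatever relationships the routing step exposes.
    """
    return "alert" if "ERROR" in line else "pass_through"


def route_all(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Route each line independently; no information is shared across events."""
    routed: Dict[str, List[str]] = {"alert": [], "pass_through": []}
    for line in lines:
        routed[route_log_event(line)].append(line)
    return routed


if __name__ == "__main__":
    sample = [
        "2016-03-01 10:00:00 INFO service started",
        "2016-03-01 10:00:05 ERROR disk full",
    ]
    for route, items in route_all(sample).items():
        print(route, items)
```

Because no event depends on any other, this kind of processing can be applied one event at a time, which is why it fits a flow tool well.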

Streaming - Complex Event Processing

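This is what Storm is good at, as covered by Pierre. Here, processing an event needs information from other events, as in window aggregations and complex joins.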
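By contrast, a window aggregation has to remember something about earlier events. The sketch below is plain Python, not Storm code, and assumes a simple fixed (tumbling) window keyed on an integer timestamp in seconds; the function name, the event shape `(timestamp, key)`, and the 60-second default are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int = 60
) -> Dict[Tuple[int, str], int]:
    """Count events per key within fixed, non-overlapping time windows.

    Each event is a hypothetical (timestamp_seconds, key) pair. The result
    maps (window_start, key) to the number of events seen in that window.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Align the timestamp to the start of the window it falls in.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [(0, "a"), (30, "a"), (61, "a"), (10, "b")]
    print(tumbling_window_counts(events))
```

The `counts` dictionary is the state shared across events. A stream processor has to manage that state reliably across machines and failures, which is the part this single-process sketch leaves out.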

Batch

This is where MR/Hive/Spark (not Spark Streaming) come in. Land the data on HDFS, and then it can be processed and/or explored.

Thank you @Sebastian Carroll for the detailed explanation.