
When to use NiFi PutHDFS and when to use NiFi + Kafka + Storm?

Explorer

Hi,

I am new to NiFi. When should I use NiFi to write directly to HDFS via PutHDFS, and when should I use NiFi + Kafka + Storm? What is the difference? Could I do the data manipulation in NiFi instead of Storm?

Thanks

Andy

1 ACCEPTED SOLUTION

Hi @Andy Liang,

Kafka and Storm are generally used when you need to perform complex operations on your data before pushing it into your HDP cluster, that is, operations that NiFi cannot perform. Examples include window aggregations and complex joins.

If you don't need to perform such operations before your data lands in the HDP cluster, you can use NiFi + PutHDFS.

Hope this helps.


Explorer

Thank you very much for your quick response, Pierre.

Thanks for the tutorial on your blog too. I am reading your NiFi & Dropbox example now.

Andy

Contributor

Hi @Andy Liang,

In addition to @Pierre Villard's answer, there are three aspects of data processing combined here:

Streaming - Simple Event Processing

This is what NiFi is very good at. All the information needed to do the processing is contained in the event. For example:

  • Log processing: if a log event contains an error, route it out of the main flow and send an email alert
  • Transformation: our legacy system uses XML but we want to use Avro, so convert each XML event to Avro
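
To make "everything needed is in the event" concrete, here is a minimal sketch in plain Python. It is not NiFi code or a NiFi API; the function name, the route labels "alert" and "main", and the sample log lines are all made up for illustration. The point is only that each decision looks at one event and keeps no state between events.

```python
from typing import List, Tuple


def route_log_event(line: str) -> Tuple[str, str]:
    """Pick a route for one log line using only that line's content.

    The route names are hypothetical labels, not NiFi identifiers.
    """
    if "ERROR" in line:
        # Error events leave the main flow (e.g. towards an email alert).
        return ("alert", line)
    # Everything else continues along the main flow.
    return ("main", line)


# Made-up sample events, processed one at a time with no shared state.
events: List[str] = ["INFO service started", "ERROR disk full"]
routed = [route_log_event(e) for e in events]
```

Because no event depends on any other, this kind of work can be done per event as the data passes through.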

Streaming - Complex Event Processing

This is what Storm is good at, as covered in Pierre's answer: operations such as window aggregations and complex joins.
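
A window aggregation is a good way to see the difference. The sketch below is plain Python, not Storm's API; the function name, the (timestamp, key) event shape, and the sample data are assumptions for illustration. It counts events per key in fixed (tumbling) time windows, which requires holding state across many events rather than looking at one event alone.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) over fixed-size time windows.

    Each event is a hypothetical (timestamp_in_seconds, key) pair.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        # State is shared across events: this is what makes it "complex".
        counts[(window_start, key)] += 1
    return dict(counts)


# Made-up sample: three events fall in window 0-9, one in window 10-19.
sample = [(0, "error"), (3, "error"), (7, "info"), (12, "error")]
result = tumbling_window_counts(sample, window_seconds=10)
```

A real stream processor also has to deal with late or out-of-order events and with state that survives failures, which this sketch ignores.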

Batch

This is where MapReduce/Hive/Spark (not Spark Streaming) come in. Land the data on HDFS, and then it can be processed and/or explored.

Explorer

Thank you @Sebastian Carroll for the detailed explanation.
