
When to use Nifi PutHDFS and when to use Nifi+Kafka+storm ?


New Contributor

Hi,

I am new to NiFi. When should I use NiFi to write directly to HDFS via PutHDFS, and when should I use NiFi + Kafka + Storm? What's the difference? Could I do the data manipulation in NiFi instead of Storm?

Thanks

Andy

Accepted Solutions

Re: When to use Nifi PutHDFS and when to use Nifi+Kafka+storm ?

Hi @Andy Liang,

Kafka and Storm are generally used when you need to perform complex operations on your data before pushing it into your HDP cluster (operations that NiFi cannot perform). Examples of such operations are window aggregations and complex joins.

If you don't need to perform such operations before your data lands in the HDP cluster, then you can use NiFi + PutHDFS.

Hope this helps.

Re: When to use Nifi PutHDFS and when to use Nifi+Kafka+storm ?

New Contributor

Thank you very much for your quick response, Pierre.

Thanks for the tutorial on your blog too. I am reading your NiFi & Dropbox example now.

Andy

Re: When to use Nifi PutHDFS and when to use Nifi+Kafka+storm ?

Contributor

Hi @Andy Liang,

In addition to @Pierre Villard's answer, there are three aspects of data processing involved here:

Streaming - Simple Event Processing

This is what NiFi is very good at. All the information needed to do the processing is contained in the event. For example:

  • Log processing: If a log event contains an error, route it out of the main flow and send an email alert
  • Transformation: Our legacy system uses XML but we want to use Avro, so convert each XML event to Avro
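
To make "all the information needed is contained in the event" concrete, here is a minimal sketch in plain Python (not NiFi code). The function names are hypothetical, the check for the string "ERROR" is an assumed log convention, and a dict stands in for an Avro record. In a real flow these steps would be done by NiFi processors (for example, something like RouteOnContent followed by PutEmail for the alert case).

```python
import xml.etree.ElementTree as ET


def route_log_event(line):
    """Decide where a single log line goes, using only that line.

    Assumption: an error is marked by the literal text "ERROR".
    """
    return "alert" if "ERROR" in line else "main"


def xml_event_to_record(xml_text):
    """Convert one flat XML event into a dict (a stand-in for an Avro record)."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


route = route_log_event("2016-01-01 12:00:00 ERROR disk full")
record = xml_event_to_record("<event><id>42</id><status>ok</status></event>")
```

Neither function keeps any state between calls, which is what makes this kind of work a good fit for a per-event tool.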

Streaming - Complex Event Processing

This is what Storm is good at, as covered by Pierre.
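
As a rough illustration of why a window aggregation is different, here is a sketch in plain Python (this is not the Storm API). The function name, the (timestamp, key) event shape, and the 10-second window are all assumptions made for the example. The point is that the result for a window depends on every event that falls into it, so the processing has to hold state across events.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) over fixed, non-overlapping windows.

    events: iterable of (timestamp_in_seconds, key) pairs (hypothetical shape).
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align each timestamp to the start of the window it falls into.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "a"), (5, "a"), (9, "b"), (12, "a")]
result = tumbling_window_counts(events, 10)
```

A streaming engine does the same thing continuously and across many machines, and also has to deal with late events and failures, which is where a tool like Storm earns its place.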

Batch

This is where MapReduce/Hive/Spark (not Spark Streaming) come in. Land the data on HDFS, and then it can be processed and/or explored.

Re: When to use Nifi PutHDFS and when to use Nifi+Kafka+storm ?

New Contributor

Thank you @Sebastian Carroll for the detailed explanation.
