
Real time processing and reporting tool


Contributor

Hi,

 
I have a requirement where all transactional data is ingested into Hadoop in real time, and the data must be validated before it is stored. If a record fails validation, it will not be stored in Hadoop. The validation process also makes use of historical data already stored in Hadoop. I am thinking of a NiFi --> Kafka --> Storm model for real-time processing, with the results stored in HBase. Can you suggest a better model for this use case? I would also like to know the best open source reporting tools available.
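
To make the validation idea concrete, here is a rough sketch of the kind of per-record check I have in mind. The field names, the validation rules, and the happybase-based HBase lookup are only placeholders, not a working design:

```python
import happybase

# Placeholder connection to the cluster holding the historical data.
connection = happybase.Connection("hbase-host")
history = connection.table("transactions_history")

def is_valid(txn):
    """Validate one incoming transaction before it is persisted."""
    # Basic structural checks (placeholder rules).
    if not txn.get("txn_id") or txn.get("amount", 0) <= 0:
        return False
    # Cross-check against historical data already in HBase,
    # e.g. reject a transaction id we have already stored.
    existing = history.row(txn["txn_id"].encode())
    return not existing

def handle(txn):
    # Only valid records reach storage; failures are dropped
    # (or could be routed to a reject topic for inspection).
    if is_valid(txn):
        history.put(txn["txn_id"].encode(),
                    {b"cf:amount": str(txn["amount"]).encode()})
```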
 
 
Any suggestions would be a great help.


Warm Regards
Sidharth Kumar
3 REPLIES

Re: Real time processing and reporting tool

Expert Contributor
An architecture with NiFi --> Kafka --> Storm should definitely do the trick, though many are also using Spark Streaming.
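
If you go the Spark Streaming route, a minimal sketch of consuming the Kafka topic with Structured Streaming might look like the following. The broker address and topic name are placeholders, and you would swap the console sink for your HBase/Kudu writer (the job also needs the spark-sql-kafka connector on its classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("txn-validation").getOrCreate()

# Read the raw transaction stream that NiFi pushed into Kafka.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder
       .option("subscribe", "transactions")                # placeholder topic
       .load())

# Kafka delivers the payload as binary; cast it to a string here and
# apply your validation logic before writing to the sink.
payload = raw.selectExpr("CAST(value AS STRING) AS json")

query = (payload.writeStream
         .format("console")   # placeholder sink; replace with HBase/Kudu
         .outputMode("append")
         .start())
query.awaitTermination()
```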

Depending on your use case, you may want to store the data in Druid for pre-aggregation or in Kudu for interactive querying.

When it comes to reporting, we sometimes ship Superset with HDP, but depending on your use case, most other mainstream BI solutions will also give you a good experience when linked to our platform.

- Dennis Jaheruddin

If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'. Also check out my technical portfolio at https://portfolio.jaheruddin.nl

Re: Real time processing and reporting tool

Explorer

Hi Sidharth,

 

I am just curious: in your case, how is NiFi listening for the incoming transactional data?

 

If your requirement is to validate the incoming data before storing it into HDFS, I am just wondering why you need Kafka. Is the ValidateRecord processor in NiFi not sufficient for this task?

 

rgds,

Rama.

Re: Real time processing and reporting tool

Expert Contributor

Hello @ramarov,

 

Of course I cannot speak to the specific situation, but in these architectures Kafka is typically seen as a buffer.

 

You will use NiFi to move the data, but before you start building complicated streams you want the ability to easily buffer and replay messages (for instance after something fails, or simply after updating your analytics logic).

 

This is why NiFi typically pushes the messages to Kafka, where they can be grabbed once or multiple times by an engine like Spark Streaming.

 

(You mention validation, but that is not my best guess for why the data is moved through Kafka.)
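
To illustrate the replay point with a minimal sketch (kafka-python client; the broker, topic, and group names are placeholders): attaching a new consumer group with auto_offset_reset="earliest" lets you re-run updated logic over everything Kafka still retains.

```python
from kafka import KafkaConsumer

def process(payload):
    # Placeholder for your (updated) analytics logic.
    print(payload)

# A new consumer group starts with no committed offsets, so with
# auto_offset_reset="earliest" it replays the topic from the start
# of the retention window, independently of other consumers.
consumer = KafkaConsumer(
    "transactions",                      # placeholder topic
    bootstrap_servers=["broker1:9092"],  # placeholder broker
    group_id="analytics-v2",             # new group => fresh offsets
    auto_offset_reset="earliest",
)

for message in consumer:
    process(message.value)
```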


- Dennis Jaheruddin

If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'. Also check out my technical portfolio at https://portfolio.jaheruddin.nl