How to achieve realtime analytics on hadoop

Rising Star

Hello, I have some questions about real-time analytics on Hadoop. Here is my use case, followed by my questions.

I'm trying to use a BI solution (like Tableau) to do real-time analytics on Hadoop.

1 - What are the most commonly used architectures to achieve my goal?

2 - Does it make sense to use an MPP database as a data mart (loading data from Hadoop into the MPP database by business domain)?

3 - Can a NoSQL database like Cassandra replace an MPP database? If yes, is it better?

tazimehdi.com
1 ACCEPTED SOLUTION

@Mehdi TAZI

The best architectures for real-time analytics with Hadoop usually rely on Hadoop as much as possible for both its distributed storage and its distributed compute capabilities, rather than sitting outside of it. Look for architectures that use Hadoop as a distributed compute layer, not just as distributed storage.

Example Alternative Architectures:

  1. Instead of running BI on Hadoop (with Tableau, etc.), you can run BI in Hadoop, e.g. with Arcadia Data.
  2. Instead of an MPP database outside Hadoop, you can use an MPP solution built for Hadoop, such as Apache HAWQ or Actian Vector, for SQL analytics, and get as much SQL functionality as possible.
  3. And yes, you can use HBase as a NoSQL solution.
  4. Finally, Hive is making progress toward near-real-time capabilities, so it is worth watching.

3 REPLIES

Master Mentor
@Mehdi TAZI

You want low latency and very fast response times. HBase is the way to go for near-real-time access.

You can build your own UI.

Tableau and HBase: see https://community.tableau.com/thread/146368

1 - What are the most commonly used architectures to achieve my goal?

Source to destination...

You can use Storm or Spark Streaming, for example Kafka to Storm to HBase.
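
To make the Kafka to Storm to HBase flow a bit more concrete, here is a minimal sketch of its shape in plain Python. Nothing here talks to a real cluster: the `deque` stands in for a Kafka topic, `process_stream` stands in for a Storm bolt or Spark Streaming job, and `InMemoryTable` is a hypothetical stand-in for an HBase table. All names and the sample events are assumptions made up for illustration.

```python
from collections import defaultdict, deque

# Stand-in for a Kafka topic (assumption): an in-memory queue of events.
events = deque([
    {"page": "home", "ms": 120},
    {"page": "cart", "ms": 300},
    {"page": "home", "ms": 80},
])


class InMemoryTable:
    """Hypothetical stand-in for an HBase table: row key -> column -> value."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def increment(self, row_key, column, amount=1):
        # Add `amount` to a counter column, starting from 0 if it is missing.
        self.rows[row_key][column] = self.rows[row_key].get(column, 0) + amount

    def get(self, row_key):
        # Point lookup by row key; returns a copy, or {} for an unknown key.
        return dict(self.rows.get(row_key, {}))


def process_stream(source, table):
    """Stand-in for the streaming job: update running counters per event."""
    while source:
        event = source.popleft()
        table.increment(event["page"], "views")
        table.increment(event["page"], "total_ms", event["ms"])


table = InMemoryTable()
process_stream(events, table)

# The dashboard or custom UI then reads pre-aggregated rows by key.
print(table.get("home"))  # {'views': 2, 'total_ms': 200}
```

The point of the sketch is that the aggregation happens as events arrive, so the UI only does a cheap lookup by key instead of scanning raw data at query time.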

2 - Does it make sense to use an MPP database as a data mart (loading data from Hadoop into the MPP database by business domain)?

Yes, if that is an option.

3 - Can a NoSQL database like Cassandra replace an MPP database? If yes, is it better?

HBase is a good solution.

Rising Star

Thanks a lot for your answer once again 🙂

1 - What do you mean by "source to destination"? Is it some kind of ETL on raw data to load it into a DW?

2.1 - Is there an MPP database recommended by Hortonworks?

2.2 - If there is no such option, what other alternatives exist?

Thanks 😉

tazimehdi.com
