How to achieve realtime analytics on hadoop
Labels: Apache Hadoop
Created ‎01-19-2016 09:28 PM
Hello, I have some questions about realtime analytics on Hadoop. Here are my use case and my questions.
I'm trying to use a BI solution (like Tableau) to do realtime analytics on Hadoop.
1 - What are the most commonly used architectures for achieving this goal?
2 - Does it make sense to use an MPP database as a datamart (loading data from Hadoop into the MPP database according to the business fields)?
3 - Can a NoSQL database like Cassandra replace an MPP database? If yes, is it better?
Created ‎01-20-2016 03:57 AM
The best architectures for realtime analytics with Hadoop usually rely on Hadoop as much as possible, for both its distributed storage and its distributed compute capabilities, rather than sitting outside of it. A good guide is to look at architectures that use Hadoop as distributed compute, not just as distributed storage.
Example alternative architectures:
- Instead of BI ON Hadoop (like Tableau), you can do BI IN Hadoop, e.g. Arcadia Data.
- Instead of an MPP database outside Hadoop, you can use an MPP solution for Hadoop, such as Apache HAWQ or Actian Vector, for SQL analytics and get as much SQL functionality as possible.
- And yes, you can use HBase as a NoSQL solution.
- Finally, Hive is making progress toward near realtime capabilities, so it is worth keeping an eye on.
Created ‎01-19-2016 09:37 PM
You want low latency and very fast response times, so HBase is the way to go for near realtime.
You can build your own UI on top of it.
For Tableau with HBase, follow this thread: https://community.tableau.com/thread/146368
1 - what are the most used architectures in order to achieve my goal?
From source to destination, you can use Storm or Spark Streaming, e.g. Kafka to Storm to HBase.
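A minimal sketch of the Kafka to Storm to HBase flow, using plain Python stand-ins so it runs anywhere: a list of events plays the Kafka topic, a counting function plays the Storm bolt (or Spark Streaming step), and a dict plays the HBase table. The names `Event`, `consume`, `count_per_minute`, `write_counts` and the `page#minute` row-key format are hypothetical illustrations, not the Kafka, Storm or HBase client APIs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    page: str
    ts: int  # event time, epoch seconds


def consume(topic):
    """Stand-in for a Kafka consumer: yields events in arrival order."""
    yield from topic


def count_per_minute(events):
    """Stand-in for a Storm bolt / Spark Streaming step:
    counts events per (page, minute) bucket within one micro-batch."""
    counts = defaultdict(int)
    for e in events:
        counts[(e.page, e.ts // 60)] += 1
    return counts


def write_counts(counts, table):
    """Stand-in for HBase puts: row key 'page#minute' -> running count.
    Adding to the stored value lets successive batches accumulate."""
    for (page, minute), n in counts.items():
        row_key = f"{page}#{minute:010d}"
        table[row_key] = table.get(row_key, 0) + n
    return table


# Two micro-batches flowing from source (topic) to destination (table).
table = {}
batch1 = [Event("home", 0), Event("home", 30), Event("cart", 61)]
batch2 = [Event("home", 65), Event("home", 70)]
for batch in (batch1, batch2):
    write_counts(count_per_minute(consume(batch)), table)
print(table)
```

The BI layer then only reads small, pre-computed rows from the store instead of scanning raw events, which is where the low query latency comes from in this kind of design.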
2 - does it make sense to use an MPP database as a datamart (loading data according to the business fields from hadoop to mpp)?
Yes, if that option is available to you.
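To illustrate what "loading data according to the business fields" into a datamart can look like, here is a minimal sketch that uses Python's built-in `sqlite3` purely as a stand-in for the MPP database. The table names `raw_sales` and `sales_by_region`, the columns, and the sample rows are hypothetical.

```python
import sqlite3


def build_datamart(records):
    """Load raw records and derive a datamart table keyed by business fields.
    sqlite3 stands in for the MPP database for illustration only."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Raw, fine-grained data as it might be exported from Hadoop.
    cur.execute("CREATE TABLE raw_sales (region TEXT, product TEXT, amount REAL)")
    cur.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", records)
    # Datamart: pre-aggregated by the business fields the BI tool filters on,
    # so dashboard queries hit a small summary table instead of raw rows.
    cur.execute(
        """
        CREATE TABLE sales_by_region AS
        SELECT region, product, SUM(amount) AS total_amount, COUNT(*) AS n_orders
        FROM raw_sales
        GROUP BY region, product
        """
    )
    rows = cur.execute(
        "SELECT region, product, total_amount, n_orders "
        "FROM sales_by_region ORDER BY region, product"
    ).fetchall()
    conn.close()
    return rows


raw = [
    ("EU", "book", 10.0),
    ("EU", "book", 5.0),
    ("EU", "pen", 2.0),
    ("US", "book", 7.5),
]
mart = build_datamart(raw)
print(mart)
```

In a real deployment the aggregation step would typically run in Hadoop (e.g. Hive) or in the MPP engine itself; the point here is only the shape of the datamart table.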
3 - can a NoSQL database like Cassandra replace an MPP database? If yes, is it better?
HBase is a good solution.
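HBase serves low-latency reads largely because rows are stored sorted by row key, so a well-designed key turns a query into a short range scan. A minimal sketch of one common key pattern ("entity#reversed-timestamp", so the newest rows come first) follows; `SortedStore` is a hypothetical toy stand-in for an HBase table, not the HBase client API, and `MAX_TS` is an assumed upper bound on timestamps.

```python
import bisect

MAX_TS = 9_999_999_999  # hypothetical upper bound on epoch seconds


def row_key(device_id, ts):
    """'device#reversed-ts': zero-padded so lexicographic order matches
    numeric order, and reversed so the newest rows for a device sort first."""
    return f"{device_id}#{MAX_TS - ts:010d}"


class SortedStore:
    """Toy stand-in for an HBase table: rows kept sorted by row key."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix, limit):
        """Seek to the first key >= prefix, then read rows in order until the
        prefix stops matching or `limit` rows have been collected."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix) or len(out) >= limit:
                break
            out.append((key, self._values[key]))
        return out


store = SortedStore()
store.put(row_key("sensor1", 100), 20.5)
store.put(row_key("sensor1", 200), 21.0)
store.put(row_key("sensor2", 150), 19.0)
store.put(row_key("sensor1", 300), 22.0)
latest = store.scan_prefix("sensor1#", limit=2)
print([value for _, value in latest])
```

The trailing `#` in the scan prefix keeps `sensor1` from also matching keys such as `sensor10`.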
Created ‎01-20-2016 10:45 PM
Thanks a lot for your answer once again 🙂
1 - What do you mean by "source to destination"? Is it some kind of ETL on raw data to load it into a DW?
2.1 - Is there an MPP database recommended by Hortonworks?
2.2 - If there is no such option, what other alternatives exist?
Thanks 😉
