Created 01-19-2016 09:28 PM
Hello, I have some questions about real-time analytics on Hadoop. Here are my use case and my questions.
I'm trying to use a BI solution (such as Tableau) to do real-time analytics on Hadoop.
1 - What are the most commonly used architectures to achieve my goal?
2 - Does it make sense to use an MPP database as a data mart (loading data from Hadoop into the MPP database according to business fields)?
3 - Can a NoSQL database like Cassandra replace an MPP database? If yes, is it better?
Created 01-20-2016 03:57 AM
The best architectures for real-time analytics with Hadoop usually use Hadoop as much as possible for both its distributed storage and its distributed compute capabilities, rather than sitting outside of it. Be guided by architectures that use Hadoop for distributed compute, not just distributed storage.
Example Alternative Architectures:
Created 01-19-2016 09:37 PM
You want low latency and very fast response times. HBase is the way to go for near real-time access.
You can build your own UI.
Tableau and HBase: follow this thread: https://community.tableau.com/thread/146368
1 - What are the most commonly used architectures to achieve my goal?
Source to destination...
You can use Storm or Spark Streaming, for example Kafka to Storm to HBase.
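To make the Kafka to Storm to HBase flow a bit more concrete, here is a minimal in-memory sketch in plain Python. It does not use the real Kafka, Storm, or HBase client APIs: the `deque` stands in for a Kafka topic, `process_event` for a Storm bolt, and the dict for an HBase table keyed by row key. The event fields, the `page#minute` row-key layout, and all function names are assumptions made up for illustration.

```python
from collections import deque


def make_row_key(page, ts_seconds):
    # Hypothetical row-key layout: page, then a zero-padded minute bucket.
    # Zero-padding keeps keys in time order under lexicographic sorting,
    # which is how an HBase-style store orders its rows.
    minute = ts_seconds // 60
    return f"{page}#{minute:010d}"


def process_event(event, store):
    # Stand-in for a stream-processing step (a Storm bolt):
    # count one page view in the per-minute bucket for that page.
    key = make_row_key(event["page"], event["ts"])
    store[key] = store.get(key, 0) + 1


def run_pipeline(topic, store):
    # Stand-in for the topology: drain the queue (the "Kafka topic")
    # and apply the processing step to each event in arrival order.
    while topic:
        process_event(topic.popleft(), store)
    return store


def scan_prefix(store, prefix):
    # Stand-in for a row-key prefix scan, the kind of cheap lookup
    # a dashboard would issue against the serving store.
    return [(k, store[k]) for k in sorted(store) if k.startswith(prefix)]


# Made-up sample events; "ts" is a timestamp in seconds.
sample_events = [
    {"page": "/home", "ts": 5},
    {"page": "/home", "ts": 42},
    {"page": "/pricing", "ts": 50},
    {"page": "/home", "ts": 75},
]

store = run_pipeline(deque(sample_events), {})
print(scan_prefix(store, "/home#"))
```

The point of the sketch is the split of responsibilities: events are aggregated as they arrive, and the BI tool or custom UI only reads small pre-aggregated rows by key, which is what keeps response times low.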
2 - Does it make sense to use an MPP database as a data mart (loading data from Hadoop into the MPP database according to business fields)?
Yes, if that option is available.
3 - Can a NoSQL database like Cassandra replace an MPP database? If yes, is it better?
HBase is a good solution.
Created 01-20-2016 10:45 PM
Thanks a lot for your answer once again 🙂
1 - What do you mean by source to destination? Is it some kind of ETL on raw data to load it into a DW?
2.1 - Is there an MPP database recommended by Hortonworks?
2.2 - If there is no such option, what other alternatives exist?
Thanks 😉