Created 10-31-2016 04:37 PM
Is Cassandra or HBase the right choice as a backend for a .NET web application? How would the web application perform if it used a Cassandra or HBase database? We are going to deal with 30-40 TB of data, and this volume doubles every year; for this reason we have chosen Cassandra as the backend for our .NET web application. Please advise whether this is the right choice. To deal with large amounts of data, what type of database would give very good performance with .NET applications? Can we use NoSQL (Cassandra/HBase) as the database for a .NET web application? If yes, which is preferable, HBase or Cassandra?
Created 11-01-2016 01:25 AM
Based on the data growth you mentioned, you definitely need to pick a database that can deal with big data. I would like to point out that if you plan to use the Hadoop ecosystem to take advantage of other tools like Spark, Storm, Kafka, Hive, HDFS etc., the logical choice would be HBase. Don't forget that all the data your web application generates may become the subject of machine learning, for example for a recommendation engine, which may add even more value. Having the capacity to store so much data in HDFS and the power of Spark to process it could be a differentiator for your business. You will see how valuable those tools are and how important it is to have solid integration in a single platform like Hortonworks Data Platform. Rather than going with some exotic framework, I suggest using REST: http://hbase.apache.org/book.html#_rest. If you search, you will find a few frameworks for .NET, but none is consistently developed. REST, however, is the universal method to decouple systems and technologies.
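A minimal sketch of what talking to the HBase REST gateway looks like, assuming the JSON "CellSet" format described in the HBase reference guide, where row keys, column names and values are base64-encoded. The host, port, table `orders`, column family `d` and all helper names are hypothetical. It uses only the Python standard library and builds the request without sending it; from .NET, the same JSON body could be sent with HttpClient.

```python
import base64
import json
import urllib.parse
import urllib.request


def b64(text: str) -> str:
    """HBase REST JSON carries row keys, column names and values as base64."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def unb64(text: str) -> str:
    return base64.b64decode(text).decode("utf-8")


def build_cellset(row_key: str, cells: dict) -> dict:
    """Build a CellSet body for one row; `cells` maps "family:qualifier" -> value."""
    return {
        "Row": [
            {
                "key": b64(row_key),
                "Cell": [{"column": b64(col), "$": b64(val)} for col, val in cells.items()],
            }
        ]
    }


def parse_cellset(body: dict) -> dict:
    """Decode a CellSet (e.g. the JSON of a GET response) into {row: {column: value}}."""
    result = {}
    for row in body.get("Row", []):
        key = unb64(row["key"])
        # Extra fields such as "timestamp" are ignored in this sketch.
        result[key] = {unb64(c["column"]): unb64(c["$"]) for c in row.get("Cell", [])}
    return result


def make_put_request(base_url: str, table: str, row_key: str, cells: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PUT of one row; base_url is an assumption."""
    data = json.dumps(build_cellset(row_key, cells)).encode("utf-8")
    # The row key also appears in the path, so URL-quote it.
    url = f"{base_url}/{table}/{urllib.parse.quote(row_key, safe='')}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


req = make_put_request("http://hbase-rest.example.com:8080", "orders", "order-1001", {"d:status": "shipped"})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

The point of the design is the one made above: the web tier only needs an HTTP client and a JSON serializer, not a .NET-specific HBase library.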
@Umair Khan mentions Redis. While Redis is good for low-latency, in-memory key-value processing, it is not the usual choice for the data growth you describe: 120-160 TB in 2 years. I can't imagine a Redis Cluster dealing with 120-160 TB, not to mention 240-320 TB in year 3. Redis could be used as an accelerator for some of the use cases in your application.
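The sizes quoted here follow from simple doubling. A quick check, assuming the 30-40 TB starting size from the question and exact doubling every year (the helper name is hypothetical):

```python
def projected_size_tb(initial_tb: float, years: int) -> float:
    """Size after `years` if the data doubles every year: initial * 2**years."""
    return initial_tb * 2 ** years


for year in range(4):
    low = projected_size_tb(30, year)
    high = projected_size_tb(40, year)
    print(f"year {year}: {low:.0f}-{high:.0f} TB")
```

Year 2 gives 120-160 TB and year 3 gives 240-320 TB, matching the figures above.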
Created 10-31-2016 09:06 PM
Do you have (or will you need in future) a Hadoop cluster? If the answer is yes, pick HBase; otherwise Cassandra should work.
Is Cassandra/HBase the right choice?
Maybe; it depends on how you are planning to use it. If you are looking for an "as a database" solution, please consider alternative technologies. HBase is not a database; it's a key-value store.
Redis is one good alternative to look at.
.NET support for Cassandra is described here:
https://academy.datastax.com/resources/getting-started-apache-cassandra-and-c-net
For HBase you can use Thrift or REST:
https://community.hortonworks.com/questions/25101/is-there-a-way-to-connect-to-hbase-using-c.html
Created 11-01-2016 10:02 AM
Hi Constantin,
Thanks for the reply! I understand that Hadoop would be the right choice, but let me explain my actual requirement in detail below.
We are going to develop a product (an ASP.NET web application) for one of our clients. The application is going to deal with a large amount of data. From the initial requirements gathering, we learned that the database size would be 30-40 terabytes and that this data doubles every year; i.e., on average, the ASP.NET web application will write 30 terabytes of structured data to the database tables every year.
Suppose I choose an RDBMS (MSSQL, Oracle, MySQL and so on) to store the data coming from the ASP.NET web application; we also need to use this data to provide various transactional functionality, dashboards, reports and so on in that same web application.
As per the requirement, the database will grow by 30 terabytes every year, and the web application is going to use this data for real-time processing. Performance is very important for the web application.
So, I have the following questions in this regard:
1. What database would be the right fit for this? Which database will give good performance when dealing with a large volume of real-time data? (We are going to fetch data from more than 100 terabytes to display reports and dashboards in a real-world web application. All in all, the ASP.NET web application is going to use this data to display dashboards and so on, while the same application writes data to the database tables.)
2. What about NoSQL (HBase/Cassandra)? Can we use these NoSQL databases as the backend for an ASP.NET web application? If yes, how would the web application perform? (As I mentioned earlier, our web application needs to do retrievals, updates, inserts and deletes as in an RDBMS. Is this possible with HBase/Cassandra?)
3. Which one would be the right choice (HBase or Cassandra) for retrievals, inserts, updates and deletes through a web application?
4. I am not sure whether there is any other way in an RDBMS to handle this large amount of data by maintaining two servers (OLTP/OLAP). If yes, how can we do that? Please advise and, if possible, share any links.
Created 11-01-2016 03:32 PM
This is a long discussion for a single question in HCC. Thanks for the additional clarification, but my answer is the same as before: HBase. I already covered Cassandra vs. HBase. It may be true that Cassandra allows faster writes than HBase and that HBase allows faster scans, but at the end of the day, within your Hadoop ecosystem HBase is better supported, and you can take advantage of HBase snapshots and TTL capabilities, which will prove key for your analytics.
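The scan advantage comes from HBase keeping rows sorted by row key, so a scan with a start row (inclusive) and a stop row (exclusive) reads one contiguous range. A toy in-memory illustration, assuming a hypothetical `customer#date` row-key layout; `range_scan` is a hypothetical helper, not HBase client code:

```python
from bisect import bisect_left


def range_scan(sorted_keys, start_row, stop_row):
    """Return keys in [start_row, stop_row), like a scan with start/stop rows."""
    i = bisect_left(sorted_keys, start_row)
    j = bisect_left(sorted_keys, stop_row)
    return sorted_keys[i:j]


keys = sorted([
    "cust042#2016-10-30",
    "cust042#2016-10-31",
    "cust042#2016-11-01",
    "cust107#2016-10-31",
])

# All rows for one customer: the stop row is the prefix with '#' bumped
# to the next character, '$', so the range covers exactly that prefix.
print(range_scan(keys, "cust042#", "cust042$"))
```

This only pays off when the row key is designed around the queries (for example, dashboards per customer and date range); a query that does not match the key order still has to scan much more.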
Here is the approach, at high level:
Your web app writes to HBase. HBase keeps versions of your data and provides TTL (time to live) capabilities. As such, an insert/update/delete is just another version of your data. You can then use HBase snapshots for analytics with Spark and Phoenix. HBase snapshots do not impact region servers, so your writes are not impacted and you get clean isolation between your many fast writes (OLTP) and the reads needed for analytics (OLAP).
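To make "insert/update/delete is just another version" concrete, here is a toy in-memory model of a single HBase cell, assuming timestamped versions, a column-level delete marker, a max-versions setting and a TTL in seconds. The class and its parameters are hypothetical; real HBase applies these rules at read time and physically removes expired or deleted cells during major compaction.

```python
import time


class VersionedCell:
    """Toy model of one HBase cell (hypothetical, not HBase's implementation)."""

    def __init__(self, max_versions=3, ttl_seconds=None):
        self.max_versions = max_versions
        self.ttl_seconds = ttl_seconds
        self.versions = []  # (timestamp, value) pairs, newest first; None = delete marker

    def put(self, value, ts=None):
        # An insert or update just adds a newer version; nothing is overwritten.
        self.versions.append((time.time() if ts is None else ts, value))
        self.versions.sort(key=lambda v: v[0], reverse=True)

    def delete(self, ts=None):
        # A delete is also a write: a marker that hides all older versions.
        self.put(None, ts)

    def get(self, now=None, n=1):
        """Return up to n live values (capped at max_versions), newest first."""
        now = time.time() if now is None else now
        limit = min(n, self.max_versions)
        live = []
        for ts, value in self.versions:
            if len(live) >= limit:
                break
            if self.ttl_seconds is not None and now - ts > self.ttl_seconds:
                break  # this version and all older ones have expired
            if value is None:
                break  # delete marker hides everything older
            live.append(value)
        return live


cell = VersionedCell(max_versions=3, ttl_seconds=3600)
cell.put("pending", ts=100)
cell.put("shipped", ts=200)
print(cell.get(now=300, n=3))   # ['shipped', 'pending']: the update is a newer version
print(cell.get(now=3750, n=3))  # ['shipped']: the ts=100 version is past the TTL
cell.delete(ts=400)
print(cell.get(now=500))        # []: the delete marker hides the older versions
```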
Created 11-01-2016 01:22 PM
Created 11-01-2016 04:29 PM
I agree with @Constantin Stanca. This is an HBase use case. And yes, HBase is built for transactional use cases, and exactly for use cases where you will scale to tens of TBs. 30 TB to begin with is large for traditional systems, and the expected growth makes it even more compelling to go with HBase. Scaling Oracle, MySQL or any other traditional system will bring in the traditional challenges of manual sharding and increase the complexity for the operations team to manage.
HBase, on the other hand, provides automatic sharding, automatic failover to new nodes, scaling by simply adding new nodes, and easy online maintenance.
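HBase's automatic sharding is range-based: a table is divided into regions, each owning a contiguous range of row keys, and a region that grows too large is split in two. A minimal sketch of that lookup, assuming hypothetical split keys; `RegionMap` is a hypothetical helper, and in real HBase the client locates regions through the hbase:meta table while the region servers perform the splits.

```python
from bisect import bisect_right, insort


class RegionMap:
    """Toy range-based sharding: region i owns [start_keys[i], start_keys[i + 1])."""

    def __init__(self, split_keys):
        # The first region starts at the empty key, so every row key has an owner.
        self.start_keys = [""] + sorted(split_keys)

    def region_for(self, row_key):
        """Index of the region whose key range contains row_key."""
        return bisect_right(self.start_keys, row_key) - 1

    def split(self, split_key):
        """Splitting adds one boundary; rows in other regions do not move."""
        if split_key and split_key not in self.start_keys:
            insort(self.start_keys, split_key)


regions = RegionMap(["cust300", "cust600"])
print(regions.region_for("cust042#2016-10-31"))
print(regions.region_for("cust450#2016-10-31"))
regions.split("cust150")  # e.g. the first region grew too large
print(regions.region_for("cust200#2016-10-31"))
```

Contrast this with manual `hash(key) % N` sharding, where adding a shard changes N and remaps most existing keys.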