Created 11-01-2016 10:07 AM
We are going to develop a product (an ASP.NET web application) for one of our clients. The application will deal with a large amount of data. From the initial requirements gathering, we learned that the database size would be 30 to 40 terabytes and that this data doubles every year; that is, on average the ASP.NET web application will write about 30 terabytes of structured data to the database tables every year.
Suppose I choose an RDBMS (MSSQL, Oracle, MySQL and so on) to store the data coming from the ASP.NET web application. We then need to use this data for different kinds of transactional functionality, dashboards, reports and so on in that same web application.
As per the requirement, the database will grow by 30 terabytes every year, and the web application will use this data for real-time processing. Performance is very important for the web application, so I have the following questions:
1. Which database would be the right fit for this? Which database will give good performance with a large volume of real-time data? (We will be fetching from more than 100 terabytes of data to display reports and dashboards in a real-world web application. In short, the ASP.NET web application will read this data to display dashboards and so on while, at the same time, writing data to the database tables.)
2. How about NoSQL (HBase/Cassandra)? Can we use these NoSQL databases as the backend for an ASP.NET web application? If yes, how would the web application perform? (As mentioned earlier, our web application must do retrievals, updates, inserts and deletes as with an RDBMS. Is this possible in HBase/Cassandra?)
3. Which one (HBase or Cassandra) is the right choice for retrievals, inserts, updates and deletes through a web application?
4. Is there any other way to handle this amount of data in an RDBMS, for example by maintaining two servers (OLTP and OLAP)? If yes, how can we do that? Please suggest an approach and, if possible, share a link.
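To make the sizing above concrete, here is a rough sketch of how the total would grow under the two readings of the requirement: a fixed 30 TB added per year, or the total doubling each year. The function names are illustrative only, and the starting sizes are the 30 to 40 TB estimate from the requirements.

```python
def linear_size_tb(initial_tb, years, yearly_growth_tb=30.0):
    """Linear model: a fixed volume of new data is written each year."""
    return initial_tb + yearly_growth_tb * years


def doubling_size_tb(initial_tb, years):
    """Exponential model: the total size doubles every year."""
    return initial_tb * 2 ** years


# Compare both models for the low (30 TB) and high (40 TB) starting estimates.
for year in range(4):
    print(
        f"year {year}: "
        f"linear {linear_size_tb(30, year):.0f}-{linear_size_tb(40, year):.0f} TB, "
        f"doubling {doubling_size_tb(30, year):.0f}-{doubling_size_tb(40, year):.0f} TB"
    )
```

Under the linear reading the 100 TB mark is reached after two to three years; under the doubling reading it is passed within two years, so the two interpretations lead to quite different capacity plans.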
You can pick HBase or Cassandra based on the types of queries your application will run for dashboards and reports. Both HBase and Cassandra generally offer better write performance than an RDBMS at this volume. HBase performs very well for range and point queries. You can try the HBase and Phoenix combination to simplify your application development, since Phoenix puts a SQL layer on top of HBase.
You can also try the Phoenix ODBC driver
or the .NET driver
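As a rough illustration of why range and point queries suit HBase: it keeps rows sorted by row key, so a point read is a key lookup and a range read is a scan over a contiguous slice of keys. The sketch below is a toy in-memory stand-in, not the HBase or Phoenix API; `SortedKeyStore` and the composite `row_key` layout (customer id plus date) are hypothetical names chosen for the example.

```python
import bisect


class SortedKeyStore:
    """Toy in-memory stand-in for a table whose rows are sorted by key."""

    def __init__(self):
        self._keys = []    # row keys, kept in sorted order
        self._values = {}  # row key -> row data

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Point query: fetch a single row by its exact key."""
        return self._values.get(key)

    def scan(self, start, stop):
        """Range query: rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def row_key(customer_id, day):
    # Hypothetical composite key: zero-padded id first, so that all rows
    # for one customer sort together, ordered by date within the customer.
    return f"{customer_id:08d}#{day}"


store = SortedKeyStore()
store.put(row_key(42, "2016-10-30"), {"amount": 10})
store.put(row_key(42, "2016-11-01"), {"amount": 25})
store.put(row_key(7, "2016-11-01"), {"amount": 5})
store.put(row_key(42, "2016-11-02"), {"amount": 40})

# Point query for one customer on one day.
one_day = store.get(row_key(42, "2016-11-01"))

# Range query: every row for customer 42 in November 2016.
november = store.scan(row_key(42, "2016-11-01"), row_key(42, "2016-12-01"))
```

The practical point is that the row key has to be designed around the dashboard and report queries you expect, because queries that do not follow the key order cannot use this kind of contiguous scan.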
For C# and .NET
For C# Spark
Do you have to use ASP.NET? Can you move to Java Spring? Scala Play? Scala Spray? NodeJS Express? Ruby on Rails? All of these scale better and run on multiple platforms.
Most people store web data in an RDBMS such as PostgreSQL and then ingest the data into HDFS and HBase via NiFi, Sqoop, Flume and other tools. You then run your BI, Zeppelin, deep learning, machine learning, Spark and analytics workloads on your Hadoop-based data lake.
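The ingest step in that pattern is usually incremental: each run copies only the rows written since the previous run, tracked by a watermark on an increasing column (Sqoop's incremental append mode works along these lines). Below is a minimal sketch of the idea, assuming a hypothetical `events` table with an auto-incrementing `id`; `sqlite3` stands in for the transactional RDBMS and a plain Python list stands in for the data lake.

```python
import sqlite3


def export_new_rows(conn, last_id, batch_size=1000):
    """Fetch rows with id above the watermark, oldest first."""
    cur = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    )
    rows = cur.fetchall()
    # Advance the watermark only if something was actually exported.
    new_watermark = rows[-1][0] if rows else last_id
    return rows, new_watermark


# Stand-in for the OLTP database the web application writes to.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)", [("a",), ("b",), ("c",)]
)

lake = []      # stand-in for HDFS/HBase
watermark = 0  # highest id already ingested

# First run copies everything written so far.
batch, watermark = export_new_rows(conn, watermark)
lake.extend(batch)

# The web application keeps writing; the next run picks up only the new row.
conn.execute("INSERT INTO events (payload) VALUES ('d')")
batch, watermark = export_new_rows(conn, watermark)
lake.extend(batch)
```

This keeps the transactional database small and fast for the web application while the reporting workload reads from the lake. Note that an id watermark only captures inserts; updates and deletes need a last-modified column or change data capture.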