Hadoop vs RDBMS: Comparision between Hadoop & Database?
1) Volume of data: For the lower volume of data such as few GB’s if RDBMS fulfills your requirement it is the best. When the data size exceeds, RDBMS becomes very slow. In contrast to this, Hadoop framework’s processing power comes into realization when the file sizes are very large and streaming reads and processing is the demand of the situation.
2) Latency: RDBMS can give a very quick response when the data size is ideal for its processing power. In the case of Hadoop, it's very different. First of all, Hadoop is efficient for batch processing of data. Hence, the results are only available after a large amount of data has been processed. Therefore, Hadoop is not the ideal platform to use when immediate results are expected.
3) Throughput: Throughput refers to the amount of data processed in a period of time. And Hadoop's throughput if higher than RDBMS.
4) ACID Property: ACID property is for transaction-based systems. Whereas, in the case of Hadoop nothing like ACID is existent. But if we want to talk in the context of Distributed Databases there is a HBASE property (Basically Available, Soft State, Eventually Consistent). You can dig into it for more info or we can discuss it in a separate thread.
5) Schema: If we talk about RDBMS, it is used to store structured data or semi-structured data with null values in certain columns in the tables. Hadoop is used to store semi-structured data and unstructured data in files. All the processing algorithms are implemented on the files stored in HDFS in case of Hadoop. In the case of RDBMS, querying languages such as SQL are used to fetch data from the tables.
6) Variety of Data Handling: In case of RDBMS, only that data can be stored which can be represented in a certain format in a combination of row and column of the table. In Hadoop, any kind of data can be stored but it's only productive if you can process it using MapReduce job. There are two terms I’ll like to discuss. One is schema-on-write which is used by traditional RDBMS where data should be in a specific format before writing it to the table. In Hadoop, schema-on-read is used where you can store any data in raw format and the structure is imposed at processing time based on the requirements of the processing application.
7) Response Time: Response time for RDBMS is very less if the data is in its processing limits whereas, Hadoop is very fast to process very large files but its jobs are executed in batches from time to time