
Difference between HBase and Hadoop/HDFS

Explorer

Hi,

This may be a naive question, but I am new to the NoSQL paradigm and don't know much about it. Could somebody help me clearly understand the difference between HBase and Hadoop, or give me some pointers that might help me understand the difference?

So far I have done some research, and according to my understanding, Hadoop provides a framework to work with raw chunks of data (files) in HDFS, while HBase is a database engine on top of Hadoop that basically works with structured data instead of raw data chunks. HBase provides a logical layer over HDFS, much as SQL does over raw files. Is that correct?

Please feel free to correct me.

Thanks.

hari

1 ACCEPTED SOLUTION

Expert Contributor

Let me break my answer down into parts and provide some initial details to help you understand it better.

Hadoop: Yes, it's a framework that is currently used to handle the Big Data problems most customers face these days. Hadoop runs on clusters of distributed, low-cost machines, also referred to as commodity hardware, and you don't have to move your data over the network; instead, you send your code to the data, for faster computation and quicker results.

When Big Data comes into the picture, we can think of two major problems: how are we going to store it, and how are we going to process it? Hadoop came into the picture to overcome these two challenges, which are hard to solve in a traditional DBMS. Obviously it varies with the use case, but for now let's keep ourselves constrained to Hadoop and Big Data.

The two major components of Hadoop are:

  • Storage: This component exists to store Big Data and to manage it efficiently and redundantly. For that we have HDFS, the Hadoop Distributed File System. HDFS is a file system that manages your data across a Hadoop cluster, and its major services are listed below (a minimal usage sketch follows this list):
    • NameNode (master daemon)
    • DataNode (slave daemon)
  • Computation: Storage is handled by HDFS; computation is handled by the YARN framework, short for Yet Another Resource Negotiator (see the word-count sketch after the HDFS example). Its components are:
    • ResourceManager (master daemon)
    • NodeManager (slave daemon, one per worker node)
    • ApplicationMaster (one per running application, not a permanent daemon)
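
To make the storage side concrete, here is a minimal sketch (my own illustration, not something from the original question) of writing and reading a file through the HDFS Java API. The path /user/demo/hello.txt and the class name HdfsExample are hypothetical, and the code assumes a core-site.xml on the classpath that points at your cluster's NameNode.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // hypothetical path

        // Write: the client streams data; HDFS splits it into blocks
        // and replicates each block across DataNodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello from HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read: the NameNode tells the client which DataNodes hold the
        // blocks; the data itself never flows through the NameNode.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
    }
}
```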

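And to make the computation side concrete, here is a minimal word-count sketch, the classic illustration of sending code to the data: YARN's ResourceManager launches an ApplicationMaster for the job, which asks NodeManagers to run these map and reduce tasks on the nodes holding the input blocks. This is an assumed example; the class names and the input/output paths taken from the command line are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // The mapper runs next to the data: one call per input line.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String word : value.toString().split("\\s+")) {
                if (!word.isEmpty()) {
                    ctx.write(new Text(word), ONE);
                }
            }
        }
    }

    // The reducer sums the counts shuffled to it for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
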
HBase: HBase is an open-source, NoSQL, distributed, scalable big data store. It gives you fast read and write access to your big data stored in HDFS; you can think of it as a layer on top of HDFS. It exposes an API through which you issue your reads and writes and get results back (a minimal client sketch follows). Use it when you need random, real-time read/write access to your Big Data.
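
Here is a minimal sketch of what that random read/write access looks like through the HBase Java client API. The table name users, the row key, and the column names are hypothetical examples, and the code assumes an hbase-site.xml on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) { // hypothetical table

            // Put: write one cell, addressed by row key, column family,
            // and qualifier -- a single random write, no batch job needed.
            Put put = new Put(Bytes.toBytes("row-42"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("hari"));
            table.put(put);

            // Get: fetch that single row back by key -- a random,
            // real-time read of data ultimately persisted in HDFS.
            Result result = table.get(new Get(Bytes.toBytes("row-42")));
            byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
        }
    }
}
```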

It was inspired by and modeled after a white paper released by the Internet giant Google.

Paper link: Bigtable: A Distributed Storage System for Structured Data

My suggestion is that the best way to get started is the book Hadoop: The Definitive Guide by Tom White.


2 REPLIES


New Contributor

Generally, Big Data is about collecting data from various sources. That data can arrive in many distinct forms, so we cannot always process it effectively with traditional tools. With Hadoop there are various tools available, so we can perform those operations on Big Data effectively.