
What is the best way to move from an application based on a relational DB to an application based on Big Data technology?

Explorer

Dear Gurus,

I am new to the Big Data/Hadoop world.

I am currently trying to understand how to move forward.

Basically I have a Java application which uses an Oracle DB to store/read/delete data. Every time a new record is inserted/modified I need to check if there is something to do.

I was thinking of moving to an architecture based on Hadoop, but I have the following questions that I need to answer:

  1. Shall I use Hive or HBase or something else to store/read the data?
  2. Shall I use Spark or Storm or something else to act whenever a new record is created/modified?

I know it may sound like a stupid question, but it is not easy to understand the role of each product in the stack and decide what to use.

Thanks,

Fabio


Super Collaborator

To provide good guidance it would help to understand how the data enters the application, and whether there is also a way a record can be changed bypassing your application.

Some thoughts on the tools you mentioned:

  • Hive: an SQL layer on top of Hadoop. While it does not yet support the full SQL standard, you can still apply your SQL knowledge directly. One thing quite different from SQL DBs is that it is possible to define a schema that is applied to 'normal' files, like log files or CSV files. This is what is known as 'schema on read': you store the data in its original format but still define columns so you can use SQL to handle the data (see the Hive sketch after this list).
  • HBase: a 'NoSQL' DB implementing the 'BigTable' concept. It provides you with a flexible structure, i.e. you can add columns 'on the fly' and not all records in one table must have the same columns. HBase is designed to deal with a very large number of columns and records per table. In regards to inserting and reading it is much more like a 'classic' DB than Hive, but in regards to data modelling and the query interface it is very different (see the HBase sketch after this list).
  • Spark: handles large amounts of data for processing, e.g. number crunching. Most likely you could migrate the whole data logic out of your Java application into Spark. With Spark you can use HBase or Hive as input or output for your programs (see the Spark sketch after this list).
  • Storm: designed for stream and event processing. It follows a pipeline concept.
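
To make the 'schema on read' idea concrete, here is a minimal Java sketch that defines an external Hive table over CSV files and queries it through the standard Hive JDBC driver. The host name, HDFS path, table and column names are only placeholders for whatever your environment uses.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveSchemaOnRead {
    public static void main(String[] args) throws Exception {
        // Standard Hive JDBC driver (artifact: org.apache.hive:hive-jdbc)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-host:10000/default", "user", "");
             Statement stmt = conn.createStatement()) {

            // 'Schema on read': the CSV files in HDFS stay as they are,
            // we only declare how to interpret them as columns.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS events ("
                    + "id BIGINT, device STRING, payload STRING, created STRING) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                    + "STORED AS TEXTFILE LOCATION '/data/events'");

            // Plain SQL over the raw files
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT device, COUNT(*) FROM events GROUP BY device")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                }
            }
        }
    }
}
```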
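And here is a minimal sketch of HBase's flexible columns using the HBase Java client. It assumes a table 'records' with a column family 'cf' already exists; those names are just examples.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseFlexibleColumns {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("records"))) {

            // Two rows with different columns -- no ALTER TABLE needed,
            // columns are simply created 'on the fly' per row.
            Put p1 = new Put(Bytes.toBytes("row-1"));
            p1.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("Fabio"));
            p1.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("city"), Bytes.toBytes("Rome"));
            table.put(p1);

            Put p2 = new Put(Bytes.toBytes("row-2"));
            p2.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("Anna"));
            p2.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("phone"), Bytes.toBytes("12345"));
            table.put(p2);

            // Reads look much like a classic key/value lookup
            Result r = table.get(new Get(Bytes.toBytes("row-1")));
            System.out.println(Bytes.toString(
                    r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))));
        }
    }
}
```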
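Finally, a small Spark sketch in Java that reads the Hive table from the first example and writes an aggregate back to Hive. It assumes Spark is configured with Hive support; the app and table names are again placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkOverHive {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("migrated-data-logic")
                .enableHiveSupport()   // lets Spark see the Hive metastore tables
                .getOrCreate();

        // The data logic that used to run against Oracle can run here instead
        Dataset<Row> perDevice = spark.sql(
                "SELECT device, COUNT(*) AS cnt FROM events GROUP BY device");

        perDevice.show();

        // Results can be written back to Hive (or to HBase via a connector)
        perDevice.write().mode("overwrite").saveAsTable("events_per_device");

        spark.stop();
    }
}
```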

Explorer

Hi Harald,

Thanks for your feedback.

The data enters the application through a mobile application, and there is no other way to change the data.

thanks again