In our current system, we receive batch file data in HDFS and transform it into Hive tables, and we receive real-time data through Kafka and store it in MongoDB. We need to reconcile the batch and real-time data: the batch data is in Hive tables and the real-time data is in MongoDB. What is the best method to compare the data in Hive and MongoDB? Should I convert the MongoDB collections to Hive tables and compare them with the existing Hive tables, or is there a better way to compare MongoDB directly against the Hive tables?
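To make the question concrete, here is a minimal sketch of the kind of key-based comparison I have in mind, in plain Python. It assumes both sides have already been exported to lists of dicts that share a business key. The names `row_hash`, `reconcile`, the `id` key, and the sample records are hypothetical and not taken from our system. At real volumes the same logic would presumably run as a join in Spark or Hive rather than in memory.

```python
import hashlib
import json


def row_hash(row, ignore=()):
    """Stable hash of a record, skipping fields that legitimately differ."""
    filtered = {k: v for k, v in row.items() if k not in ignore}
    payload = json.dumps(filtered, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def reconcile(batch_rows, realtime_rows, key="id", ignore=("_id",)):
    """Compare two record sets by key and report gaps and mismatches."""
    # "_id" is ignored by default because MongoDB adds it to every document.
    batch = {r[key]: row_hash(r, ignore) for r in batch_rows}
    realtime = {r[key]: row_hash(r, ignore) for r in realtime_rows}
    shared = batch.keys() & realtime.keys()
    return {
        "missing_in_realtime": sorted(batch.keys() - realtime.keys()),
        "missing_in_batch": sorted(realtime.keys() - batch.keys()),
        "mismatched": sorted(k for k in shared if batch[k] != realtime[k]),
    }


# Hypothetical sample data standing in for a Hive extract and a MongoDB extract.
hive_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
mongo_rows = [
    {"_id": "a", "id": 2, "amount": 20},
    {"_id": "b", "id": 3, "amount": 35},
    {"_id": "c", "id": 4, "amount": 40},
]

report = reconcile(hive_rows, mongo_rows)
print(report)
```

Hashing each row keeps the comparison to one value per key, so only the keys that differ need to be pulled back for a field-by-field check.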
I am working on a BI solution to develop descriptive and predictive analytical reports from Hadoop and a NoSQL database. I would appreciate your recommendations and suggestions on my thoughts below.
We are planning to use Tableau or Oracle Business Intelligence Enterprise Edition (OBIEE) as the BI tool.
We get the cleansed data as files in HDFS (which maintains all data) and in a NoSQL database (which maintains only the latest 2 years of data). I was thinking of using Spark to do the required transformations and populate dimension, cube, and fact tables in Hive, then expose the Hive tables to the BI tool so that users can generate their own reports. In the BI tool I was planning to have two sections, one for the latest 2 years of data and another for historical data: the 2-year data will come from my NoSQL database and the historical data from HDFS.
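To illustrate the two-section idea, here is a rough sketch of how a report's date range could be split between the two stores. It assumes the NoSQL database holds exactly the last 2 years counted back from today. The function names and the `hdfs_history` / `nosql_recent` labels are hypothetical, chosen only for this example.

```python
from datetime import date, timedelta


def two_years_before(today):
    """Return the same calendar day two years earlier."""
    try:
        return today.replace(year=today.year - 2)
    except ValueError:
        # 29 Feb has no counterpart in a non-leap year; fall back to 28 Feb.
        return today.replace(year=today.year - 2, day=28)


def split_by_source(start, end, today):
    """Split the inclusive range [start, end] by which store would serve it."""
    cutoff = two_years_before(today)
    plan = {}
    if start < cutoff:
        # Everything before the cutoff is only available in HDFS.
        plan["hdfs_history"] = (start, min(end, cutoff - timedelta(days=1)))
    if end >= cutoff:
        # The cutoff day and later are assumed to be in the NoSQL database.
        plan["nosql_recent"] = (max(start, cutoff), end)
    return plan


plan = split_by_source(date(2020, 1, 1), date(2024, 6, 30), today=date(2024, 6, 30))
for source, (first_day, last_day) in plan.items():
    print(source, first_day, last_day)
```

A report that spans the cutoff would then need both sections, which is part of why I am unsure whether two separate sources are the right design.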
Has anybody had a similar requirement? Please suggest the best option and let me know whether I am on the right path.