How can I perform an incremental load of data from Oracle into Hadoop?
To briefly describe my current setup: I use Oracle materialized views to pick up only the records that are new or have changed since the last load, so I avoid pulling the entire database each time.
Now that we are moving into the world of Big Data, I would like to know whether we can replace the Oracle materialized views and incremental loads with a Hadoop ecosystem tool such as Flume, Storm, Kafka, or NiFi, and which one would be the most appropriate.
If you need additional information, do not hesitate to ask.
Thanks in advance
Sqoop supports incremental imports. See https://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_incremental_imports. If you need something closer to real time, you may want to investigate a tool such as Oracle GoldenGate.
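As a rough illustration of what such an incremental import looks like, here is a minimal Python sketch that only assembles the Sqoop command line. The flags (`--incremental`, `--check-column`, `--last-value`) come from the Sqoop user guide linked above; the JDBC URL, user, table, column, and directory names are made-up placeholders, and the helper function itself is hypothetical.

```python
import shlex


def build_sqoop_incremental_import(jdbc_url, username, table, target_dir,
                                   check_column, last_value,
                                   mode="lastmodified"):
    """Return the argument list for a Sqoop incremental import.

    mode="append" picks up rows whose check column is greater than
    last_value (new rows only); mode="lastmodified" picks up rows whose
    timestamp column is newer than last_value (new and updated rows).
    """
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt for the password instead of putting it on the command line
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", mode,
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


# All values below are placeholders, not taken from the original question.
cmd = build_sqoop_incremental_import(
    jdbc_url="jdbc:oracle:thin:@//dbhost:1521/ORCL",
    username="etl_user",
    table="ORDERS",
    target_dir="/data/raw/orders",
    check_column="LAST_UPDATED",
    last_value="2015-01-01 00:00:00",
)
print(shlex.join(cmd))
```

Running the printed command from a scheduler, and feeding the previous run's high-water mark into `--last-value`, plays roughly the same role as the materialized view refresh described in the question.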
Thanks for your help. Sqoop incremental loads work very well for me when rows in the Oracle table are updated. As for Oracle GoldenGate, we are checking the costs to see whether we can try it and implement it as you suggested; according to the articles I have reviewed, it is very efficient for real-time incremental loads.
Materialized views are like tables with the ability to load deltas from materialized view logs.
In the Hadoop world:
We can create tables in Hive and then run Sqoop to load incremental data from Oracle into those Hive tables.
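To make the "delta on top of a base table" idea concrete, here is a small conceptual sketch in plain Python (not Hive or Sqoop code). It assumes each row has a primary key and a last-modified timestamp; the column names `id` and `last_modified` and the function name are illustrative assumptions only.

```python
from datetime import datetime


def merge_delta(base_rows, delta_rows, key="id", modified="last_modified"):
    """Apply a delta to a base table, keeping the newest version of each key."""
    merged = {row[key]: row for row in base_rows}
    for row in delta_rows:
        current = merged.get(row[key])
        # New key, or a version at least as recent as the one we already hold.
        if current is None or row[modified] >= current[modified]:
            merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])


base = [
    {"id": 1, "status": "NEW", "last_modified": datetime(2015, 1, 1)},
    {"id": 2, "status": "NEW", "last_modified": datetime(2015, 1, 1)},
]
delta = [
    {"id": 2, "status": "SHIPPED", "last_modified": datetime(2015, 1, 2)},
    {"id": 3, "status": "NEW", "last_modified": datetime(2015, 1, 2)},
]
result = merge_delta(base, delta)
for row in result:
    print(row["id"], row["status"])
```

In a real cluster the same reconciliation would be expressed against the Hive tables rather than in-memory lists; the sketch only shows the logic.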
Thank you very much, this looks very interesting. I will run a test on the cluster we have set up and report back on how it worked.
Check this: http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/u_http.htm#CHDIAFFA. The UTL_HTTP package makes HTTP callouts from PL/SQL. You can expose a web service that consumes the streamed change data, and call that web service with UTL_HTTP from inside Oracle triggers.
Whenever an UPDATE/INSERT/DELETE occurs on a table, a trigger fires and passes the change to the web service. From the web service you can use any messaging service to publish the data.
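The receiving side of that idea could look something like the sketch below, using only the Python standard library. It assumes the trigger POSTs a JSON body with `op`, `table`, and `row` fields; that payload shape, the port, and all names here are assumptions for illustration, and the in-memory queue merely stands in for whichever messaging service you choose.

```python
import json
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a real message broker; replace with your publisher of choice.
change_queue = queue.Queue()


def parse_change_event(body):
    """Decode and validate one change event sent by the database trigger."""
    event = json.loads(body.decode("utf-8"))
    if not isinstance(event, dict):
        raise ValueError("event must be a JSON object")
    if event.get("op") not in ("INSERT", "UPDATE", "DELETE"):
        raise ValueError("unknown operation")
    if "table" not in event or "row" not in event:
        raise ValueError("missing 'table' or 'row'")
    return event


class ChangeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_change_event(self.rfile.read(length))
        except ValueError:  # also covers malformed JSON and bad encoding
            self.send_response(400)
            self.end_headers()
            return
        change_queue.put(event)  # hand the change over for publishing
        self.send_response(204)
        self.end_headers()


def make_server(port=8080):
    """Create (but do not start) the HTTP server; call serve_forever() to run it."""
    return HTTPServer(("0.0.0.0", port), ChangeHandler)


# Direct use of the parsing step, without starting the server:
event = parse_change_event(b'{"op": "UPDATE", "table": "ORDERS", "row": {"ID": 1}}')
print(event["op"], event["table"])
```

Keep in mind that an HTTP call inside a trigger runs as part of the originating DML statement, so a slow or unavailable web service will slow down or fail writes on the Oracle side.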
Please see this thread http://stackoverflow.com/questions/27434916/oracle-to-hadoop-data-ingestion-in-real-time
1) GoldenGate (GG) is a very expensive tool, so I am not sure you want to get into it ;)
2) Sqoop is good, but I see that you want real-time loading.
I was thinking of using Sqoop in combination with Oozie in a bundle-triggered application. There are a couple of options for time-, event-, or action-based triggers.
There is a good blog on integrating Oozie with Oracle.