
Real time streaming using Oracle logs


Rising Star

Hi,

I need help finding a solution for the following.

I want to read Oracle database archive logs and somehow ingest them into Flume or some other tool, then process the data (the table changes recorded in the logs) further using Spark or store it in Hive tables.

The latter part is yet to be finalised, but I am currently looking at the ingestion part.

I know there is GoldenGate, but that's not feasible for us owing to the license cost.

So, is there any open source tool for this, please?

Thanks

Rishit Shah

10 REPLIES

Re: Real time streaming using Oracle logs

Guru

NiFi is perfect for this -- this is one of its most common use cases. Your use case would take about 5-10 minutes to develop and deploy using NiFi.

NiFi is much easier to develop with than Flume and has a much greater capability set. It is UI- and configuration-oriented, so you can rapidly build, deploy and monitor flows. And there are a lot of Quality of Service features behind it. It is also enterprise-ready in terms of security, governance and multitenancy.

NiFi has a processor that tails a log file (it sends new lines to the flow at a configured polling interval) and another processor that puts to HDFS or streams to Hive. You can also place a processor in between to do transformations or smart routing (send to one target if the content has this text and send to another target if it has that text). It is usually a best practice not to transform, however, but to store all ingested data in HDP as raw data so you can leverage it for future use cases.
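To make the shape of that flow concrete, here is a rough sketch in plain Python (not NiFi code) of the same tail / route-on-content / deliver pattern; the file path, polling interval and match string are made up for illustration:

```python
# Conceptual stand-in for a NiFi flow: TailFile -> RouteOnContent -> two sinks.
# Everything here is illustrative; in NiFi each step is a configured processor.
import time

LOG_PATH = "/var/log/app/events.log"   # placeholder: file the TailFile processor would watch
POLL_SECONDS = 5                       # placeholder polling interval

def tail(path):
    """Yield lines appended to the file, roughly what NiFi's TailFile does."""
    with open(path, "r") as f:
        f.seek(0, 2)                   # start at the end of the file
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(POLL_SECONDS)

def route(line):
    """Crude content-based routing, like a RouteOnContent processor."""
    return "hive_alerts" if "ERROR" in line else "hdfs_raw"

for record in tail(LOG_PATH):
    # In NiFi the targets would be PutHiveStreaming / PutHDFS processors;
    # here we just print which target the record would go to.
    print(f"[{route(record)}] {record}")
```

In a real flow each of those steps is just a configured processor (TailFile, RouteOnContent, PutHDFS or PutHiveStreaming), so there is no code to write.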

NiFi is part of HDF. NiFi and HDF are open source. HDF/NiFi is deployed on its own cluster and does not require HDP (Hadoop) but it is a very common integration.

These links can help you get started:

Re: Real time streaming using Oracle logs

Rising Star

Thanks @Greg Keys

I will read the docs and let you know.

Re: Real time streaming using Oracle logs

Guru

You can also take a look at the first part of this post -- it pretty much shows what you are attempting (but maybe you will not need the middle 3 processors and you would stream to Hive using PutHiveStreaming):

Re: Real time streaming using Oracle logs

Rising Star

Thanks @Greg Keys

I am not able to figure out how I would be able to get the Oracle logs using NiFi.

Thanks,

Rishit Shah

Re: Real time streaming using Oracle logs

Guru

One option is to install NiFi on the server generating the logs and have it communicate with a central NiFi instance using a Remote Process Group.

Better yet, though, is to implement the TailFile as MiNiFi on the edge and send the data to a central NiFi instance. This is the preferred way: MiNiFi is a lightweight processor deployment on the edge that is designed only for collecting data and sending it from the edge to a NiFi flow for processing.

See: http://hortonworks.com/blog/edge-intelligence-iot-apache-minifi/ (especially the SlideShare at the bottom)

Re: Real time streaming using Oracle logs

Rising Star

Thanks @Greg Keys. I will check and update you in a few days.

Thanks,

Rishit Shah

Re: Real time streaming using Oracle logs

Rising Star

Re: Real time streaming using Oracle logs

New Contributor

I think it is necessary to clarify that he is talking about Oracle's transaction and redo logs, which are stored in a binary format, not a log file you can easily tail (such as syslog or cron logs). If anyone has a useful solution to this, that would be awesome! CDC has been the bane of many a data engineer.
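For what it's worth, those binary redo/archive logs are normally read through Oracle's own LogMiner package (DBMS_LOGMNR) rather than tailed. Below is a rough, illustrative sketch of that interface from Python with cx_Oracle; the DSN, credentials, archived log path and schema name are placeholders, and it assumes a user with the required LogMiner privileges:

```python
# Rough sketch only: reading an Oracle archived redo log through LogMiner.
# Connection details, file path and schema name are placeholders.
import cx_Oracle

conn = cx_Oracle.connect("c##miner", "password", "dbhost:1521/ORCLCDB")
cur = conn.cursor()

# Register one archived log file and start a LogMiner session, using the
# online catalog as the dictionary source.
cur.execute("""
    BEGIN
      DBMS_LOGMNR.ADD_LOGFILE(
        LogFileName => '/u01/app/oracle/archivelog/1_42_1234567890.dbf',
        Options     => DBMS_LOGMNR.NEW);
      DBMS_LOGMNR.START_LOGMNR(
        Options => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
    END;""")

# V$LOGMNR_CONTENTS exposes the decoded change records, one row per change,
# including the reconstructed SQL (SQL_REDO).
cur.execute("""
    SELECT scn, timestamp, operation, seg_owner, table_name, sql_redo
      FROM v$logmnr_contents
     WHERE seg_owner = 'APP_SCHEMA'""")
for row in cur:
    print(row)

cur.execute("BEGIN DBMS_LOGMNR.END_LOGMNR; END;")
cur.close()
conn.close()
```

Whether LogMiner scales for continuous CDC is a separate question, but rows like these are the kind of change records you could then push into NiFi or Kafka for downstream processing.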

Re: Real time streaming using Oracle logs

New Contributor

Jon, you are right to clarify this, and it's been some time since your comments. Did you or someone else figure out a real solution for doing CDC with Oracle using the redo logs, or any other open source approach? It would be great to have that solution.