Created 03-27-2018 03:10 PM
I have a requirement to read the Oracle DB log file and write that data into HBase. Can I do this with Kafka alone, or are other tools required? The architecture should be reliable, fault tolerant, and scalable for large data processing.
Created 03-27-2018 03:34 PM
No, it is not enough to do it only with Kafka and HBase. Kafka is your transport layer and HBase is your target data store. You need a few more components to connect to the source, post to Kafka, and then post to HBase.
In order to read the Oracle DB log file you need a tool capable of performing Change Data Capture (CDC) from the Oracle DB logs. That tool then writes to a Kafka topic; that is your "Kafka Producer" application. You then need to write an application that reads from the Kafka topic and puts the data into HBase; that is your "Kafka Consumer" application.
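To make the consumer side concrete, here is a minimal sketch (not production code) of a "Kafka Consumer" that reads change records from a topic and puts them into HBase. The topic name "oracle-cdc", the table name "oracle_changes", the column family "cf", and the broker address are all hypothetical placeholders; a real flow would also parse the change payload instead of storing it as a single column.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CdcToHBaseConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");          // hypothetical broker
        props.put("group.id", "cdc-to-hbase");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection hbase = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = hbase.getTable(TableName.valueOf("oracle_changes"))) {

            consumer.subscribe(Collections.singletonList("oracle-cdc"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Use the record key as the HBase row key (assumes the producer
                    // keys messages by primary key) and store the raw payload.
                    Put put = new Put(Bytes.toBytes(record.key()));
                    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"),
                                  Bytes.toBytes(record.value()));
                    table.put(put);
                }
            }
        }
    }
}
```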
Examples of CDC-capable tools are GoldenGate, SharePlex, Attunity, etc.
If you need a tool that will be used enterprise-wide to connect to various source types (e.g. Oracle, SQL Server, MySQL) and read database logs instead of issuing expensive queries against the source databases, then Attunity is probably your best bet. However, if you don't plan to acquire a new tool and you already have GoldenGate or SharePlex, use those; SharePlex, for example, writes directly to Kafka. Another option with Oracle is to use its Change Data Capture feature (https://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm) and then write the Kafka Producer application yourself to gather the changes from the source and write them to a Kafka topic. Then have your consumer application pick up the data and put it into HBase.
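As a rough illustration of that do-it-yourself producer, the sketch below polls a CDC change table over JDBC and publishes each change row to Kafka. The JDBC URL, credentials, change-table name, column names, and topic name are all hypothetical, and it assumes the Oracle JDBC driver is on the classpath; it is only meant to show the shape of the "Kafka Producer" side.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OracleCdcProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
        props.put("acks", "all");                        // wait for in-sync replicas for reliability
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Connection db = DriverManager.getConnection(
                     "jdbc:oracle:thin:@dbhost:1521/ORCL", "cdc_user", "secret");
             Statement stmt = db.createStatement();
             // Hypothetical change table and columns; adapt to your CDC subscription.
             ResultSet rs = stmt.executeQuery(
                     "SELECT emp_id, operation$, change_payload FROM cdc_subscriber.emp_ct")) {

            while (rs.next()) {
                String key = rs.getString("emp_id");
                String value = rs.getString("operation$") + "|" + rs.getString("change_payload");
                // Key by primary key so all changes for a row land in the same partition.
                producer.send(new ProducerRecord<>("oracle-cdc", key, value));
            }
            producer.flush();
        }
    }
}
```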
Apache NiFi is expected to add a CDC processor for Oracle this year; currently, NiFi has only a MySQL CDC processor.
If you want to make your life easier, use Apache NiFi (part of Hortonworks DataFlow) to implement the Kafka Producer, the Kafka Consumer, and the write to HBase. I see that you tagged your question with kafka-streams, so you are probably considering writing that Kafka Producer and Consumer with Kafka Streams. That is an alternative to NiFi, but it requires more programming and you have to handle the HA and security aspects yourself, while NiFi provides them out of the box, and developing a NiFi flow is much easier. NiFi also has a Registry component that lets you manage versions of your flows like source code, and the Hortonworks Schema Registry gives your Kafka producer and consumer applications a way to share schemas. A minimal Kafka Streams sketch is shown below for comparison.
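For completeness, here is a minimal sketch of the Kafka Streams alternative: consume the CDC topic and hand each record to a sink. The topic name, application id, broker address, and the writeToHBase() helper are hypothetical placeholders; in practice the helper would contain the HBase Put logic shown in the consumer sketch above.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class CdcStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cdc-to-hbase-streams");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("oracle-cdc")
               // Terminal operation: push each change record to the target store.
               .foreach(CdcStreamsApp::writeToHBase);

        new KafkaStreams(builder.build(), props).start();
    }

    // Hypothetical sink; replace with the HBase write logic from the consumer sketch.
    static void writeToHBase(String key, String value) {
        System.out.printf("row=%s payload=%s%n", key, value);
    }
}
```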
If this response helped, please vote and accept it as the best answer, if appropriate.
Created 03-27-2018 05:01 PM
Thanks a lot!! very useful..