
Extracting transaction and non-transaction RDBMS data to Kerberized Hadoop

Hello Guys,

I'm planning an exercise focused on extracting data from an RDBMS, covering both transactional data and non-transactional data such as dictionaries. The transactional data is supposed to be joined with the non-transactional data and placed on a Kerberized Hadoop cluster.

I played a bit with both Kafka and NiFi. Kafka seems to have a nice JDBC connector, but I had big issues getting it to connect to the RDBMS. Moreover, once I did, Kafka had huge issues with the initial load of the entire table and eventually failed. NiFi is also a nice option, but in order to pick up only new transactions I had to use separate processors to get max_id first, which, in the case of Hadoop, is time-consuming.

With both tools I might be lacking some knowledge, hence the issues I have faced, so I'd appreciate advice on how I could overcome them. Also, if there are more interesting alternatives, I'd love to hear some recommendations. I'm looking for a delay of at most 5 minutes in getting the data.
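To make the incremental part concrete, here is a minimal sketch of the pattern I have in mind: remember the last extracted id on the extractor side and ask the RDBMS only for rows above it, so max_id never has to be looked up on Hadoop. Everything here is an assumption for illustration: the `transactions` table, its columns, the `fetch_new_rows` helper, and the in-memory SQLite database standing in for the real RDBMS.

```python
import sqlite3


def fetch_new_rows(conn, last_id, batch_size=1000):
    """Return rows with id > last_id and the updated high-water mark."""
    cur = conn.execute(
        "SELECT id, dict_id, amount FROM transactions "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    )
    rows = cur.fetchall()
    # Keep the old mark when nothing new arrived.
    new_last_id = rows[-1][0] if rows else last_id
    return rows, new_last_id


def demo():
    # Hypothetical schema; SQLite is only a stand-in for the source RDBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions "
        "(id INTEGER PRIMARY KEY, dict_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [(1, 10, 5.0), (2, 11, 7.5), (3, 10, 1.0)],
    )
    # First poll: initial load starting from mark 0.
    first, mark = fetch_new_rows(conn, 0)
    # A new transaction arrives between polls.
    conn.execute("INSERT INTO transactions VALUES (4, 11, 2.0)")
    # Second poll: only the row above the stored mark comes back.
    second, mark = fetch_new_rows(conn, mark)
    conn.close()
    return first, second, mark
```

In a real setup the mark would have to be persisted between runs, and the `LIMIT` would let the first full-table load proceed in batches instead of one huge query.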

Thanks in advance. Cheers!
