
Using Kafka to extract data. How to aggregate it every week on HDFS?



Hello All.

I am currently working on a procedure that extracts transactional data from our MSSQL database using Kafka. It runs every few seconds and the overall workload is quite small, around 1000 transactions every 10 seconds.

This data is placed on HDFS in raw form and later visualized. I'd like to produce weekly summaries without aggregating the data in the visualization tool, so that they are already calculated on HDFS. In general, what would you recommend in such a case? The task is simple: every week, gather the data stored in HDFS, aggregate it, and put it in a designated 'Weekly_Summary' table.

Thanks in advance for all the answers!


Re: Using Kafka to extract data. How to aggregate it every week on HDFS?

@P B You could use Oozie with a coordinator to schedule and run jobs on your Hadoop cluster.

https://hortonworks.com/apache/oozie/
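
To make the weekly rollup itself concrete, here is a minimal Python sketch of the aggregation logic that a scheduled job might run. It is only an illustration: the record layout (a timestamp and an amount), the column names `week_start`, `tx_count` and `total_amount`, and the sample values are all assumptions, and on a real cluster the same grouping would more likely be written as a Hive or Spark job that reads from HDFS.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def week_start(ts: datetime) -> date:
    """Return the Monday of the week that contains ts."""
    return (ts - timedelta(days=ts.weekday())).date()


def weekly_summary(transactions):
    """Aggregate (timestamp, amount) pairs into one row per week."""
    totals = defaultdict(lambda: {"tx_count": 0, "total_amount": 0.0})
    for ts, amount in transactions:
        row = totals[week_start(ts)]
        row["tx_count"] += 1
        row["total_amount"] += amount
    # One dict per week, ordered by week start (hypothetical column names).
    return [{"week_start": wk, **vals} for wk, vals in sorted(totals.items())]


# Hypothetical sample records standing in for the raw data on HDFS.
sample = [
    (datetime(2019, 7, 1, 9, 0), 10.0),   # Monday
    (datetime(2019, 7, 3, 14, 30), 5.5),  # Wednesday, same week
    (datetime(2019, 7, 8, 8, 15), 7.0),   # Monday of the next week
]

for row in weekly_summary(sample):
    print(row)
```

Keying each row on the Monday of its week keeps the job idempotent: rerunning it for the same week produces the same key, so the scheduler can safely retry a failed run.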

HTH
