Using Kafka to extract data. How to aggregate it every week on HDFS?


Hello All.

I am currently working on a procedure that extracts transactional data from our MSSQL database using Kafka. It runs every few seconds, and the overall workload is quite small, around 1,000 transactions every 10 seconds.

This data (in raw form) is placed on HDFS and later visualized. Going forward, I'd like to produce weekly summaries without aggregating the data in the visualization tool; instead, I want the results already calculated on HDFS. In general, what would you recommend for such a case? The task is simple: every week, gather the data stored in HDFS, aggregate it, and put it in a designated 'Weekly_Summary' table.
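To make the aggregation step concrete, here is a minimal sketch of the weekly bucketing logic in plain Python. The record layout (timestamp, amount pairs) is an assumption for illustration; in practice the same grouping would be expressed in the Hive or Spark job that reads the raw files off HDFS and writes the 'Weekly_Summary' table.

```python
from datetime import datetime
from collections import defaultdict

# Hypothetical transaction records as (timestamp, amount) pairs;
# in a real job these would be read from the raw files on HDFS.
transactions = [
    (datetime(2023, 1, 2, 10, 0), 120.0),   # ISO week 1 of 2023
    (datetime(2023, 1, 4, 15, 30), 80.0),   # ISO week 1 of 2023
    (datetime(2023, 1, 9, 9, 45), 200.0),   # ISO week 2 of 2023
]

def weekly_summary(records):
    """Group records by (ISO year, ISO week) and sum the amounts."""
    totals = defaultdict(float)
    for ts, amount in records:
        iso_year, iso_week, _ = ts.isocalendar()
        totals[(iso_year, iso_week)] += amount
    return dict(totals)

summary = weekly_summary(transactions)
# e.g. {(2023, 1): 200.0, (2023, 2): 200.0}
```

Keying on the ISO year together with the ISO week avoids mixing up week 1 of different years at year boundaries.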

Thanks in advance for all the answers!

1 REPLY

@P B You could use Oozie with a coordinator to schedule and run the aggregation job on your Hadoop cluster.

https://hortonworks.com/apache/oozie/
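A weekly coordinator could look roughly like the sketch below. The app name, start/end dates, and workflow path are placeholders to adapt; the workflow at `app-path` would contain the actual Hive or Spark action that writes the 'Weekly_Summary' table.

```xml
<coordinator-app name="weekly-summary-coord"
                 frequency="${coord:days(7)}"
                 start="2018-01-01T00:00Z" end="2020-01-01T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- Placeholder path: the workflow holding the aggregation job -->
      <app-path>hdfs:///apps/weekly-summary-wf</app-path>
    </workflow>
  </action>
</coordinator-app>
```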

HTH
