I am currently working on a procedure that extracts transactional data from our MSSQL database using Kafka. It runs every few seconds and the overall workload is quite small, around 1000 transactions every 10 seconds.
This data is placed on HDFS in raw form and later visualized. Later on, I'd like to produce weekly summaries without aggregating the data in the visualization tool, so that the results are already calculated and stored on HDFS. In general, what would you recommend using in such a case? The task is simple: every week, gather the data stored in HDFS, aggregate it, and put it in a designated 'Weekly_Summary' table.
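
To make the task concrete, here is a minimal sketch of the kind of aggregation I have in mind, written in plain Python only for illustration. The record layout (an ISO timestamp plus an amount) and the function name `weekly_summary` are assumptions made for this example, not the real schema, and the actual job would read from HDFS rather than from an in-memory list.

```python
from collections import defaultdict
from datetime import datetime


def weekly_summary(transactions):
    """Group (timestamp, amount) records by ISO year and week.

    `transactions` is an iterable of (iso_timestamp_string, amount) pairs.
    This layout is hypothetical and stands in for the raw data on HDFS.
    """
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for ts, amount in transactions:
        iso = datetime.fromisoformat(ts).isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        totals[key]["count"] += 1
        totals[key]["total"] += amount
    # Sort by (year, week) so the summary rows come out in order.
    return dict(sorted(totals.items()))


# Illustrative input only, not real data.
rows = [
    ("2024-01-01T10:00:00", 10.0),
    ("2024-01-03T12:00:00", 5.0),
    ("2024-01-08T09:00:00", 2.5),
]
summary = weekly_summary(rows)
for (year, week), stats in summary.items():
    print(year, week, stats["count"], stats["total"])
```

Each resulting row (year, week, transaction count, total amount) is roughly what I would expect to end up in 'Weekly_Summary'.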