I am using Flume to ingest real time data into HDFS. The stored data can be analyzed in the terminal using Spark.
But what I actually want is some sort of real time visualization/graphs of this data. Since Flume is continuously (or let's say every 5 seconds) ingesting new data into HDFS, I would like my visualization to update automatically in real time or near real time by extracting the data stored in HDFS.
I have visualized the data by creating Hive tables, but a table is generated from whatever is stored in a particular file at that moment and does not update automatically when new data arrives. In other words, it is not real-time visualization.
Is there a way I can achieve real-time visualizations, i.e. pie charts, bar graphs, etc., for my data as it is being ingested into HDFS?
As a second part of this question: Flume has a roll-over interval of 30 seconds, which means it creates a new file every 30 seconds. I would want the visualization to read the data stored in all the files in a particular directory rather than reading from one particular file.
Spark has a Spark Streaming module; you could look into the Spark Streaming Flume integration, or have Flume write to Kafka, which would be a better integration for real-time data than writing to HDFS. Spark could then recompute the numbers behind your charts on every micro-batch and push the results to a visualization front end. There are many ways for Spark to update a graph, and you would want to search for a method suitable to you.
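To make the Spark Streaming approach concrete, here is a minimal sketch. It assumes the `spark-streaming-flume` artifact is on the classpath and that a Flume agent has an Avro sink pointing at the host and port shown (the hostname `agent-host`, the port `9999`, and the aggregation are placeholders you would replace with your own setup):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeDashboard {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumeDashboard")
    // 5-second micro-batches, roughly matching the ingest cadence
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receive events directly from Flume instead of re-reading HDFS files
    val events = FlumeUtils.createStream(ssc, "agent-host", 9999)

    // Example aggregation: count events per batch. In practice you would
    // compute whatever figures your pie/bar charts need and push them to
    // your charting front end (e.g. over a websocket or a REST endpoint).
    events.count().foreachRDD { rdd =>
      rdd.collect().foreach(c => println(s"events in this batch: $c"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the data must stay in HDFS, `ssc.textFileStream("hdfs://namenode:8020/path/to/flume/dir")` instead picks up every new file that lands in that directory, which also covers the second part of your question about reading all files in a directory rather than one particular file.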