Real time/live Visualization of HDFS data



I am using Flume to ingest real time data into HDFS. The stored data can be analyzed in the terminal using Spark.


But what I actually want is some sort of real time visualization/graphs of this data. Since Flume is continuously (or let's say every 5 seconds) ingesting new data into HDFS, I would like my visualization to update automatically, in real time or near real time, by reading the data stored in HDFS.


I have visualized the data by making Hive tables, but that generates the table from whatever is stored in a particular file at that moment and does not update automatically when new data arrives. In other words, it is not real time visualization.


Is there a way I can achieve real time visualizations (e.g. pie charts, bar graphs, etc.) for my data that is being ingested into HDFS?


As a second part of this question, Flume has a roll-over time of 30 seconds, which means it creates a new file every 30 seconds. I would want the visualizations to read the data stored in all the files in a particular directory, rather than reading it from one particular file.





Re: Real time/live Visualization of HDFS data

Hi Rizi,


Spark has a Spark Streaming module. You could look into the Spark-Flume integration, or have Flume write to Kafka, which would be a better integration for real-time data than writing to HDFS. Spark could then be used to update the charts and give you a live visualization. There are many ways for Spark to update a graph, so you will want to search Google for a method that suits you.
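
As a rough starting point, here is a minimal PySpark sketch of the directory-based variant: it uses Spark's Structured Streaming file source (not the Flume or Kafka integrations mentioned above) to watch a whole directory and keep running counts. It assumes each line in the Flume output files is a single category value. The function names, the query name and the example path are all made up for illustration, so treat it as a sketch rather than a finished solution.

```python
from typing import Iterable, List, Tuple


def start_counts_query(input_dir: str, query_name: str = "live_counts"):
    """Start a streaming query counting identical lines across all files in input_dir."""
    # Imported lazily so the chart helper below can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-live-counts").getOrCreate()

    # The file source watches the whole directory, so every new file that
    # Flume rolls over is picked up without naming individual files.
    lines = spark.readStream.text(input_dir)  # one column named "value"
    counts = lines.groupBy("value").count()

    query = (
        counts.writeStream
        .outputMode("complete")               # running totals over all files seen so far
        .format("memory")                     # in-memory table; suitable for small results only
        .queryName(query_name)                # table name to read the results from
        .trigger(processingTime="5 seconds")  # refresh interval (assumed)
        .start()
    )
    return spark, query


def chart_series(rows: Iterable[Tuple[str, int]], top_n: int = 5) -> Tuple[List[str], List[int]]:
    """Turn (label, count) pairs into sorted labels and values for a bar or pie chart."""
    ranked = sorted(rows, key=lambda r: (-r[1], r[0]))[:top_n]
    return [label for label, _ in ranked], [count for _, count in ranked]


# Hypothetical usage (the path is an assumption; adjust to your Flume sink directory):
#   spark, query = start_counts_query("hdfs:///flume/events")
#   rows = [(r["value"], r["count"]) for r in spark.table("live_counts").collect()]
#   labels, values = chart_series(rows)
```

Your plotting code would then re-read the in-memory table on a timer and redraw the chart from labels and values. One thing to check: by default Flume's HDFS sink writes in-progress files with a .tmp suffix, and Spark may try to read them; setting hdfs.inUsePrefix to "_" or "." on the sink should make Spark skip files that are still being written.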