
Real time/live Visualization of HDFS data


New Contributor



I am using Flume to ingest real-time data into HDFS, and I can analyze the stored data from the terminal using Spark.


But what I actually want is a real-time visualization (graphs, charts) of this data. Since Flume is continuously ingesting new data into HDFS (say, every 5 seconds), I would like the visualization to read the data stored in HDFS and update automatically, in real time or near real time.


I have visualized the data by creating Hive tables, but a table only reflects what is stored in a particular file at that moment and does not update automatically when new data arrives. In other words, it is not a real-time visualization.


Is there a way to get real-time visualizations (e.g. pie charts, bar graphs) of the data being ingested into HDFS?


As a second part of this question: Flume has a roll interval of 30 seconds, which means it creates a new file every 30 seconds. I would like the visualizations to read the data stored in all the files in a particular directory, rather than in a single file.
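To illustrate reading a whole directory rather than one file, here is a minimal Python sketch. It assumes the rolled files are reachable as an ordinary local directory (a stand-in for HDFS) and that each record is a comma-separated line whose first field is the category to chart; that record format and the function name `count_categories` are hypothetical. It skips files ending in `.tmp`, the default in-use suffix of Flume's HDFS sink (`hdfs.inUseSuffix`), so a half-written file is not counted. Pointing Spark (e.g. `spark.read.text` on the directory path) or a Hive external table's `LOCATION` at the directory instead of a single file should give the same "all files" behavior each time the query runs.

```python
import os
import tempfile
from collections import Counter


def count_categories(directory, in_use_suffix=".tmp"):
    """Aggregate category counts over every completed file in `directory`.

    Hypothetical record format: one comma-separated record per line, with
    the category to chart in the first field.
    """
    counts = Counter()
    for name in sorted(os.listdir(directory)):
        # Skip files Flume is still writing (in-use suffix) and hidden files.
        if name.endswith(in_use_suffix) or name.startswith("."):
            continue
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    counts[line.split(",", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    # Demo: two rolled files plus one that is still being written.
    with tempfile.TemporaryDirectory() as d:
        files = {
            "FlumeData.1": "info,a\nerror,b\n",
            "FlumeData.2": "info,c\n",
            "FlumeData.3.tmp": "error,d\n",
        }
        for name, body in files.items():
            with open(os.path.join(d, name), "w", encoding="utf-8") as f:
                f.write(body)
        print(count_categories(d))
```

The resulting counts are the input a pie or bar chart needs; re-running the function on a timer picks up each newly rolled file.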






Re: Real time/live Visualization of HDFS data

Expert Contributor

Hi Rizi,


Spark has a Spark Streaming module. You could look into the Spark Streaming integration with Flume, or have Flume write to Kafka, which is a better integration point for real-time data than writing to HDFS. Spark could then be used to update the charts. There are many ways for Spark to update a graph, so you would want to search for a method that suits your setup.
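The idea behind Spark's file-based streaming sources (for example `StreamingContext.textFileStream` in Spark Streaming) is to watch a directory and process only the files that have appeared since the last batch. The Python sketch below imitates that pattern with the standard library; it is not Spark code. It assumes a local directory standing in for HDFS, hypothetical comma-separated records whose first field is the category, and Flume's default `.tmp` in-use suffix. `IncrementalCounter`, `render_bar_chart` and `watch` are hypothetical names, and the text bar chart is a placeholder for whatever charting or dashboard tool you use.

```python
import os
import time
from collections import Counter


class IncrementalCounter:
    """Running category counts; each completed file is read exactly once.

    Every call to refresh() plays the role of one micro-batch.
    """

    def __init__(self, directory, in_use_suffix=".tmp"):
        self.directory = directory
        self.in_use_suffix = in_use_suffix
        self.seen = set()  # names of files already counted
        self.counts = Counter()

    def refresh(self):
        """Count records in files that appeared since the last call."""
        new_files = []
        for name in sorted(os.listdir(self.directory)):
            path = os.path.join(self.directory, name)
            if (name in self.seen or name.startswith(".")
                    or name.endswith(self.in_use_suffix)
                    or not os.path.isfile(path)):
                continue
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if line:
                        # Hypothetical format: category is the first field.
                        self.counts[line.split(",", 1)[0]] += 1
            self.seen.add(name)
            new_files.append(name)
        return new_files


def render_bar_chart(counts, width=40):
    """Plain-text bar chart, scaled so the largest category fills `width`."""
    if not counts:
        return ""
    top = max(counts.values())
    rows = []
    for category, n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        rows.append(f"{category:>10} | {bar} {n}")
    return "\n".join(rows)


def watch(directory, interval=5.0, iterations=None):
    """Poll `directory` every `interval` seconds; redraw when new files arrive."""
    tracker = IncrementalCounter(directory)
    done = 0
    while iterations is None or done < iterations:
        if tracker.refresh():
            print(render_bar_chart(tracker.counts))
        done += 1
        time.sleep(interval)
    return tracker.counts
```

Polling trades latency for simplicity: with a 30-second roll interval, records only become visible once their file is rolled, so the chart can lag by up to one roll period plus the polling interval. Having Flume write to Kafka and consuming it with a streaming job avoids waiting for the roll.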