Real time/live Visualization of HDFS data

New Contributor

Hi,

I am using Flume to ingest real-time data into HDFS. I can analyze the stored data from the terminal using Spark.

But what I actually want is some sort of real-time visualization (graphs) of this data. Since Flume is continuously ingesting new data into HDFS (let's say every 5 seconds), I would like the visualization to update automatically, in real time or near real time, from the data stored in HDFS.
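
One way to get near-real-time updates without a streaming framework is to poll the landing directory on a short interval and fold only the newly arrived files into running totals. This is a minimal sketch under stated assumptions, not a recommended production design. A local directory stands in for the HDFS path. The helper names `poll_new_files` and `refresh` are hypothetical. Each record is assumed to be a CSV line whose first field is the category to chart. Files still being written are assumed to carry a `.tmp` suffix, which is the default in-use suffix of Flume's HDFS sink.

```python
import os
import time
from collections import Counter


def poll_new_files(directory, seen, counts, in_use_suffix=".tmp"):
    """Fold files that appeared since the last poll into `counts`.

    `seen` holds the names of files already read, so each poll opens
    only new files. Files still being written (assumed to end with
    `in_use_suffix`) are skipped until they are renamed on roll-over.
    """
    new_files = sorted(
        name for name in os.listdir(directory)
        if name not in seen
        and not name.endswith(in_use_suffix)
        and os.path.isfile(os.path.join(directory, name))
    )
    for name in new_files:
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    # Hypothetical record layout: first CSV field = category.
                    counts[line.split(",", 1)[0]] += 1
        seen.add(name)
    return new_files


def refresh(directory, interval_s=5.0, rounds=3):
    """Poll every `interval_s` seconds and redraw after each poll."""
    seen, counts = set(), Counter()
    for _ in range(rounds):
        poll_new_files(directory, seen, counts)
        print(dict(counts))  # stand-in for redrawing a chart
        time.sleep(interval_s)
    return counts
```

Each `refresh` round is where a real dashboard would redraw its chart. The `print` call only marks that spot. Because `seen` remembers which files were already read, each poll costs only as much as the new data.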

I have visualized the data by creating Hive tables, but that generates the table from what is stored in a particular file at that moment, and the table does not update automatically when new data arrives. In other words, it is not a real-time visualization.

Is there a way to get real-time visualizations (e.g. pie charts, bar graphs) of the data as it is being ingested into HDFS?

As a second part of this question: Flume has a roll-over interval of 30 seconds, which means it creates a new file every 30 seconds. I would like the visualizations to read the data from all the files in a particular directory rather than from a single file.
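
Reading every file in the directory instead of one file amounts to rescanning the directory on each refresh and treating all closed files as one dataset. This is a minimal sketch under stated assumptions. A local directory stands in for the HDFS path. `directory_shares` is a hypothetical helper. Records are assumed to be CSV lines whose first field is the category. The file Flume is still writing is assumed to end in `.tmp`. The function returns each category's fraction of all records, which gives the slice sizes a pie chart needs.

```python
import glob
import os
from collections import Counter


def directory_shares(directory, in_use_suffix=".tmp"):
    """Aggregate every closed file in `directory` as one dataset.

    Returns each category's fraction of all records (pie-chart slice
    sizes). glob's "*" pattern also skips hidden dot-files.
    """
    counts = Counter()
    for path in sorted(glob.glob(os.path.join(directory, "*"))):
        # Skip sub-directories and the file Flume is still writing.
        if path.endswith(in_use_suffix) or not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    # Hypothetical record layout: first CSV field = category.
                    counts[line.split(",", 1)[0]] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}
```

A full rescan is simple, but it re-reads every file on each refresh, so its cost grows as the 30-second files accumulate. Caching per-file counts keyed by file name avoids re-reading files that are already closed.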

Thanks,

-Riz

Re: Real time/live Visualization of HDFS data

Expert Contributor

Hi Rizi,

Spark has a Spark Streaming module. You could look into the Spark Streaming + Flume integration, or have Flume write to Kafka, which would be a better integration for real-time data than writing to HDFS. Spark could then consume that stream and update the charts as each batch arrives. There are many ways for Spark to feed a graph, so you would want to search Google to find a method that suits your setup.
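
Whatever produces the aggregates (a Spark Streaming job fed by Flume or Kafka, or a simple directory poller), the chart step comes down to redrawing from the latest totals on every batch. The following is a library-free stand-in for a real charting tool. It assumes the totals arrive as a dict mapping category to count, and `render_bars` is a hypothetical helper rather than part of any Spark API:

```python
def render_bars(counts, width=40):
    """Render category counts as text bar-chart lines, largest first.

    The largest count gets a bar of `width` characters; the others are
    scaled proportionally to it.
    """
    if not counts:
        return []
    peak = max(counts.values())
    label_w = max(len(str(k)) for k in counts)
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    lines = []
    for key, value in rows:
        bar = "#" * round(width * value / peak) if peak > 0 else ""
        lines.append(f"{str(key).ljust(label_w)} | {bar} {value}")
    return lines


if __name__ == "__main__":
    # Each micro-batch would call this again with its fresh totals.
    print("\n".join(render_bars({"error": 12, "info": 30, "warn": 6}, width=20)))
```

Keeping the rendering step separate from the aggregation means the same chart code works regardless of which engine computes the counts.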

Thanks,
Jason