Use of Spark Streaming for interactive Reporting/Visualization

Hi,

Is Spark Streaming (maybe along with Spark SQL) suited for interactive querying, specifically for generating reporting dashboards in Tableau?

We are building a data lake that holds all of our organization's data as Avro files. We need to create dashboards and reports in Tableau from the data in the data lake. The challenge is that some of these reports have to process millions of records and have strict load-time limits (sometimes as strict as under 10 seconds per report). Right now we are forced to maintain an Oracle data mart, populated from the data lake, from which Tableau pulls data to generate reports.

We want to avoid creating a separate data mart, so we are looking at connecting Tableau directly to the Hadoop data lake. Plain Spark is ruled out because the reports need to be interactive, but I learned from yesterday's Hortonworks webinar on Spark that Spark Streaming can be used here. Is Spark Streaming (maybe along with Spark SQL) suited for interactive querying to generate reporting dashboards using Tableau?

Are there any similar use cases that you can point me to?

Thanks, Raga

1 ACCEPTED SOLUTION

Actually, many BI vendors, including Tableau, have announced a Spark connector over JDBC, which should presumably be able to query data loaded into RDDs in memory. If you load data via Spark Streaming into an RDD, convert it to a DataFrame (rdd.toDF), and register that DataFrame as a temporary table (registerTempTable, which is a DataFrame method rather than an RDD method), you should be able to query the data over a JDBC connection and display it in a dashboard.

Here is information on the Tableau connector, including a video at the bottom of the page:

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&so...
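
For illustration, here is a minimal PySpark sketch of the approach described above: each micro-batch from a stream is converted to a DataFrame and exposed as a temporary view that Spark SQL can query. Several things here are assumptions and not part of the original answer: the socket source on localhost:9999, the comma-separated "region,amount" record layout, the view name "events", and the helper name register_batch. The sketch uses createOrReplaceTempView, the newer replacement for registerTempTable. Note also that a temporary view is scoped to the Spark session that created it, so a JDBC client would only see it if the Thrift JDBC server runs inside the same application (on the Scala side, HiveThriftServer2.startWithContext is the usual way to do that), and each batch replaces the previous view contents.

```python
def register_batch(spark, rdd, table_name="events"):
    """Expose one micro-batch as a temporary view queryable via Spark SQL.

    Returns the DataFrame, or None when the batch is empty (an empty RDD
    has no rows to infer a schema from).
    """
    if rdd.isEmpty():
        return None
    df = spark.createDataFrame(rdd)          # schema inferred from Row objects
    df.createOrReplaceTempView(table_name)   # successor of registerTempTable
    return df


def main():
    # Imported here so register_batch can be exercised without Spark installed.
    from pyspark.sql import Row, SparkSession
    from pyspark.streaming import StreamingContext

    spark = SparkSession.builder.appName("stream-to-temp-view").getOrCreate()
    ssc = StreamingContext(spark.sparkContext, 10)  # 10-second micro-batches

    # Hypothetical source: lines such as "EMEA,120.0" arriving on a socket.
    lines = ssc.socketTextStream("localhost", 9999)
    rows = (lines.map(lambda line: line.split(","))
                 .map(lambda p: Row(region=p[0], amount=float(p[1]))))

    # Re-register the view for every micro-batch.
    rows.foreachRDD(lambda rdd: register_batch(spark, rdd))

    ssc.start()
    ssc.awaitTermination()

# To run the stream, call main() from a script submitted with spark-submit.
```

Whether this meets a sub-10-second dashboard SLA depends on data volume and cluster sizing; the sketch only shows the wiring, not a tuned setup.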

2 REPLIES

@Raghavendran Chellappa

Tableau or any other BI tool for that matter can't connect directly to Spark Streaming. Spark Streaming only processes the data -- you still need to persist it in HDFS or somewhere else before Tableau or anything else can connect to it.

In case you need to do interactive analysis with a very short SLA, you need a system which can index the data. Pure row scans won't cut it. One example would be to connect Spark Streaming to Solr. Solr will index the data as it is inserted. You can then build a read-only dashboard using Banana, or build a custom app which queries Solr for user-defined queries.

So the flow is:

Streaming Data -> Spark Streaming -> Solr -> Banana Dashboard (or a custom app if interactivity is desired)

Look here for an example of streaming Tweets from Spark into Solr:

https://doc.lucidworks.com/lucidworks-hdpsearch/2....
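
To make the point about indexing versus row scans concrete, here is a toy pure-Python sketch. It is not Solr and makes no claim about how Solr is implemented; the record layout and the function names (build_index, scan, lookup) are made up for illustration. It only contrasts a full scan, which touches every record per query, with a lookup in an index that was built once as the data was inserted.

```python
from collections import defaultdict


def build_index(records, field):
    """Map each distinct value of `field` to the positions of matching records."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[field]].append(pos)
    return index


def scan(records, field, value):
    """Row scan: inspects every record on every query."""
    return [rec for rec in records if rec[field] == value]


def lookup(records, index, value):
    """Index lookup: jumps straight to the matching positions."""
    return [records[pos] for pos in index.get(value, [])]


# Hypothetical sample data.
records = [
    {"region": "EMEA", "amount": 120.0},
    {"region": "APAC", "amount": 75.5},
    {"region": "EMEA", "amount": 30.0},
]
region_index = build_index(records, "region")

# Both approaches return the same rows; only the work per query differs.
assert scan(records, "region", "EMEA") == lookup(records, region_index, "EMEA")
```

With millions of records, the scan cost grows with the table size on every query, while the lookup cost grows only with the number of matching rows, which is the property a short SLA needs.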
