Accessing spark dataframe in spark-shell through JDBC

Explorer

My everyday workflow is:

1. I have data in Hive that is refreshed periodically by Oozie.

2. I access this data from spark-shell, build data sets, run ML algorithms, and create new DataFrames.

3. I am then expected to persist these DataFrames back to Hive (steps 2 and 3 are sketched below).

4. Then I can connect from Tableau through a JDBC connection and visualize the results.
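
For context, a minimal sketch of steps 2 and 3 as I run them in spark-shell (assuming Spark 2.x with Hive support enabled; the database and table names are placeholders):

```scala
// spark-shell already exposes a Hive-enabled SparkSession as `spark`.

// Step 2: read the Oozie-refreshed Hive table and build a new DataFrame.
val events = spark.table("mydb.events")
val results = events
  .groupBy("user_id")
  .count()
  .withColumnRenamed("count", "event_count")

// Step 3: persist the result back to Hive so Tableau can query it over JDBC.
results.write
  .mode("overwrite")
  .saveAsTable("mydb.model_results")
```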

I wish to skip the step between 3 and 4 and connect directly from Tableau to my spark-shell, where my DataFrame lives, and visualize the results from there. Is this possible with the Spark Thrift Server? I see a lot of restrictions on how it can be run and haven't managed to get it running even once.

Note: I am not an admin on the HDP cluster, so I don't have access to the keytabs etc. needed to run HiveServer. I only wish to run as myself, but trigger the creation of the dataset from Tableau instead of going through Hive by running a job that updates the Hive table.

4 REPLIES

Expert Contributor

Sorry, @Subramaniam Ramasubramanian. You cannot connect to your Spark-shell via JDBC.

Explorer

Appreciate you taking the trouble to look into this, @Dongjoon Hyun!

Explorer

Maybe just one follow-up question: what exactly would be the use case for the Spark Thrift Server then? Is it simply to have a Spark backend for Hive instead of, let's say, Tez or MR?

Expert Contributor

In addition to that, STS has supported Spark SQL syntax since v2.0.0. If you want Spark SQL syntax with SQL:2003 support, it's a good choice. You can also use Spark-specific statements such as `CACHE TABLE`.
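
To make that concrete, here is a rough sketch of talking to STS over JDBC with the Hive JDBC driver (the same protocol Tableau and beeline use). The host, port, user, and table names are placeholders, and the `hive-jdbc` driver must be on the classpath:

```scala
import java.sql.DriverManager

// Older Hive JDBC drivers may need explicit registration.
Class.forName("org.apache.hive.jdbc.HiveDriver")

// Placeholder STS endpoint; the actual host and port are cluster-specific.
val url = "jdbc:hive2://sts-host:10015/default"
val conn = DriverManager.getConnection(url, "myuser", "")
val stmt = conn.createStatement()

// Spark-specific syntax such as CACHE TABLE works because the server is STS.
stmt.execute("CACHE TABLE mydb.model_results")

// Ordinary Spark SQL then runs against the cached table.
val rs = stmt.executeQuery(
  "SELECT user_id, event_count FROM mydb.model_results LIMIT 10")
while (rs.next()) {
  println(s"${rs.getString(1)}\t${rs.getLong(2)}")
}

rs.close()
stmt.close()
conn.close()
```

The same JDBC URL should also work from beeline or from Tableau's Spark SQL connector.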