Support Questions

suram1 · 12-01-2017

My everyday workflow is:

1. I have data in Hive that is refreshed periodically by Oozie.

2. I access this data from spark-shell, build datasets, run ML algorithms, and create new DataFrames.

3. I am then expected to persist these DataFrames back to Hive.

4. Then I can connect from Tableau through a JDBC connection and visualize the results.

I would like to skip step 3 and connect directly from Tableau to the spark-shell session that holds my DataFrame, then visualize the results from there. Is this possible with the Spark Thrift Server? I see a lot of restrictions on how it can be run, and I haven't managed to get it running even once.

Note: I am not an admin on the HDP cluster, so I don't have access to the keytabs etc. that are needed to run a Hive server. I only want to run as myself, but trigger the creation of the dataset from Tableau, rather than running a job that updates the Hive table and then reading it from Hive.
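For reference, steps 2 and 3 of the workflow above might look roughly like the following in spark-shell. This is only a sketch: the database, table, and column names (`analytics.daily_events`, `event_type`, `analytics.daily_event_counts`) are hypothetical, and the aggregation merely stands in for the real dataset-building or ML step.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// In spark-shell a session named `spark` already exists; getOrCreate()
// simply returns it. Hive support is needed to read and write Hive tables.
val spark = SparkSession.builder()
  .appName("persist-results-to-hive")
  .enableHiveSupport()
  .getOrCreate()

// Step 2: read the Hive table that Oozie refreshes (hypothetical name).
val source = spark.table("analytics.daily_events")

// Placeholder for the real feature engineering / ML work.
val results = source.groupBy("event_type").count()

// Step 3: persist the DataFrame back to Hive so that JDBC clients
// such as Tableau can query it (hypothetical target table).
results.write
  .mode(SaveMode.Overwrite)
  .saveAsTable("analytics.daily_event_counts")
```

Because a DataFrame held only in the shell session is not visible to other processes, the `saveAsTable` call is the step that makes the result reachable over JDBC.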

dhyun · 12-01-2017

Sorry, @Subramaniam Ramasubramanian. You cannot connect to your Spark-shell via JDBC.

suram1 · 12-04-2017

I appreciate you taking the trouble to look into this, @Dongjoon Hyun!

suram1 · 12-04-2017

Maybe just one follow-up question. What exactly would be the use case for the Spark Thrift Server, then? Is it simply to have a Spark backend for Hive instead of, let's say, Tez or MR?

dhyun · 12-04-2017

In addition to that, the Spark Thrift Server (STS) has supported Spark SQL syntax since v2.0.0. If you want to use Spark SQL syntax with SQL 2003 support, it's a good choice. You can also use Spark-specific syntax such as `CACHE TABLE`.
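As an illustration of that last point, a JDBC client could send `CACHE TABLE` to a running Spark Thrift Server along these lines. This is a minimal sketch, not a tested setup: the host, port, user, and table name are hypothetical, the Hive JDBC driver is assumed to be on the classpath, and a Kerberized cluster would additionally need a `;principal=...` suffix on the URL.

```scala
import java.sql.DriverManager

// Hypothetical endpoint: host, port and database depend on the cluster.
val url = "jdbc:hive2://sts-host.example.com:10016/default"

// The Spark Thrift Server speaks the HiveServer2 protocol,
// so the standard Hive JDBC driver is used.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection(url, "my_user", "")
try {
  val stmt = conn.createStatement()

  // Spark-specific statement: keep the table in the server's memory.
  stmt.execute("CACHE TABLE analytics.daily_event_counts")

  // Later queries against the table are served from the cache.
  val rs = stmt.executeQuery("SELECT COUNT(*) FROM analytics.daily_event_counts")
  while (rs.next()) {
    println(s"rows: ${rs.getLong(1)}")
  }
} finally {
  conn.close()
}
```

Tableau would connect to the same kind of endpoint through its own driver; the table being queried still has to exist in the metastore, as in the workflow described in the question.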

Cloudera Community

Accessing spark dataframe in spark-shell through JDBC