Created 12-01-2017 12:23 PM
So my everyday workflow is:
1. I have data in Hive that is refreshed periodically by Oozie.
2. I access this data from spark-shell, build datasets, run ML algorithms, and create new DataFrames.
3. Then I am expected to persist these DataFrames back to Hive (a rough sketch of this step is at the end of this post).
4. Then I connect from Tableau through a JDBC connection and visualize the results.
I wish to skip the step between 3 and 4 and connect directly from Tableau to my spark-shell session, where my DataFrame lives, and visualize the results from there. Is this possible with the Spark Thrift Server? I see a lot of restrictions on how it can be run and haven't managed to get it running even once.
Note: I am not an admin on the HDP cluster, so I don't have access to the keytabs etc. that are needed for running HiveServer2. I only want to run as myself, but trigger the creation of the dataset from Tableau instead of running a job that updates the Hive table and then going through Hive.
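For reference, step 3 in my current workflow looks roughly like the following in spark-shell (the database and table names here are just placeholders):

```scala
// In spark-shell on HDP, the SparkSession `spark` is already available with Hive support.
// "source_db.events" and "results_db.model_scores" are placeholder names for illustration.
val scored = spark.table("source_db.events")   // step 2: read the Hive-managed data
  .groupBy("user_id")
  .count()                                      // stand-in for the real ML/aggregation output

// Step 3: persist the DataFrame back to Hive so Tableau can query it over JDBC (step 4)
scored.write
  .mode("overwrite")
  .saveAsTable("results_db.model_scores")
```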
Created 12-01-2017 06:35 PM
Sorry, @Subramaniam Ramasubramanian. You cannot connect to your Spark-shell via JDBC.
Created 12-04-2017 11:11 AM
Appreciate you taking the trouble to look into this, @Dongjoon Hyun!
Created 12-04-2017 11:16 AM
Maybe just one follow-up question: what exactly would be the use case for the Spark Thrift Server then? Is it simply to have a Spark backend for Hive instead of, let's say, Tez or MR?
Created 12-04-2017 05:12 PM
In addition to that, STS has supported Spark SQL syntax since v2.0.0. If you want to use Spark SQL syntax with SQL 2003 support, it's a good choice. You can also use Spark-specific syntax like `CACHE TABLE`.
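As a minimal sketch, a JDBC client talking to STS could look like the following. The host, port, user, and table name are placeholders (STS commonly listens on 10015 or 10016 on HDP; check your cluster's config), and this assumes the Hive JDBC driver is on the classpath:

```scala
import java.sql.DriverManager

// Hypothetical connection details: adjust host, port, and credentials to match your cluster;
// "model_scores" is a placeholder table name.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection(
  "jdbc:hive2://sts-host.example.com:10015/default", "myuser", "")
val stmt = conn.createStatement()

// Spark-specific SQL accepted by STS: pin the table in executor memory
stmt.execute("CACHE TABLE model_scores")

// Subsequent queries (e.g. from Tableau pointed at the same STS) read the cached data
val rs = stmt.executeQuery("SELECT COUNT(*) FROM model_scores")
while (rs.next()) println(rs.getLong(1))

rs.close(); stmt.close(); conn.close()
```

If your Tableau connector exposes an "Initial SQL" option, the same `CACHE TABLE` statement can be issued there when the connection opens, so dashboards reuse the cached data.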