There is a request to add the Spark Thrift Server to CDH: https://issues.cloudera.org/browse/DISTRO-817
Please vote it up if you want to see it in CDH.
I upgraded from CDH 5.4.0 to 5.5.0, but I cannot find $SPARK_HOME/sbin/start-thriftserver.sh. I have a use case that relies on Spark's Thrift server to expose Hive tables to my Tableau visualization. Is the script located elsewhere? If not, what workaround can I use?
You can find the scripts here:
However, the Spark that ships with CDH 5.5 does not include the Spark Thrift server.
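To check whether a given Spark installation on your nodes ships the Thrift server start script, a small diagnostic like the following may help. It is a sketch, not a Cloudera tool: the function names are hypothetical, and the parcel and package paths in the example comment (/opt/cloudera/parcels/CDH/lib/spark and /usr/lib/spark) are the usual CDH layouts but are assumptions about your install.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic helpers (not part of CDH or Spark).

# Succeeds if the given Spark home contains an executable Thrift server
# start script at the standard location, sbin/start-thriftserver.sh.
has_thriftserver() {
  local spark_home="$1"
  [[ -x "$spark_home/sbin/start-thriftserver.sh" ]]
}

# Reports, for each candidate Spark home, whether the start script exists.
# Returns 0 if at least one candidate has it, 1 otherwise.
check_spark_homes() {
  local home status=1
  for home in "$@"; do
    if has_thriftserver "$home"; then
      echo "found: $home/sbin/start-thriftserver.sh"
      status=0
    else
      echo "missing: $home/sbin/start-thriftserver.sh"
    fi
  done
  return "$status"
}

# Example (assumed CDH parcel and package locations; adjust as needed):
#   check_spark_homes /opt/cloudera/parcels/CDH/lib/spark /usr/lib/spark
```

With the stock CDH 5.5 Spark, every candidate path should report "missing".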
Take a look at this post from Clairvoyant to learn how to build from source:
The Thrift server in Spark is not tested with, and might not be compatible with, the Hive version in CDH.
CDH ships a patched Hive 1.1, while Spark is built against Hive 1.2.1, so you might see API errors at compile time or failures at run time.
How can one build a Spark release that includes the Thrift server and links against the patched Hive in CDH?
Mr. Arnold, hope you're doing well...
I went through this on MapR. The only issues I had were with running in secure mode, but that might be specific to MapR. If you run into issues:
I have finally managed to post instructions on how I rebuild Cloudera's Spark to include the Thrift server. In summary, you would run:
git clone https://github.com/cloudera/spark.git
cd spark
./make-distribution.sh -DskipTests \
    -Dhadoop.version=2.6.0-cdh5.7.0 \
    -Phadoop-2.6 \
    -Pyarn \
    -Phive -Phive-thriftserver \
    -Pflume-provided \
    -Phadoop-provided \
    -Phbase-provided \
    -Phive-provided \
    -Pparquet-provided
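The build command above can be wrapped so the same profile list is reused for another CDH release. This is a minimal sketch: cdh_build_args is a hypothetical helper, and it assumes the Hadoop base version is 2.6.0 (true for the 5.7.0 command above, an assumption for other CDH 5 releases, so check the release notes). It only prints the flags; you still run make-distribution.sh from the root of the cloudera/spark clone.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the make-distribution.sh flags for a CDH
# release, one per line. ASSUMPTION: Hadoop base version is 2.6.0.
cdh_build_args() {
  local cdh_version="$1"   # e.g. 5.7.0
  # The *-provided profiles keep these dependencies out of the assembly,
  # so they must be supplied from the cluster's CDH install at run time.
  local args=(
    -DskipTests
    "-Dhadoop.version=2.6.0-cdh${cdh_version}"
    -Phadoop-2.6
    -Pyarn
    -Phive -Phive-thriftserver
    -Pflume-provided -Phadoop-provided -Phbase-provided
    -Phive-provided -Pparquet-provided
  )
  printf '%s\n' "${args[@]}"
}

# Example, from the root of the cloudera/spark clone (bash 4+ for mapfile):
#   mapfile -t flags < <(cdh_build_args 5.7.0)
#   ./make-distribution.sh "${flags[@]}"
```

Keeping the flags in an array avoids re-typing the long continuation-line command and keeps each profile quoted as a separate argument.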
The post goes into all the details and also provides a handy Vagrant environment in which to perform the build.
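Once you have a distribution built with -Phive-thriftserver, starting it and testing a JDBC connection looks roughly like this. It is a sketch under assumptions: start_sts, sts_jdbc_url and the /opt/spark-cdh path are hypothetical, port 10001 is chosen only to avoid clashing with a HiveServer2 already listening on the default 10000, and YARN client mode is assumed. The --hiveconf hive.server2.thrift.port option and bin/beeline come from the Spark distribution itself; setting DRY_RUN prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the Thrift server start script of a Spark
# distribution built with -Phive-thriftserver.
start_sts() {
  local spark_home="$1" port="${2:-10001}"
  local cmd=(
    "$spark_home/sbin/start-thriftserver.sh"
    --master yarn
    --hiveconf "hive.server2.thrift.port=$port"
  )
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "${cmd[@]}"   # show what would run, one argument per line
  else
    "${cmd[@]}"
  fi
}

# JDBC URL for clients such as beeline; ODBC clients such as Tableau are
# typically given the same host and port.
sts_jdbc_url() {
  local host="$1" port="${2:-10001}"
  printf 'jdbc:hive2://%s:%s/default\n' "$host" "$port"
}

# Example (hypothetical install path):
#   start_sts /opt/spark-cdh 10001
#   /opt/spark-cdh/bin/beeline -u "$(sts_jdbc_url localhost 10001)"
```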
Thanks, I voted for this feature too. Not only Tableau but also Excel and other apps would benefit from it.
But isn't using Hive on Spark a workaround?
It is enabled from CDH 5.7 onward and works like a charm.
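For the Hive on Spark route, the engine is chosen per session through the hive.execution.engine property, so clients keep connecting to the regular HiveServer2. A minimal sketch, assuming HiveServer2 on its default port 10000 and a hypothetical helper name: the property goes in the hive_conf_list part of the HiveServer2 JDBC URL (after "?"), which has the same effect as running SET hive.execution.engine=spark; at the start of the session.

```bash
#!/usr/bin/env bash
# Hypothetical helper: HiveServer2 JDBC URL whose session uses Hive on Spark.
# The "?key=value" suffix is the hive_conf_list of the HiveServer2 URL.
hs2_spark_url() {
  local host="$1" port="${2:-10000}" db="${3:-default}"
  printf 'jdbc:hive2://%s:%s/%s?hive.execution.engine=spark\n' \
    "$host" "$port" "$db"
}

# Example (host and table names are placeholders):
#   beeline -u "$(hs2_spark_url hs2.example.com)" -e 'SELECT COUNT(*) FROM my_table'
```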