Hive On Spark

New Contributor

Hi all,

I am following the steps at http://www.cloudera.com/documentation/enterprise/5-5-x/topics/admin_hos_config.html#concept_mb4_g2w_... to enable Hive on Spark. I am just starting and have my first question.

 

It says: "For Hive to work on Spark, you must deploy Spark gateway roles on the same machine that hosts HiveServer2. Otherwise, Hive on Spark cannot read from Spark configurations and cannot submit Spark jobs." I don't think I understand this requirement completely.

 

I have a 1+3 node cluster, where the Spark gateway role is running on all nodes while HiveServer2 is running only on the master node.

 

Do I need HiveServer2 running on all four nodes?

If not, does this mean I can only submit jobs through node 1 (the master), since HiveServer2 exists only on that node?

 

Thanks,

Ankit


Re: Hive On Spark

Cloudera Employee

Hi Ankit,

 

I'll try to shed some light on the mystical art of using HoS (Hive on Spark).

 

First of all, no, you don't need to install HS2 (HiveServer2) on all nodes in the cluster. Having the Spark gateway role on all nodes is a fine setup; the docs just want to make sure there is one on the same node where HS2 is running.
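
To make the requirement concrete: the gateway role deploys the Spark client configuration that HS2 reads when it submits jobs. As a rough illustration (the exact path and file list assume a standard CM-managed CDH layout), you can check it on the HS2 host:

    # on the node running HiveServer2
    ls /etc/spark/conf
    # typical gateway-deployed client configs (illustrative):
    #   spark-defaults.conf  spark-env.sh  log4j.properties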

 

As for the other question, that's also a negative. HS2 uses a server-client architecture, so you can run the client (beeline) on any node in the cluster and connect to HS2 to submit a query for execution. The job (query) then gets executed on whichever nodes in the cluster the Spark executors are scheduled.
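
For example, here is a minimal sketch of submitting a query from a non-master node (the hostname, user, and table name are placeholders; 10000 is HS2's default port):

    # connect from any cluster node to the HS2 instance on the master
    beeline -u "jdbc:hive2://master-node:10000/default" -n ankit

    -- then, inside the beeline session, pick Spark as the engine:
    set hive.execution.engine=spark;
    -- HS2 on the master compiles the query; the Spark job itself
    -- runs on executors spread across the cluster
    select count(*) from sample_table;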