
How to allow unlimited clients to connect to Hive on Spark

Until now, the default execution engine for Hive on my cluster was MapReduce.

My Hive MapReduce jobs started failing with the error discussed here. I have now switched the execution engine to Spark, which does not throw that error; however, a new problem has come up.

We have Hive jobs scheduled through Oozie running constantly throughout the day, and people also use Hive from the Hive editor in Hue.

In some of the jobs scheduled through Oozie, I see this error:

Error: Error while compiling statement: FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client. (state=42000,code=40000)

What can I do so that Spark accepts any number of connections? I cannot afford to have any of my jobs fail.

Here is my cluster configuration. YARN containers are allowed 6 GB of memory, which has worked fine with MapReduce.

  • Spark Executor Cores: 4
  • Spark Executor Maximum Java Heap Size: 2 GB
  • Spark Driver Memory Overhead: 26 MiB
  • Spark Executor Memory Overhead: 26 MiB
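For reference, I believe these correspond to the following properties (a sketch only; I am assuming the spark.yarn.* overhead property names that Hive on Spark reads from hive-site.xml, with overhead values in MiB):

    <property>
      <name>spark.executor.cores</name>
      <value>4</value>
    </property>
    <property>
      <name>spark.executor.memory</name>
      <value>2g</value>
    </property>
    <property>
      <name>spark.yarn.driver.memoryOverhead</name>
      <value>26</value> <!-- MiB -->
    </property>
    <property>
      <name>spark.yarn.executor.memoryOverhead</name>
      <value>26</value> <!-- MiB -->
    </property>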

My node configuration:

1 master node with the Spark server on it: 16 vCPU, 64 GB memory

3 worker nodes with HDFS and YARN on them: 16 vCPU, 64 GB memory

What should the values be for the above-mentioned parameters?

My guess is 6 executors and a 25 GB heap with 7 GB of executor memory overhead.

Please correct me if I am wrong.
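Spelling out the arithmetic behind that guess (two executors per worker node, splitting each node's memory):

    3 worker nodes x 2 executors per node = 6 executors
    64 GB per node / 2 executors          = 32 GB per executor
    32 GB per executor                    = 25 GB heap + 7 GB memory overhead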

Hosts (roles, last heartbeat, disk used/total, physical memory used/total):

IP1 (3 roles)
  • HDFS Datanode
  • YARN (MR2 Included) NodeManager
  • Spark Gateway
  Last heartbeat 6.6s ago | Disk 504.1 GiB / 2 TiB | Memory 11.9 GiB / 62.5 GiB

IP2 (3 roles)
  • HDFS Datanode
  • YARN (MR2 Included) NodeManager
  • Spark Gateway
  Last heartbeat 6.71s ago | Disk 494 GiB / 2 TiB | Memory 10.5 GiB / 62.5 GiB

IP3 (6 roles)
  • Cloudera services and manager
  • YARN NodeManager
  Last heartbeat 7.41s ago | Disk 1.1 TiB / 1.4 TiB | Memory 10.7 GiB / 31 GiB

IP4 (15 roles)
  • HDFS Balancer
  • HDFS Datanode
  • HDFS NameNode
  • HDFS SNN
  • Hive Metastore
  • HiveServer2
  • Oozie Server
  • YARN JobHistoryServer
  • YARN Resource Manager
  • Zookeeper
  • Hue
  • Sentry
  • Spark Gateway
  • Spark History Server
  • Sqoop 2 Server
  Last heartbeat 6.53s ago | Disk 1 TiB / 2.4 TiB | Memory 52.9 GiB / 62.5 GiB

IP5 (2 roles)
  • HDFS Datanode
  • YARN (MR2 Included) NodeManager
  • Spark Gateway
  Last heartbeat 14.07s ago | Disk 2.8 GiB / 2 TiB | Memory 998.7 MiB / 62.5 GiB

IP6 (7 roles)
  • All gateways
  • YARN NodeManager
  Last heartbeat 3.66s ago | Disk 927.5 GiB / 1000 GiB | Memory 6.7 GiB / 31 GiB

IP7 (2 roles)
  • HDFS Datanode
  • YARN (MR2 Included) NodeManager
  Last heartbeat 4.44s ago

2 Replies

Mentor

@Sim kaur

There is a YARN tuning spreadsheet for Cloudera. Filling out the Excel sheet helps you automatically calculate some of the parameters, such as vcores and memory.

Set the YARN container memory and maximum container allocation to be greater than the Spark executor memory plus overhead. Check 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.

Your YARN container memory might currently be smaller than what the Spark executors require.
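A minimal sketch of the YARN side, assuming your 6 GB container limit and roughly 52 GB left per 64 GB worker after the OS and Hadoop daemons (adjust the values to your nodes):

    <!-- yarn-site.xml: both values must exceed
         spark.executor.memory + spark.yarn.executor.memoryOverhead -->
    <property>
      <!-- largest single container YARN will grant -->
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>6144</value>
    </property>
    <property>
      <!-- total memory each NodeManager can hand out to containers -->
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>53248</value>
    </property>

With a 2 GB heap plus 26 MiB overhead, one executor container asks for roughly 2.1 GB, so a 6 GB container limit covers a single executor; the NodeManager total is what caps how many executors, and therefore how many concurrent Hive-on-Spark sessions, can run at once.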
