
How to use external Spark with the Cloudera cluster?

Expert Contributor

Hi Cloudera,

I need to use Spark on a host that is not part of the Cloudera cluster to run Spark jobs on the Cloudera cluster.

Is it possible to use it this way? If so, how do I configure it?

What I've already tried:

1. Downloaded "https://www.apache.org/dyn/closer.lua/spark/spark-3.3.4/spark-3.3.4-bin-hadoop3.tgz"
2. Copied the "conf" files from the Cloudera cluster into the new Spark directory
3. Exported the variables "HADOOP_CONF_DIR", "SPARK_CONF_DIR", and "SPARK_HOME" pointing at the new Spark directory "spark-3.3.4-bin-hadoop3" containing those files
4. Tried to run spark-shell as a test; nothing happens, it hangs as shown below:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.4
      /_/

Using Scala version 2.13.8 (Java HotSpot(TM) 64-Bit Server VM, Java 11.0.16.1)
Type in expressions to have them evaluated.
Type :help for more information.

Note: the cluster uses Kerberos, so kinit was run before launching spark-shell.
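
For illustration, steps 2-4 above amount to roughly the following; the install path, the Hadoop config location, and the Kerberos principal are placeholders, not the exact values used:

# Sketch of the client-side setup (paths and principal are assumptions)
export SPARK_HOME=/opt/spark-3.3.4-bin-hadoop3
export SPARK_CONF_DIR=$SPARK_HOME/conf      # spark-defaults.conf etc. copied from the cluster
export HADOOP_CONF_DIR=/etc/hadoop/conf     # core-site.xml, hdfs-site.xml, yarn-site.xml from the cluster

# A Kerberos ticket must exist before starting the shell
kinit myuser@EXAMPLE.REALM

# Point the shell at YARN explicitly rather than relying on spark-defaults.conf
$SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client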

2 ACCEPTED SOLUTIONS

Expert Contributor

Unfortunately, as I didn't receive feedback from the community to guide me, I had to rack my brains through hours and hours of testing, but I managed to do what I wanted.

I downloaded Spark in the same version that ships with CDH 6.3.4 and configured the Spark configuration files with the information from CDH 6.3.4, so when calling "spark-submit" the job is executed on the CDH cluster.
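
For illustration, with a Spark build matching the cluster version, a test submission from the external host looks roughly like this; the Spark 2.4.x install path, the examples jar name, and the principal are assumptions, not the exact commands used:

# Sketch of submitting a test job to the CDH cluster's YARN (paths are assumptions)
export SPARK_HOME=/opt/spark-2.4.0-bin-hadoop2.7
export HADOOP_CONF_DIR=/etc/hadoop/conf     # configs copied from the CDH 6.3.4 cluster

kinit myuser@EXAMPLE.REALM

$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 100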


Master Collaborator

Hi @yagoaparecidoti 

Unfortunately, Cloudera will not support installing or using open-source Spark, because some customizations need to be done on Cloudera's end to support integration with other components.
