Support Questions

yagoaparecidoti · 01-23-2024

Hi Cloudera,

I need to use Spark on a host that is not part of the Cloudera cluster to run Spark jobs on the Cloudera cluster.

Is it possible to use it this way? If so, how do I configure it?

What I've already tried:

1. Downloaded "https://www.apache.org/dyn/closer.lua/spark/spark-3.3.4/spark-3.3.4-bin-hadoop3.tgz".
2. Copied the "conf" files from the Cloudera cluster into the new Spark directory.
3. Exported the variables "HADOOP_CONF_DIR", "SPARK_CONF_DIR", and "SPARK_HOME", pointing them at the new Spark directory "spark-3.3.4-bin-hadoop3" that holds the copied files.
4. Ran spark-shell as a test. Nothing happens; it hangs after printing the output below:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.4
      /_/

Using Scala version 2.13.8 (Java HotSpot(TM) 64-Bit Server VM, Java 11.0.16.1)
Type in expressions to have them evaluated.
Type :help for more information.

Note: the cluster uses Kerberos, so I ran kinit before running spark-shell.
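
Before digging into why spark-shell hangs, it can help to confirm that the variables from step 3 really point at directories containing the copied client configuration. The following is a minimal sketch, not part of the original post: the function name `check_client_env` is hypothetical, and the list of expected files (`core-site.xml`, `hdfs-site.xml`, `yarn-site.xml`) is an assumption about what a copied Hadoop client configuration usually contains.

```python
import os
from pathlib import Path

# Assumption: these Hadoop client files are expected in the copied config directory.
HADOOP_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def check_client_env(env):
    """Return a list of human-readable problems found in the given environment mapping."""
    problems = []
    # Each variable from step 3 must be set and must point at an existing directory.
    for name in ("SPARK_HOME", "SPARK_CONF_DIR", "HADOOP_CONF_DIR"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} does not point to a directory: {value}")
    # Only look for the client files when HADOOP_CONF_DIR itself is usable.
    hadoop_conf = env.get("HADOOP_CONF_DIR")
    if hadoop_conf and Path(hadoop_conf).is_dir():
        for filename in HADOOP_FILES:
            if not (Path(hadoop_conf) / filename).is_file():
                problems.append(f"{filename} is missing from HADOOP_CONF_DIR")
    return problems


if __name__ == "__main__":
    # Print every problem found in the current shell environment.
    for problem in check_client_env(os.environ):
        print(problem)
```

An empty result only shows that the files are in place; it says nothing about Kerberos tickets, network reachability, or version compatibility between the downloaded Spark and the cluster.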

yagoaparecidoti · 02-02-2024

Unfortunately, since I didn't receive any guidance from the community, I had to rack my brains through hours and hours of testing, but I managed to do what I wanted.

I downloaded the same Spark version that ships with CDH 6.3.4 and filled in its Spark configuration files with the settings from the CDH 6.3.4 cluster. Now, when I call "spark-submit", the job runs on the CDH cluster.
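
For readers who want to script the launch from the external host, here is a minimal sketch of how the call might be assembled. It is an illustration under stated assumptions, not the exact setup used above: the helper `build_spark_submit` and all paths are hypothetical, and `--master yarn` with `--deploy-mode cluster` is one possible choice, picked here because in cluster mode the driver runs inside the cluster.

```python
import os
from pathlib import Path


def build_spark_submit(spark_home, hadoop_conf_dir, app_path, app_args=()):
    """Return (argv, env) for launching spark-submit against YARN in cluster mode."""
    # Start from the current environment so the Kerberos ticket cache stays visible.
    env = dict(os.environ)
    env["SPARK_HOME"] = str(spark_home)
    env["HADOOP_CONF_DIR"] = str(hadoop_conf_dir)
    # Assumption: the Spark config copied from the cluster lives in <spark_home>/conf.
    env["SPARK_CONF_DIR"] = str(Path(spark_home) / "conf")
    argv = [
        str(Path(spark_home) / "bin" / "spark-submit"),
        "--master", "yarn",
        "--deploy-mode", "cluster",
        str(app_path),
        *app_args,
    ]
    return argv, env
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)` after running kinit, for example with hypothetical arguments such as `build_spark_submit("/opt/spark", "/etc/hadoop/conf", "job.py")`.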

RangaReddy · 02-04-2024

Hi @yagoaparecidoti

Unfortunately, Cloudera does not support installing or using open source Spark, because Cloudera's Spark build includes customisations needed to support integration with other components.
