Support Questions


Is there any chance to use Spark 3 on CDH 6.x cluster?

Contributor

My manager is forcing me to find a way to install and use Spark 3 on a CDH 6.x cluster. Is there any chance?

When I did some research, I found that only CDP 7.x supports Spark 3, and CDH 6.x only supports Spark 2. But my manager said that you don't need to install Spark through Cloudera Manager: you can install Spark 3 separately (by downloading a tarball from the internet or something like that) and then find a way to make that Spark install talk to Cloudera services like Hive, HDFS, ... (by copying hive-site.xml, hdfs-site.xml, ... into the Spark conf folder, maybe?)

 

So does anyone have any experience with this? My manager is insane!!!!

 

1 ACCEPTED SOLUTION

Contributor

I've successfully set up Spark 3.3.0 on CDH 6.2 (we used YARN). Here are the important steps:

1. Back up the current Spark that comes with the Cloudera parcel (v2.4.0, I think) at /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark

2. Download a Spark build from the Spark homepage, for example "spark-3.3.0-bin-hadoop3.tgz". Extract it, delete the old spark folder, and put the new folder in its place (renamed to "spark") at /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark
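Steps 1 and 2 might look like the following shell sketch. The parcel path is the one from this post; the backup folder name and the archive download URL are my assumptions, so adjust them for your cluster:

```shell
# Run as root on each node; parcel path taken from this post.
CDH_LIB=/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib

# 1. Back up the Spark 2.4 that ships with the parcel (backup name is illustrative)
mv "$CDH_LIB/spark" "$CDH_LIB/spark-2.4.0.bak"

# 2. Download and unpack Spark 3.3.0, then drop it in under the old name
wget https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
tar -xzf spark-3.3.0-bin-hadoop3.tgz
mv spark-3.3.0-bin-hadoop3 "$CDH_LIB/spark"
```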

3. Copy all the config files from the old Spark conf folder to the new Spark conf folder


 

4. Copy the YARN-related config files into the Spark conf folder too


4.1. Copy the file spark-3.3.0-yarn-shuffle.jar from spark/yarn to the spark/jars folder
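Steps 3 through 4.1 can be sketched as the commands below. The backup folder name and the /etc/hadoop/conf client-config path are my assumptions (the originals were in screenshots):

```shell
CDH_LIB=/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib
SPARK_CONF="$CDH_LIB/spark/conf"

# 3. Carry the old Spark conf over (spark-env.sh, spark-defaults.conf, hive-site.xml, ...)
cp "$CDH_LIB/spark-2.4.0.bak/conf/"* "$SPARK_CONF/"

# 4. Make the Hadoop/YARN client configs visible to the new Spark
cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml \
   /etc/hadoop/conf/yarn-site.xml "$SPARK_CONF/"

# 4.1. The YARN shuffle service jar must sit next to the other Spark jars
cp "$CDH_LIB/spark/yarn/spark-3.3.0-yarn-shuffle.jar" "$CDH_LIB/spark/jars/"
```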

5. Make some modifications to the spark-defaults.conf file, mostly to disable logging and point Spark at the right jars folder

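The original screenshot of spark-defaults.conf is lost; a minimal sketch of what such a file might contain is below. All values are illustrative, not taken from the post:

```properties
# Illustrative spark-defaults.conf for Spark 3.3.0 on this parcel layout
spark.master                    yarn
# Point executors at the new Spark 3 jars on each node's local disk
spark.yarn.jars                 local:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark/jars/*
# Disable event logging (the old Cloudera log settings no longer apply)
spark.eventLog.enabled          false
# Use the external shuffle service registered with the NodeManagers
spark.shuffle.service.enabled   true
```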

6. Modify some YARN settings like the ones below (yarn-site.xml)

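The yarn-site.xml screenshots are gone too; the standard way to register Spark's external shuffle service with the NodeManagers (which is what the copied spark-3.3.0-yarn-shuffle.jar is for) looks roughly like this:

```xml
<!-- Illustrative yarn-site.xml additions: register Spark's shuffle service
     alongside the MapReduce one on every NodeManager. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```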

 

 

7. Restart the cluster and run the spark-shell command. Run some queries to test. You can also edit the yarn-site.xml file in the Spark conf folder directly to make sure the settings take effect.
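As a smoke test after the restart, you could submit the SparkPi example that ships with the Spark 3.3.0 tarball (the example jar name below is from the standard distribution, but verify it on your node):

```shell
SPARK_HOME=/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark

# Run the bundled SparkPi example on YARN to confirm the swap worked
"$SPARK_HOME/bin/spark-submit" \
  --master yarn --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME/examples/jars/spark-examples_2.12-3.3.0.jar" 100
```

If this prints a "Pi is roughly 3.14..." line and the application shows as SUCCEEDED in the YARN ResourceManager UI, Spark 3 is talking to YARN correctly.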

 

