
Introduction

This post explains how to submit Spark jobs to the Livy server on a CDP Public Cloud Data Hub cluster and provides a sample wrapper script for job submission. To understand the Livy Spark submit process and how to arrive at the job configuration, please refer to this post.
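
Under the hood, the wrapper submits the job through Livy's batch REST API (POST /batches). The sketch below shows, in simplified form, the kind of request involved; it assumes the Knox-proxied Livy endpoint accepts basic authentication with the workload username and password, and the endpoint URL, credentials, and bucket name are placeholders only.

    #!/usr/bin/env python3
    # Simplified sketch of a Livy batch submission; endpoint URL, credentials,
    # and bucket are placeholders, not real values.
    import requests

    LIVY_URL = "https://<datahub-gateway-host>/<cluster>/cdp-proxy-api/livy"  # from the CDP control plane
    AUTH = ("<workload-username>", "<workload-password>")

    # Minimal job configuration (see the jobconf.json example in the steps below)
    payload = {
        "className": "org.apache.spark.examples.SparkPi",
        "file": "s3a://<bucket>/spark-examples.jar",
        "args": ["1000"],
    }

    # Livy's batch API: POST /batches creates the job and returns its id and state
    resp = requests.post(f"{LIVY_URL}/batches", json=payload, auth=AUTH)
    resp.raise_for_status()
    print(resp.json())  # e.g. {'id': 0, 'state': 'starting', ...}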

The following steps submit the SparkPi job from the spark-examples*.jar stored on S3.

Steps

  1. Get the Livy endpoint for the Data Hub cluster from the CDP control plane
  2. Ensure that you have the workload password set for the environment
  3. Download the Python wrapper script using the following command:
    wget https://raw.githubusercontent.com/karthikeyanvijay/cdp-publiccloud/master/datahub-scripts/livy-cdp-spark-submit/cdp_spark_submit.py
  4. Edit the script to modify the Livy endpoint, workload username, and password
  5. Copy the Spark examples JAR from the local parcel directory to S3
    hdfs dfs -cp file:///opt/cloudera/parcels/CDH-7.2.7-1.cdh7.2.7.p6.11615609/lib/spark/examples/jars/spark-examples_2.11-2.4.5.7.2.7.6-2.jar \
    s3a://vkarthikeyan/
  6. Create a sample job configuration file named jobconf.json with the following contents:
    {
      "className": "org.apache.spark.examples.SparkPi",
      "args": ["1000"],
      "file": "s3a://vkarthikeyan/spark-examples_2.11-2.4.5.7.2.7.6-2.jar",
      "driverMemory": "2G",
      "driverCores": 1,
      "executorCores": 2,
      "executorMemory": "4G",
      "numExecutors": 3,
      "queue": "default"
    }
  7. Run the script (a sketch for checking the job status afterward follows these steps)
    ./cdp_spark_submit.py
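
After submission, the batch can be monitored through the same Livy REST API. The following is a minimal sketch, assuming the same Knox-proxied Livy endpoint and workload credentials used by the wrapper script; the batch id is the one returned by the submission call.

    # Minimal sketch of polling a submitted Livy batch; endpoint URL and
    # credentials are placeholders, and batch_id comes from the POST /batches response.
    import time
    import requests

    LIVY_URL = "https://<datahub-gateway-host>/<cluster>/cdp-proxy-api/livy"
    AUTH = ("<workload-username>", "<workload-password>")
    batch_id = 0  # id returned when the job was submitted

    # Poll GET /batches/{id} until the job reaches a terminal state
    while True:
        state = requests.get(f"{LIVY_URL}/batches/{batch_id}", auth=AUTH).json()["state"]
        print(f"Batch {batch_id} state: {state}")
        if state in ("success", "dead", "killed"):
            break
        time.sleep(10)

    # Driver log lines are available from GET /batches/{id}/log
    logs = requests.get(f"{LIVY_URL}/batches/{batch_id}/log", auth=AUTH).json()
    print("\n".join(logs.get("log", [])))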

Conclusion

The job should now be submitted to the Data Hub cluster. The wrapper script can also be used on CDP Private Cloud Base clusters where Livy is configured.

------------

Vijay Anand Karthikeyan
