Created 10-16-2024 07:20 AM
I have a CML project using a JupyterLab Runtime with Python 3.10, and I want to start a Spark cluster connected to my CDP Data Lake. I'm using the predefined Spark Data Lake Connection in CML, which looks like this:
```
import cml.data_v1 as cmldata
# Sample in-code customization of spark configurations
#from pyspark import SparkContext
#SparkContext.setSystemProperty('spark.executor.cores', '1')
#SparkContext.setSystemProperty('spark.executor.memory', '2g')
CONNECTION_NAME = "hiaa-dl"
conn = cmldata.get_connection(CONNECTION_NAME)
spark = conn.get_spark_session()
# Sample usage to run query through spark
EXAMPLE_SQL_QUERY = "show databases"
spark.sql(EXAMPLE_SQL_QUERY).show()
```
When I execute this, I get the error:
`IllegalArgumentException: The value of property spark.app.name must not be null`
I'm using the predefined spark-defaults.conf which looks like this:
```
spark.executor.memory=1g
spark.executor.cores=1
spark.yarn.access.hadoopFileSystems=abfs://[container]@[storage-account].dfs.core.windows.net
```
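The defaults file above does not set `spark.app.name`, so that value would have to be supplied by whatever creates the session. As a quick diagnostic, here is a minimal sketch that lists the keys a `spark-defaults.conf` actually sets and reports required ones that are missing. It assumes the simple `key=value` or whitespace-separated lines shown above and does not handle continuation lines or escape sequences. The helper names `parse_spark_defaults` and `missing_keys` are hypothetical and are not part of CML or PySpark.

```python
from typing import Dict, Iterable, List


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse simple spark-defaults.conf lines into a dict (hypothetical helper).

    Supports 'key=value', 'key:value' and 'key value' lines, and skips blank
    lines and lines starting with '#' or '!'. Continuation lines and escape
    sequences are NOT handled.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # The key ends at the first '=', ':' or whitespace character.
        end = 0
        while end < len(line) and line[end] not in "=: \t":
            end += 1
        key = line[:end]
        rest = line[end:].lstrip(" \t")
        # Drop a single '=' or ':' separator if one follows the key.
        if rest[:1] in ("=", ":"):
            rest = rest[1:].lstrip(" \t")
        props[key] = rest
    return props


def missing_keys(props: Dict[str, str],
                 required: Iterable[str] = ("spark.app.name",)) -> List[str]:
    """Return the required keys that are absent or empty in props."""
    return [key for key in required if not props.get(key)]


if __name__ == "__main__":
    defaults = (
        "spark.executor.memory=1g\n"
        "spark.executor.cores=1\n"
        "spark.yarn.access.hadoopFileSystems="
        "abfs://[container]@[storage-account].dfs.core.windows.net\n"
    )
    props = parse_spark_defaults(defaults)
    print(props)
    print(missing_keys(props))
```

Running it against the file from the question reports `spark.app.name` as missing. Note that in this thread the root cause turned out to be a disabled feature flag rather than the file itself, so an empty result here does not rule out other problems.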
Is there something else I need to configure in the CML session or at the data lake level?
Created 10-16-2024 12:10 PM
Resolved. I had ML Runtimes Addons disabled. I went into CML > Site Administration > Settings and, under Feature Flags, checked the checkbox next to Allow users to Run ML Runtimes Addons.
Then I started a new session with Spark enabled.
Created 10-17-2024 04:56 AM
Correction: *check* the checkbox to Allow users to Run ML Runtimes Addons; do not uncheck it.