Issues running Spark in CML

Contributor

I have a CML project using a JupyterLab Runtime with Python 3.10, and I want to start a Spark cluster against my CDP Data Lake. I'm using the predefined Spark Data Lake Connection snippet in CML, which looks like this:

```
import cml.data_v1 as cmldata

# Sample in-code customization of spark configurations
#from pyspark import SparkContext
#SparkContext.setSystemProperty('spark.executor.cores', '1')
#SparkContext.setSystemProperty('spark.executor.memory', '2g')

CONNECTION_NAME = "hiaa-dl"
conn = cmldata.get_connection(CONNECTION_NAME)
spark = conn.get_spark_session()

# Sample usage to run query through spark
EXAMPLE_SQL_QUERY = "show databases"
spark.sql(EXAMPLE_SQL_QUERY).show()
```

When I execute this I get the error:

IllegalArgumentException: The value of property spark.app.name must not be null

I'm using the predefined spark-defaults.conf which looks like this:

```
spark.executor.memory=1g
spark.executor.cores=1
spark.yarn.access.hadoopFileSystems=abfs://[container]@[storage-account].dfs.core.windows.net
```

Is there something else I need to configure in the CML session or at the data lake level?

1 ACCEPTED SOLUTION

Contributor

Resolved. I had ML Runtimes Addons disabled. I went into CML > Site Administration > Settings and, under Feature Flags, checked the checkbox next to Allow users to Run ML Runtimes Addons.

Then I started a new session with Spark enabled.

Contributor

Correction: check (do not uncheck) the checkbox next to Allow users to Run ML Runtimes Addons.
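
For anyone hitting the same error, a quick look at the session's environment before calling `conn.get_spark_session()` may help tell whether Spark was attached to the session at all. The snippet below is only a minimal diagnostic sketch: `spark_environment_report` is a hypothetical helper name, and it is an assumption (not confirmed in this thread) that a session started without the Spark addon will show `SPARK_HOME` or `SPARK_CONF_DIR` as unset. Treat the output as a hint, not a definitive check.

```python
import importlib.util
import os


def spark_environment_report(env=None):
    """Summarize Spark-related settings visible to the current session.

    Hypothetical helper for troubleshooting; not part of CML or PySpark.
    """
    env = os.environ if env is None else env
    return {
        # True if the pyspark package can be located in this runtime.
        "pyspark_importable": importlib.util.find_spec("pyspark") is not None,
        # Standard Spark environment variables. Assumption: they are
        # unset (None) when the session was started without Spark enabled.
        "SPARK_HOME": env.get("SPARK_HOME"),
        "SPARK_CONF_DIR": env.get("SPARK_CONF_DIR"),
    }


if __name__ == "__main__":
    for key, value in spark_environment_report().items():
        print(f"{key}: {value}")
```

If the values come back as `None`, it is worth confirming that the feature flag above is checked and that the session was started with Spark enabled before digging into `spark-defaults.conf`.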