Support Questions

ipson · 10-16-2024

I have a CML project using a JupyterLab Runtime with Python 3.10, and I want to start a Spark session against my CDP Data Lake. I'm using the predefined Spark Data Lake Connection snippet in CML, which looks like this:

```
import cml.data_v1 as cmldata

# Sample in-code customization of spark configurations
#from pyspark import SparkContext
#SparkContext.setSystemProperty('spark.executor.cores', '1')
#SparkContext.setSystemProperty('spark.executor.memory', '2g')

CONNECTION_NAME = "hiaa-dl"
conn = cmldata.get_connection(CONNECTION_NAME)
spark = conn.get_spark_session()

# Sample usage to run query through spark
EXAMPLE_SQL_QUERY = "show databases"
spark.sql(EXAMPLE_SQL_QUERY).show()
```

When I execute this I get the error:

IllegalArgumentException: The value of property spark.app.name must not be null

I'm using the predefined spark-defaults.conf which looks like this:

```
spark.executor.memory=1g
spark.executor.cores=1
spark.yarn.access.hadoopFileSystems=abfs://[container]@[storage-account].dfs.core.windows.net
```
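For anyone debugging the same error, here is a minimal sketch of one way to confirm which properties a spark-defaults.conf actually sets. The helper `parse_spark_defaults` and the `SAMPLE_CONF` string are hypothetical names made up for illustration; they are not part of CML or Spark. Note that the snippet only shows that `spark.app.name` is absent from this file. It does not tell you where CML is expected to supply that value.

```python
import re

# Hypothetical sample: the spark-defaults.conf contents quoted in the post above.
SAMPLE_CONF = """\
spark.executor.memory=1g
spark.executor.cores=1
spark.yarn.access.hadoopFileSystems=abfs://[container]@[storage-account].dfs.core.windows.net
"""


def parse_spark_defaults(text):
    """Parse spark-defaults.conf style text into a dict (illustrative helper).

    Accepts `key=value` and `key value` lines; skips blank lines and # comments.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The key runs up to the first '=' or whitespace; the rest is the value.
        match = re.match(r"([^\s=]+)\s*[=\s]\s*(.*)", line)
        if match:
            props[match.group(1)] = match.group(2).strip()
    return props


props = parse_spark_defaults(SAMPLE_CONF)
print("properties set:", sorted(props))
print("spark.app.name present:", "spark.app.name" in props)
```

If the property is missing here, that alone may not be the root cause: the session environment can inject Spark settings from elsewhere, so treat this only as a first check.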

Is there something else I need to configure in the CML session or at the data lake level?

ipson · 10-16-2024

Resolved. I had ML Runtimes Addons disabled. I went into CML > Site Administration > Settings and, under Feature Flags, unchecked the checkbox next to Allow users to Run ML Runtimes Addons.

Then I started a new session with Spark enabled.
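A rough way to sanity-check a new session before calling `get_spark_session()` is sketched below. It assumes that a Spark-enabled session makes `pyspark` importable and sets the `SPARK_HOME` environment variable. Those are generic Spark conventions, not something this thread confirms for CML, and `spark_session_hints` is a hypothetical helper name.

```python
import importlib.util
import os


def spark_session_hints(env=None):
    """Return generic hints about whether Spark looks available in this process.

    Assumption: a Spark-enabled session exposes pyspark and sets SPARK_HOME.
    """
    env = os.environ if env is None else env
    return {
        # find_spec returns None when the top-level module cannot be located.
        "pyspark_importable": importlib.util.find_spec("pyspark") is not None,
        "spark_home_set": bool(env.get("SPARK_HOME")),
    }


hints = spark_session_hints()
for name, value in hints.items():
    print(f"{name}: {value}")
```

If both hints come back False in a session that was supposed to have Spark enabled, it may be worth re-checking the ML Runtimes Addons setting with an administrator and starting a fresh session.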

ipson · 10-17-2024

Correction to my post above: the checkbox next to Allow users to Run ML Runtimes Addons must be checked (enabled), not unchecked.

Cloudera Community

Issues running Spark in CML