
Hae · ‎03-13-2024

I tried this code in a Jupyter session on Cloudera Machine Learning (CML).

============================================================

import cml.data_v1 as cmldata

from pyspark import SparkContext

#Optional Spark Configs
SparkContext.setSystemProperty('spark.executor.cores', '4')
SparkContext.setSystemProperty('spark.executor.memory', '8g')

#Boilerplate Code provided to you by CML Data Connections
CONNECTION_NAME = "go01-dl"
conn = cmldata.get_connection(CONNECTION_NAME)
spark = conn.get_spark_session()

# Sample usage to run query through spark
EXAMPLE_SQL_QUERY = "show databases"
spark.sql(EXAMPLE_SQL_QUERY).show()

=================================================

I also checked 'Enable Spark' when I created the session.

However, it failed with this error message:

'No data connection named go01-dl found'

While trying this, I thought I needed to get the Spark connection information, but I cannot find it.

Where can I get the Spark connection name?

What should I do to run a job on Spark in CML?

Please let me know.

There is no 'Data' tab in the CML session.

There is also no 'Data connectors' option in the Project Settings or the Site Administration of CML.

smdas · ‎04-15-2024

Hello @Hae

AFAIK, Data Connections is a Public Cloud concept and isn't available in Private Cloud yet. In Public Cloud, [1] shows the steps to configure Data Connections, which let you access the Hive Metastore (HMS) of the Data Lake (the unified HMS source for the environment). In Private Cloud, you can follow [2] to use Spark on CML. It includes examples of Spark-on-YARN on the Base Cluster as well as Spark-on-Kubernetes on CML.

- Smarak

[1] https://docs.cloudera.com/machine-learning/cloud/mlde/topics/ml-mlde-spark-data-connection.html

[2] https://docs.cloudera.com/machine-learning/1.5.2/spark/topics/ml-apache-spark-overview.html
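For the Private Cloud route described above, a minimal sketch of creating a Spark session directly, without `cml.data_v1` or a connection name, might look like the following. It assumes `pyspark` is available in the session (for example, with Spark enabled at session creation). The helper names `spark_settings` and `get_spark_session` and the application name are illustrative, not part of any CML API; the executor values mirror the ones in the question.

```python
# Sketch only: build a Spark session directly instead of using Data Connections.
# Assumption: pyspark is importable in the CML session.


def spark_settings(app_name="cml-spark-example", executor_cores=4, executor_memory="8g"):
    """Return Spark settings as a plain dict (names and defaults are illustrative)."""
    return {
        "spark.app.name": app_name,
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": executor_memory,
    }


def get_spark_session(settings):
    """Create (or reuse) a SparkSession configured from the given settings."""
    # Imported lazily so the settings helper works even where pyspark is absent.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a session, `spark = get_spark_session(spark_settings())` followed by `spark.sql("show databases").show()` would mirror the original query. Which databases are visible depends on how the workspace and Base Cluster are configured, so treat [2] as the reference for the actual setup.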


Surya_Sarikonda · ‎03-14-2024

Hi @Hae
Please share your CDP form factor (Public/Private) and CML version so we can understand this better. Ideally, your data connections are picked up automatically if you have any data set up. If not, try creating a new connection, adding your Data Lake and warehouse connections, and syncing with the workspace.

Refer to https://blog.cloudera.com/one-line-away-from-your-data/ to understand the setup.

Hae · ‎03-14-2024

The CM version is 7.11.3 and the runtime is 7.1.9-1.cdh7.1.9.p3.48381316.

The ECS version is 1.5.2-b886-ecs-1.5.2-b886.p0.46792599.

The CML version is 2.0.42-b80.

Hae · ‎03-18-2024