How to use Spark in a CML session

Rising Star

I tried this code in Jupyter on Cloudera Machine Learning (CML).

============================================================

import cml.data_v1 as cmldata

from pyspark import SparkContext

#Optional Spark Configs
SparkContext.setSystemProperty('spark.executor.cores', '4')
SparkContext.setSystemProperty('spark.executor.memory', '8g')

#Boilerplate Code provided to you by CML Data Connections
CONNECTION_NAME = "go01-dl"
conn = cmldata.get_connection(CONNECTION_NAME)
spark = conn.get_spark_session()

# Sample usage to run query through spark
EXAMPLE_SQL_QUERY = "show databases"
spark.sql(EXAMPLE_SQL_QUERY).show()

=================================================

 

 

I also checked 'Enable Spark' when I created the session.

However, it failed with this error message:

'No data connection named go01-dl found'

 

 

 

 

While trying this, I assumed I first had to look up the Spark connection information, but I cannot find it.

Where can I get the connection name for Spark?

What do I need to do to run a job on Spark in CML?

Please let me know.

 

 

 

There is no 'Data' tab in the CML session.

There is no 'Data connectors' option in the Project settings or in Site Administration of CML either.

 

 

1 ACCEPTED SOLUTION

Super Collaborator

Hello @Hae 

AFAIK, Data Connections is a Public Cloud concept and isn't available in Private Cloud yet. In Public Cloud, [1] shows the steps to configure Data Connections, which let you access the HMS of the Data Lake (the unified HMS source for the environment). In Private Cloud, you can follow [2] to use Spark on CML. It includes examples of using Spark-on-YARN on the Base Cluster as well as Spark-on-Kubernetes on CML.

- Smarak

[1] https://docs.cloudera.com/machine-learning/cloud/mlde/topics/ml-mlde-spark-data-connection.html 

[2] https://docs.cloudera.com/machine-learning/1.5.2/spark/topics/ml-apache-spark-overview.html 
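As a rough illustration of the Private Cloud route, a session with Spark enabled can usually start Spark through the standard PySpark `SparkSession.builder` rather than through `cml.data_v1`. The sketch below assumes PySpark is available in the session. The helper name `build_spark_session` and the app name are hypothetical, and the executor settings are simply carried over from the question. Reaching the Base Cluster's Hive Metastore may need additional configuration described in [2].

```python
# Executor settings taken from the question; adjust to your workload.
SPARK_CONF = {
    "spark.executor.cores": "4",
    "spark.executor.memory": "8g",
}


def build_spark_session(app_name="cml-spark-example", conf=None):
    """Create a SparkSession with the plain PySpark builder (hypothetical helper).

    No CML Data Connection name is involved here.
    """
    # Imported lazily so the module can be loaded where PySpark is not installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    # Apply each Spark setting to the builder before the session is created.
    for key, value in (conf if conf is not None else SPARK_CONF).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a session you would then call `spark = build_spark_session()` followed by `spark.sql("show databases").show()`, and `spark.stop()` when finished.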


4 REPLIES

Cloudera Employee

Hi @Hae 
Please share your CDP form factor (Public or Private Cloud) and CML version so we can understand this better. Ideally, your data connections are picked up automatically if you have any data set up. If not, try the option to create a new connection, add your Data Lake and warehouse connections, and sync with the workspace.

Refer to https://blog.cloudera.com/one-line-away-from-your-data/ to understand the setup.

Rising Star

The CM version is 7.11.3 and the runtime is 7.1.9-1.cdh7.1.9.p3.48381316.

The ECS version is 1.5.2-b886-ecs-1.5.2-b886.p0.46792599.

The CML version is 2.0.42-b80.

 

Hae_0-1710457807564.png

 

Rising Star

The CM version is 7.11.3 and the runtime is 7.1.9-1.cdh7.1.9.p3.48381316.

The ECS version is 1.5.2-b886-ecs-1.5.2-b886.p0.46792599.

The CML version is 2.0.42-b80.

Hae_0-1710810162520.png

 

@Surya_Sarikonda 
