Created on 03-13-2024 09:15 PM - edited 03-13-2024 09:19 PM
I tried this code in a Jupyter session on Cloudera Machine Learning (CML).
============================================================
import cml.data_v1 as cmldata
from pyspark import SparkContext
# Optional Spark configs
SparkContext.setSystemProperty('spark.executor.cores', '4')
SparkContext.setSystemProperty('spark.executor.memory', '8g')
# Boilerplate code provided to you by CML Data Connections
CONNECTION_NAME = "go01-dl"
conn = cmldata.get_connection(CONNECTION_NAME)
spark = conn.get_spark_session()
# Sample usage to run query through spark
EXAMPLE_SQL_QUERY = "show databases"
spark.sql(EXAMPLE_SQL_QUERY).show()
=================================================
I also checked "Enable Spark" when I created the session.
However, the code failed with this error message:
'No data connection named go01-dl found'
While trying this, I thought I needed to look up the Spark connection information, but I cannot find it anywhere.
Where can I get the connection name for Spark?
What should I do to run a job on Spark in CML?
Please let me know.
There is no 'Data' tab in the CML session.
There is also no 'Data connectors' entry under Project Settings or Site Administration in CML.
Created 04-15-2024 12:04 AM
Hello @Hae
AFAIK, Data Connections is a Public Cloud concept and isn't available in Private Cloud yet. In Public Cloud, [1] shows the steps to configure Data Connections, which let you access the HMS of the Data Lake (the unified HMS source for the environment). In Private Cloud, you can follow [2] to use Spark on CML. It includes examples of using Spark-on-YARN on the Base Cluster as well as Spark-on-Kubernetes on CML.
- Smarak
[1] https://docs.cloudera.com/machine-learning/cloud/mlde/topics/ml-mlde-spark-data-connection.html
[2] https://docs.cloudera.com/machine-learning/1.5.2/spark/topics/ml-apache-spark-overview.html
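For readers on Private Cloud, here is a minimal sketch of building the Spark session directly with PySpark instead of going through `cml.data_v1`. It assumes the session was started with Spark enabled and that `pyspark` is importable in the runtime. The helper names `spark_conf` and `build_session` and the application name are hypothetical, and whether `show databases` can see the Hive Metastore depends on how the workspace is configured, so treat this as a starting point rather than the documented procedure in [2].

```python
def spark_conf(app_name="cml-spark-example", executor_cores="4", executor_memory="8g"):
    """Collect the Spark settings from the question in a plain dict.

    Hypothetical helper: the keys are standard Spark properties, the
    default values mirror the original post.
    """
    return {
        "spark.app.name": app_name,
        "spark.executor.cores": executor_cores,
        "spark.executor.memory": executor_memory,
    }


def build_session(conf):
    """Create (or reuse) a SparkSession from a dict of Spark properties.

    pyspark is imported here so that spark_conf() can be inspected
    even where Spark is not installed.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    # No CONNECTION_NAME is involved: the session is built without Data Connections.
    return builder.getOrCreate()
```

In a session, `spark = build_session(spark_conf())` followed by `spark.sql("show databases").show()` would replace the `cmldata.get_connection(...)` lines from the question.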
Created 03-14-2024 02:07 PM
Hi @Hae
Please share your CDP form factor (Public or Private Cloud) and CML version so we can understand this better. Ideally, your data connections are picked up automatically if any data sources are set up. If not, try creating a new connection, adding your Data Lake and warehouse connections, and then syncing with the workspace.
Refer to https://blog.cloudera.com/one-line-away-from-your-data/ to understand the setup.
Created on 03-14-2024 04:10 PM - edited 03-14-2024 04:11 PM
The CM version is 7.11.3 and the runtime is 7.1.9-1.cdh7.1.9.p3.48381316.
The ECS version is 1.5.2-b886-ecs-1.5.2-b886.p0.46792599.
The CML version is 2.0.42-b80.
Created 03-18-2024 06:02 PM
The CM version is 7.11.3 and the runtime is 7.1.9-1.cdh7.1.9.p3.48381316.
The ECS version is 1.5.2-b886-ecs-1.5.2-b886.p0.46792599.
The CML version is 2.0.42-b80.