Created on 04-08-202202:29 AM - edited on 04-10-202209:45 PM by subratadas
This article explains how to use the Snowflake Connector for Spark in Cloudera Machine Learning.
Save your Snowflake password for your account. Go to Account Settings > Environment Variables and create a new entry with Name as "SNOW_PASSWORD" and Value as your <password>.
Create a new session in your CML project. Use any of the editors, and the Python 3.7 kernel. You also need to enable Spark for this session and select the Spark 2.4.7 version.
Download required dependencies. To initiate the connection, you need to have a Snowflake Connector for Spark and a Snowflake JDBC Driver that's compatible with CML's Spark version. You can get these from the Maven CentralRepository: spark-connector and jdbc-driver. Place them in a ./jars folder in your CML Project.
cdsw@ysezv3fm94edq4pb:~$ ls -l ./jars/
total 28512
-rw-r--r-- 1 cdsw cdsw 28167364 Mar 17 21:03 snowflake-jdbc-3.13.16.jar
-rw-r--r-- 1 cdsw cdsw 1027154 Jan 27 01:18 spark-snowflake_2.11-2.9.3-spark_2.4.jar
Initiate the Snowflake connection. You need to set your Snowflake Account ID and your username for the connection. Your Snowflake password is retrieved from the environment variable that you configured in the Account Settings. I'm using the default Snowflake Warehouse and a sample database/schema.