Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How do I access MySQL/ Postgres (external) databases and tables directly using Apache Spark?

How do I access MySQL/ Postgres (external) databases and tables directly using Apache Spark?

Explorer

I have installed Cloudera 5.16.2 on Ubuntu 16.04. 

 

Most of the data I need to work on are in MySQL or Postgresql databases and I want to use Apache Spark to work on this data directly. I prefer working with PySpark and have installed and configured Anaconda through Cloudera Manager. 

 

How would I connect to MySQL / Postgres using Apache Spark on a Jupyter Notebook? Could you give me a step by step guide to achieve this?

 

Thank you 

4 REPLIES 4
Highlighted

Re: How do I access MySQL/ Postgres (external) databases and tables directly using Apache Spark?

Super Mentor

Re: How do I access MySQL/ Postgres (external) databases and tables directly using Apache Spark?

Explorer

Hi @jsensharma. Thanks for the reply. I realised that I had not worded my query properly and am actually looking to be able to connect to MySQL / Postgres through Apache spark using a Jupyter Workbook. I am more comfortable working with Python and am not at all familiar with Scala /Java (can't stand them either). Apologies in advance for the vague request earlier

Re: How do I access MySQL/ Postgres (external) databases and tables directly using Apache Spark?

Mentor

@Tipo33 

 

I think the below link should help you achieve that, usually to be able to connect to a remote database the DNS should be able to resolve the  FQDN/IP  in the example in this link it could be like this?

 

https://pjryan126.github.io/jupyter-db-connections/

 

import urllib

params = 'DRIVER={ODBC Driver 13 for SQL Server};' \
         'SERVER=192.168.1.200;' \ or 'SERVER=zephyr.tesco.uk;'
         'PORT=1433;' \
         'DATABASE=nb-database;' \
         'UID=nb-user;' \
         'PWD=nb-password;'
            
params = urllib.parse.quote_plus(params)

Here is a snippet for connecting to Mysql

import MySQLdb

Next, enter your database credentials.

#Enter the values for you database connection
dsn_database = "Enter Database name"  # e.g. "MySQLdbtest"
dsn_hostname = "<Enter Hostname" # e.g.: "mydbinstance.xyz.us-east-1.rds.amazonaws.com"
dsn_port = 3306               # e.g. 3306
dsn_uid = "<Enter UserID>"       # e.g. "user1"
dsn_pwd = "<Enter Password"     # e.g. "Password123"

Now to establish the database connection.

conn = MySQLdb.connect(host=dsn_hostname, port=dsn_port, user=dsn_uid, passwd=dsn_pwd, db=dsn_database)

HTH

 

 

 

 

 

 

 

 

 

 

 

Re: How do I access MySQL/ Postgres (external) databases and tables directly using Apache Spark?

Explorer

Thanks @Shelton. Just to check: When Python directly connects with the MySQL database, wouldn't that infer that only Python is working on the tables, and not Spark? This is more so since an ODBC is used and not a JDBC.

 

Is there a way for Spark to directly access the MySQL database similar to what @jsensharma pointed out above? 

Don't have an account?
Coming from Hortonworks? Activate your account here