
Spark SQL comes with a nice feature called "JDBC to Other Databases", but, in practice, it's a JDBC federation feature.

It can be used to create data frames from JDBC databases using Scala/Python, and it also works directly with the Spark SQL Thrift Server, allowing us to query external JDBC tables seamlessly, like any other Hive/Spark table.

The examples below use the Hortonworks Sandbox with HDP 2.3.2 and the Spark 1.5.1 Technical Preview (https://hortonworks.com/hadoop-tutorial/apache-spark-1-5-1-technical-preview-with-hdp-2-3/).

This feature works with spark-submit, spark-shell, Zeppelin, the spark-sql client, and the Spark SQL Thrift Server. This post shows two examples: #1 using the Spark SQL Thrift Server, #2 using spark-shell.

Example #1 using Spark SQL Thrift Server

1- Start the Spark SQL Thrift Server with the MySQL JDBC driver on the classpath:

sudo -u spark /usr/hdp/2.3.2.1-12/spark/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10010 --jars "/usr/share/java/mysql-connector-java.jar"

2- Open beeline and connect to the Spark SQL Thrift Server:

beeline -u "jdbc:hive2://localhost:10010/default" -n admin

3- Create a JDBC federated table pointing to an existing MySQL database, using beeline:

CREATE TABLE mysql_federated_sample
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver "com.mysql.jdbc.Driver",
  url "jdbc:mysql://localhost/hive?user=hive&password=hive",
  dbtable "TBLS"
);
describe mysql_federated_sample;
select * from mysql_federated_sample;
select count(1) from mysql_federated_sample;
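
As a variation (an assumption about the Spark JDBC data source, not shown in the original steps), the `dbtable` option can also take a parenthesized subquery instead of a bare table name, so MySQL does the filtering before any rows reach Spark. A sketch, using the same Hive metastore `TBLS` table:

```sql
CREATE TABLE mysql_federated_filtered
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver "com.mysql.jdbc.Driver",
  url "jdbc:mysql://localhost/hive?user=hive&password=hive",
  -- a subquery works in place of a table name; the alias is required
  dbtable "(SELECT TBL_ID, TBL_NAME, TBL_TYPE FROM TBLS WHERE TBL_TYPE = 'MANAGED_TABLE') t"
);
```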

Example #2 using spark-shell, Scala code, and data frames:

1- Open spark-shell with the MySQL JDBC driver:

spark-shell --jars "/usr/share/java/mysql-connector-java.jar"

2- Create a data frame pointing to a MySQL table:

val jdbcDF = sqlContext.read.format("jdbc").options(
  Map(
    "driver"  -> "com.mysql.jdbc.Driver",
    "url"     -> "jdbc:mysql://localhost/hive?user=hive&password=hive",
    "dbtable" -> "TBLS"
  )
).load()
  
jdbcDF.show
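
Once loaded, the data frame behaves like any other. A sketch of follow-up queries (assuming the Hive metastore schema, where `TBLS` has `TBL_ID`, `TBL_NAME`, and `TBL_TYPE` columns); simple predicates like this one are pushed down to MySQL as a WHERE clause, so only matching rows cross the network:

```scala
// Filter and project; the predicate is pushed down to MySQL by the JDBC source
jdbcDF.filter("TBL_TYPE = 'MANAGED_TABLE'")
      .select("TBL_ID", "TBL_NAME")
      .show()

// Register as a temp table to mix with other Hive/Spark tables in SQL
jdbcDF.registerTempTable("mysql_tbls")
sqlContext.sql("SELECT count(*) FROM mysql_tbls").show()
```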

See other Spark JDBC examples and troubleshooting here:

https://community.hortonworks.com/questions/1942/spark-to-phoenix.html

Comments
New Contributor

@Guilherme Braccialli but Spark thrift server listens through port 10015, and I think if you connect through 10000, then it’s HiveServer2 thrift server. Maybe I’m missing something, it may be helpful if we can discuss this


@kishore sanchina In this example I started the Spark Thrift Server on port 10010 and connected using beeline to the same port. You can use the default port 10015 instead.

Version history: Revision 1 of 1, last updated ‎11-24-2015 12:49 AM.