Spark SQL comes with a nice feature called "JDBC to Other Databases", but in practice it's a JDBC federation feature.

It can be used to create data frames from JDBC databases using Scala/Python, but it also works directly with the Spark SQL Thrift Server, allowing us to query external JDBC tables seamlessly, like other Hive/Spark tables.

The example below uses Sandbox HDP 2.3.2 and Spark 1.5.1 TP (https://hortonworks.com/hadoop-tutorial/apache-spark-1-5-1-technical-preview-with-hdp-2-3/).

This feature works with spark-submit, spark-shell, Zeppelin, the spark-sql client, and the Spark SQL Thrift Server. This post shows two examples: #1 using the Spark SQL Thrift Server, #2 using spark-shell.

Example #1: using the Spark SQL Thrift Server

1- Run the Spark SQL Thrift Server with the MySQL JDBC driver:

sudo -u spark /usr/hdp/2.3.2.1-12/spark/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10010 --jars "/usr/share/java/mysql-connector-java.jar"

2- Open beeline and connect to the Spark SQL Thrift Server:

beeline -u "jdbc:hive2://localhost:10010/default" -n admin

3- Create a JDBC federated table pointing to an existing MySQL database, using beeline:

CREATE TABLE mysql_federated_sample
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver "com.mysql.jdbc.Driver",
  url "jdbc:mysql://localhost/hive?user=hive&password=hive",
  dbtable "TBLS"
);
describe mysql_federated_sample;
select * from mysql_federated_sample;
select count(1) from mysql_federated_sample;
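
Beyond mapping a single table, the value of the dbtable option is inlined into the remote database's FROM clause, so a parenthesized subquery with an alias works too, and MySQL does the filtering before any rows reach Spark. A hedged sketch against the same Hive metastore database (the table name mysql_federated_filtered, the alias t, and the TBL_TYPE column are assumptions based on the metastore's TBLS schema):

```sql
-- dbtable can be a subquery: MySQL evaluates it, Spark reads the result
CREATE TABLE mysql_federated_filtered
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver "com.mysql.jdbc.Driver",
  url "jdbc:mysql://localhost/hive?user=hive&password=hive",
  dbtable "(SELECT TBL_ID, TBL_NAME, TBL_TYPE FROM TBLS WHERE TBL_TYPE = 'MANAGED_TABLE') t"
);
select * from mysql_federated_filtered;
```

This keeps the federated table narrow and pushes the WHERE clause down to MySQL instead of filtering in Spark.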

Example #2: using spark-shell, Scala code, and data frames

1- Open spark-shell with the MySQL JDBC driver:

spark-shell --jars "/usr/share/java/mysql-connector-java.jar"

2- Create a data frame pointing to a MySQL table:

val jdbcDF = sqlContext.read.format("jdbc").options(
  Map(
    "driver" -> "com.mysql.jdbc.Driver",
    "url" -> "jdbc:mysql://localhost/hive?user=hive&password=hive",
    "dbtable" -> "TBLS"
  )
).load()
  
jdbcDF.show
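
For larger tables, the JDBC data source can also read in parallel by splitting the table on a numeric column. A hedged sketch (TBL_ID is an assumed numeric key from the metastore's TBLS table, and the bounds are illustrative; partitionColumn, lowerBound, upperBound, and numPartitions must all be set together):

```scala
// Parallel read: Spark issues numPartitions queries, each covering a
// slice of [lowerBound, upperBound] on partitionColumn.
val jdbcParDF = sqlContext.read.format("jdbc").options(
  Map(
    "driver" -> "com.mysql.jdbc.Driver",
    "url" -> "jdbc:mysql://localhost/hive?user=hive&password=hive",
    "dbtable" -> "TBLS",
    "partitionColumn" -> "TBL_ID",
    "lowerBound" -> "0",
    "upperBound" -> "1000",
    "numPartitions" -> "4"
  )
).load()

jdbcParDF.rdd.partitions.length  // one partition per slice
```

Note that the bounds only control how the column range is split across partitions; rows outside the bounds are still read, just all by the first or last partition.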

See other Spark JDBC examples and troubleshooting here:

https://community.hortonworks.com/questions/1942/spark-to-phoenix.html

Comments

@Guilherme Braccialli but Spark thrift server listens through port 10015, and I think if you connect through 10000, then it’s HiveServer2 thrift server. Maybe I’m missing something, it may be helpful if we can discuss this

@kishore sanchina In this example I started the Spark Thrift Server on port 10010 and connected using beeline to the same port. You can use the default port 10015 instead.

Last update: 11-24-2015 12:49 AM