Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
Labels (1)
avatar

Spark SQL comes with a nice feature called: "JDBC to other Databases", but, it practice, it's JDBC federation feature.

"It can be used to create data frames from jdbc databases using scala/python, but it also works directly with Spark SQL Thrift server and allow us to query external JDBC tables seamless like other hive/spark tables."

Example below using sandbox 2.3.2 and spark 1.5.1 TP (https://hortonworks.com/hadoop-tutorial/apache-spark-1-5-1-technical-preview-with-hdp-2-3/).

This feature works with spark-submit, spark-shell, zeppelin, spark-sql client and spark sql thift server. In this post two examples: #1 using spark-sql thrift server, #2 using spark-shell.

Example #1 using Spark SQL Thrift Server

1- Run Spark SQL Thrift Server with mysql jdbc driver:

sudo -u spark /usr/hdp/2.3.2.1-12/spark/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10010 --jars "/usr/share/java/mysql-connector-java.jar"

2- Open beeline and connect to Spark SQL Thrift Server:

beeline -u "jdbc:hive2://localhost:10010/default" -n admin

3- Create a jdbc federated table pointing to existing mysql database, using beeline:

CREATE TABLE mysql_federated_sample
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver "com.mysql.jdbc.Driver",
  url "jdbc:mysql://localhost/hive?user=hive&password=hive",
  dbtable "TBLS"
);
describe mysql_federated_sample;
select * from mysql_federated_sample;
select count(1) from mysql_federated_sample;

Example #2 using Spark shell, scala code and data frames:

1- Open spark-shell with mysql jdbc driver

spark-shell --jars "/usr/share/java/mysql-connector-java.jar"

2- Create a data frame pointing to mysql table

val jdbcDF = sqlContext.read.format("jdbc").options( 
  Map(
  "driver" -> "com.mysql.jdbc.Driver",
  "url" -> "jdbc:mysql://localhost/hive?user=hive&password=hive",
  "dbtable" -> "TBLS"
  )
).load()
  
jdbcDF.show

See other spark jdbc examples / troubleshooting here:

https://community.hortonworks.com/questions/1942/spark-to-phoenix.html

8,911 Views
Comments
avatar

@Guilherme Braccialli but Spark thrift server listens through port 10015, and I think if you connect through 10000, then it’s HiveServer2 thrift server. Maybe I’m missing something, it may be helpful if we can discuss this

avatar

but Spark thrift server listens through port 10015, and I think if you connect through 10000, then it’s HiveServer2 thrift server. Maybe I’m missing something, it may be helpful if we can discuss this

avatar

@kishore sanchina I'm this example I started Spark thrift server on port 10010 and connected using beeline to same port. You can use default port 10015 instead.