Created on 11-24-2015 12:49 AM
Spark SQL comes with a nice feature called "JDBC To Other Databases"; in practice, it is a JDBC federation feature.
It can be used to create data frames from JDBC databases using Scala or Python, but it also works directly with the Spark SQL Thrift Server, allowing us to query external JDBC tables seamlessly, just like other Hive/Spark tables.
The examples below use the HDP 2.3.2 sandbox and the Spark 1.5.1 Technical Preview (https://hortonworks.com/hadoop-tutorial/apache-spark-1-5-1-technical-preview-with-hdp-2-3/).
This feature works with spark-submit, spark-shell, Zeppelin, the spark-sql client, and the Spark SQL Thrift Server. This post shows two examples: #1 using the Spark SQL Thrift Server, #2 using spark-shell.
Example #1 using Spark SQL Thrift Server
1- Run the Spark SQL Thrift Server with the MySQL JDBC driver:
sudo -u spark /usr/hdp/2.3.2.1-12/spark/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10010 --jars "/usr/share/java/mysql-connector-java.jar"
2- Open beeline and connect to the Spark SQL Thrift Server:
beeline -u "jdbc:hive2://localhost:10010/default" -n admin
3- Create a JDBC federated table pointing to an existing MySQL database, using beeline:
CREATE TABLE mysql_federated_sample
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver "com.mysql.jdbc.Driver",
  url "jdbc:mysql://localhost/hive?user=hive&password=hive",
  dbtable "TBLS"
);

describe mysql_federated_sample;

select * from mysql_federated_sample;

select count(1) from mysql_federated_sample;
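The same syntax also accepts the Spark JDBC partitioning options (partitionColumn, lowerBound, upperBound, numPartitions), which split the read across multiple connections instead of pulling everything through one. Below is a sketch only: the table name mysql_federated_partitioned is made up, and it assumes TBLS has a numeric TBL_ID column with illustrative bounds; adjust the column and bounds to your own table.

-- partitioned read: Spark issues one query per partition over the TBL_ID range
CREATE TABLE mysql_federated_partitioned
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver "com.mysql.jdbc.Driver",
  url "jdbc:mysql://localhost/hive?user=hive&password=hive",
  dbtable "TBLS",
  partitionColumn "TBL_ID",
  lowerBound "1",
  upperBound "100000",
  numPartitions "4"
);

select count(1) from mysql_federated_partitioned;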
Example #2 using spark-shell, Scala code, and data frames:
1- Open spark-shell with the MySQL JDBC driver:
spark-shell --jars "/usr/share/java/mysql-connector-java.jar"
2- Create a data frame pointing to the MySQL table:
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map(
    "driver"  -> "com.mysql.jdbc.Driver",
    "url"     -> "jdbc:mysql://localhost/hive?user=hive&password=hive",
    "dbtable" -> "TBLS"
  )
).load()

jdbcDF.show
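Once loaded, the data frame can also be registered as a temporary table and queried with SQL from the same spark-shell session. A minimal sketch continuing the session above; the column names (TBL_NAME, TBL_TYPE, TBL_ID) assume the Hive metastore TBLS schema and should be adjusted for other source tables:

// register the JDBC-backed data frame as a temporary table
jdbcDF.registerTempTable("mysql_tbls")

// query it with Spark SQL like any other table
sqlContext.sql("select TBL_NAME, TBL_TYPE from mysql_tbls where TBL_TYPE = 'MANAGED_TABLE'").show()

// simple comparison filters like this one can be pushed down to MySQL by the JDBC data source
jdbcDF.filter("TBL_ID > 10").count()

Note that registerTempTable only registers the table for the current SQLContext; no data is copied out of MySQL until a query runs.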
See other Spark JDBC examples and troubleshooting here:
https://community.hortonworks.com/questions/1942/spark-to-phoenix.html
Created on 08-03-2016 07:20 PM
@Guilherme Braccialli But the Spark Thrift Server listens on port 10015, and I think if you connect through port 10000, then it's the HiveServer2 Thrift Server. Maybe I'm missing something; it may be helpful if we can discuss this.
Created on 08-23-2016 07:35 AM
@kishore sanchina In this example I started the Spark Thrift Server on port 10010 and connected with beeline to the same port. You can use the default port 10015 instead.