Created 10-22-2015 11:17 PM
Trying to connect Spark with Phoenix using JDBC. I appended the location of phoenix-client.jar to SPARK_CLASSPATH in spark-env.sh.
When I launch the Spark shell, I get the following errors:
<console>:10: error: not found: value sqlContext
       import sqlContext.implicits._
              ^
<console>:10: error: not found: value sqlContext
       import sqlContext.sql
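This error usually means spark-shell failed to create its SQLContext at startup, which a classpath conflict introduced via SPARK_CLASSPATH can cause. As a quick check once the shell is up, you can try creating a context by hand; a minimal sketch:

// a minimal sketch: create a plain SQLContext manually inside spark-shell
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._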
Created 11-16-2015 05:23 PM
The example below works with Sandbox 2.3.2.
PS: Note that I haven't changed the classpath; I only used the --jars option.
from shell:
spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar
inside spark-shell:
// option 1: read a table via the JDBC data source
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map(
    "driver" -> "org.apache.phoenix.jdbc.PhoenixDriver",
    "url" -> "jdbc:phoenix:sandbox.hortonworks.com:2181:/hbase-unsecure",
    "dbtable" -> "TABLE1")).load()
jdbcDF.show

// option 2: read a custom query via JdbcRDD
import java.sql.{Connection, DriverManager, DatabaseMetaData, ResultSet}
import org.apache.spark.rdd.JdbcRDD

def getConn(driverClass: => String, connStr: => String, user: => String, pass: => String): Connection = {
  var conn: Connection = null
  try {
    Class.forName(driverClass)
    conn = DriverManager.getConnection(connStr, user, pass)
  } catch {
    case e: Exception => e.printStackTrace
  }
  conn
}

val sumRDD = new JdbcRDD(
  sc,
  () => getConn("org.apache.phoenix.jdbc.PhoenixDriver", "jdbc:phoenix:localhost:2181:/hbase-unsecure", "", ""),
  "select sum(10) from TABLE1 where ? <= id and id <= ?",
  1, 10, 2)
sumRDD.take(10)

val colRDD = new JdbcRDD(
  sc,
  () => getConn("org.apache.phoenix.jdbc.PhoenixDriver", "jdbc:phoenix:localhost:2181:/hbase-unsecure", "", ""),
  "select col1 from TABLE1 where ? <= id and id <= ?",
  1, 10, 2)
colRDD.take(10)
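If you want typed values instead of the default Array[AnyRef] rows, JdbcRDD also accepts an explicit row-mapping function as its last argument. A minimal sketch, reusing getConn above (the column names id and col1 are assumptions taken from the queries):

import java.sql.ResultSet

val typedRDD = new JdbcRDD(
  sc,
  () => getConn("org.apache.phoenix.jdbc.PhoenixDriver", "jdbc:phoenix:localhost:2181:/hbase-unsecure", "", ""),
  "select id, col1 from TABLE1 where ? <= id and id <= ?",
  1, 10, 2,
  (rs: ResultSet) => (rs.getInt("id"), rs.getString("col1")) // map each ResultSet row to a tuple
)
typedRDD.take(10)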
Also note that the Phoenix team recommends using the phoenix-spark integration instead of JDBC directly: http://phoenix.apache.org/phoenix_spark.html
Here is an example with the phoenix-spark package:
from shell:
spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.3.2.0-2950.jar --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar"
inside spark-shell:
import org.apache.phoenix.spark._

val df = sqlContext.load(
  "org.apache.phoenix.spark",
  Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181:/hbase-unsecure")
)
df.show
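Once loaded, the usual DataFrame operations apply, and the connector can push column selection and filters down to Phoenix. A minimal sketch (ID is an assumed column name):

// select and filter on an assumed ID column
df.select(df("ID")).filter(df("ID") >= 1).show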
And here is a sample project that can be built and executed through spark-submit:
https://github.com/gbraccialli/SparkUtils
git clone https://github.com/gbraccialli/SparkUtils
cd SparkUtils/
mvn clean package
spark-submit --class com.github.gbraccialli.spark.PhoenixSparkSample target/SparkUtils-1.0.0-SNAPSHOT.jar
Also check @Randy Gelhausen's project, which uses phoenix-spark to automatically load data from Hive to Phoenix:
https://github.com/randerzander/HiveToPhoenix (I copied my pom.xml from Randy's project)
Created 09-20-2016 03:05 PM
You are right, on HDP 2.4 this is broken. It still works as long as you don't go for the DataFrame option; that one fails. It needs Phoenix 4.7.0.0 to work, which will be available in HDP 2.5.
The phoenix-spark connector for getting HBase data into an RDD (not a DataFrame!) works for me.
An update on @Guilherme Braccialli's excellent guidance, to make phoenix-spark work with the RDD option, this time on the 2.4 Sandbox:
spark-shell --master yarn-client --jars /usr/hdp/2.4.0.0-169/phoenix/phoenix-4.4.0.2.4.0.0-169-client.jar,/usr/hdp/2.4.0.0-169/phoenix/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar --conf "spark.executor.extraClassPath=/usr/hdp/2.4.0.0-169/phoenix/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar:/usr/hdp/2.4.0.0-169/phoenix/phoenix-4.4.0.2.4.0.0-169-client.jar"
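For reference, the RDD path with phoenix-spark looks like this inside spark-shell; a minimal sketch, assuming a table TABLE1 with columns ID and COL1:

import org.apache.phoenix.spark._

// phoenixTableAsRDD returns an RDD[Map[String, AnyRef]], one map per row
val rdd = sc.phoenixTableAsRDD(
  "TABLE1",
  Seq("ID", "COL1"),
  zkUrl = Some("localhost:2181:/hbase-unsecure")
)
rdd.count()
rdd.take(5).foreach(println)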
Created 03-17-2016 03:14 AM
Got this working in Zeppelin. Similar answer to @Guilherme Braccialli's, but translated for Zeppelin users.
1. Run this block first:
%dep
z.reset
z.load("/usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.2.0-2950-client.jar")
z.load("/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.3.2.0-2950.jar")
2. Then, in a separate snippet, run this:
import org.apache.phoenix.spark._

val df = sqlc.load(
  "org.apache.phoenix.spark",
  Map("table" -> "TRANSACTIONHISTORY", "zkUrl" -> "sandbox.hortonworks.com:2181:/hbase-unsecure")
)
df.show
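To query the result from a %sql paragraph, you can register the DataFrame as a temporary table first (the name tx below is arbitrary):

// register the DataFrame so Zeppelin's %sql interpreter can see it
df.registerTempTable("tx")

Then in a new paragraph:

%sql
select * from tx limit 10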
Created 05-30-2016 04:32 PM
Tried the above lines in the HDP 2.4 Sandbox with Zeppelin; it's not working.