Spark to Phoenix
Labels: Apache Phoenix, Apache Spark
Created ‎10-22-2015 11:17 PM
Trying to connect Spark with Phoenix using JDBC. I appended the location of phoenix-client.jar to SPARK_CLASSPATH in spark-env.sh.
When I launch Spark shell, I get the following errors:
<console>:10: error: not found: value sqlContext
       import sqlContext.implicits._
              ^
<console>:10: error: not found: value sqlContext
       import sqlContext.sql
Created ‎11-16-2015 05:23 PM
The example below works with Sandbox 2.3.2.
PS: Note that I haven't changed the classpath; I only used the --jars option.
from shell:
spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar
inside spark-shell:
//option 1, read table
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map(
    "driver"  -> "org.apache.phoenix.jdbc.PhoenixDriver",
    "url"     -> "jdbc:phoenix:sandbox.hortonworks.com:2181:/hbase-unsecure",
    "dbtable" -> "TABLE1")).load()
jdbcDF.show

//option 2, read custom query
import java.sql.{Connection, DriverManager, DatabaseMetaData, ResultSet}
import org.apache.spark.rdd.JdbcRDD

def getConn(driverClass: => String, connStr: => String, user: => String, pass: => String): Connection = {
  var conn: Connection = null
  try {
    Class.forName(driverClass)
    conn = DriverManager.getConnection(connStr, user, pass)
  } catch {
    case e: Exception => e.printStackTrace
  }
  conn
}

val myRDD = new JdbcRDD(sc,
  () => getConn("org.apache.phoenix.jdbc.PhoenixDriver", "jdbc:phoenix:localhost:2181:/hbase-unsecure", "", ""),
  "select sum(10) from TABLE1 where ? <= id and id <= ?", 1, 10, 2)
myRDD.take(10)

val myRDD = new JdbcRDD(sc,
  () => getConn("org.apache.phoenix.jdbc.PhoenixDriver", "jdbc:phoenix:localhost:2181:/hbase-unsecure", "", ""),
  "select col1 from TABLE1 where ? <= id and id <= ?", 1, 10, 2)
myRDD.take(10)
Also note that the Phoenix team recommends using Phoenix Spark instead of JDBC directly: http://phoenix.apache.org/phoenix_spark.html
Here is an example with the phoenix-spark package:
from shell:
spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.3.2.0-2950.jar --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar"
inside spark-shell:
import org.apache.phoenix.spark._

val df = sqlContext.load(
  "org.apache.phoenix.spark",
  Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181:/hbase-unsecure")
)
df.show
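The same package can also write a DataFrame back to Phoenix. A minimal sketch (not part of the original example; it assumes a target table named OUTPUT_TABLE with columns matching the DataFrame schema already exists in Phoenix):

import org.apache.spark.sql.SaveMode
import org.apache.phoenix.spark._

// Hypothetical target table: OUTPUT_TABLE must already exist in Phoenix
// with columns matching the DataFrame schema.
df.save(
  "org.apache.phoenix.spark",
  SaveMode.Overwrite,
  Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "localhost:2181:/hbase-unsecure")
)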
And here is a sample project that can be built and executed through spark-submit:
https://github.com/gbraccialli/SparkUtils
git clone https://github.com/gbraccialli/SparkUtils
cd SparkUtils/
mvn clean package
spark-submit --class com.github.gbraccialli.spark.PhoenixSparkSample target/SparkUtils-1.0.0-SNAPSHOT.jar
Also check @Randy Gelhausen's project, which uses Phoenix Spark to automatically load data from Hive to Phoenix:
https://github.com/randerzander/HiveToPhoenix (I copied my pom.xml from Randy's project)
Created ‎10-22-2015 11:56 PM
Is this the full stack trace, or is there more?
Created ‎10-22-2015 11:59 PM
Can you please share the code and full error log?
Created ‎10-23-2015 02:33 PM
See the stack trace attached (spark-phoenix-stack-trace.txt). I ran:
spark-submit --master yarn
Created ‎10-23-2015 02:47 PM
This looks like a classpath issue. I suspect your phoenix-client.jar is packaged with classes compiled against a different Jackson jar than what it finds on the classpath.
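One quick way to check (a diagnostic sketch, not from the original reply; the class names below are just the usual Jackson 1.x and 2.x entry points) is to ask spark-shell which jar each Jackson ObjectMapper class is loaded from:

// Print the jar each Jackson ObjectMapper class comes from, or note if it is missing.
Seq("org.codehaus.jackson.map.ObjectMapper",        // Jackson 1.x (older Hadoop/Phoenix builds)
    "com.fasterxml.jackson.databind.ObjectMapper")  // Jackson 2.x (used by Spark)
  .foreach { name =>
    try {
      val src = Class.forName(name).getProtectionDomain.getCodeSource
      println(s"$name -> " + (if (src != null) src.getLocation else "bootstrap/unknown"))
    } catch {
      case _: ClassNotFoundException => println(s"$name -> not on classpath")
    }
  }

If the two jars disagree with what Spark ships, that would point at the conflict.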
Created ‎10-23-2015 02:33 PM
It works fine ... if I don't modify the Spark classpath
Created ‎10-23-2015 03:50 PM
Looks like some conflict between the Spark and Phoenix jars, no?
Googling the details in the stack trace, it looks related to Jackson. I'm not familiar with Phoenix - does it use its own version?
Created ‎07-11-2016 09:25 AM
Hi @Vedant Jain,
How did you fix this issue? I have HDP 2.4.0 installed and I used the following command to invoke the spark-shell:
spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar
I didn't change anything in the default configurations, just installed Spark, tried it, and got the exception below:
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row
Any thoughts would be great. I also tried what @Guilherme Braccialli said, but it didn't work either. Is it a bug in HDP 2.4.0?
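If the DataFrame path keeps failing with that cast, one thing that may be worth trying (a sketch only, not verified against HDP 2.4.0) is the RDD API from the same phoenix-spark package, which avoids the DataFrame row conversion:

import org.apache.phoenix.spark._

// Hypothetical table and columns; replace with your own.
// Returns an RDD of column-name -> value maps.
val rdd = sc.phoenixTableAsRDD(
  "TABLE1",
  Seq("ID", "COL1"),
  zkUrl = Some("localhost:2181:/hbase-unsecure")
)
rdd.take(10).foreach(println)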
Created ‎05-12-2016 06:13 AM
The solution doesn't work for HDP 2.4.0. I get this error when running df.show:
16/05/12 00:53:21 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 8, usfit-hdpdev-n02.global.internal): java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row
If I use pyspark it also dies with a different stack trace.
I attempted to follow the directions at http://phoenix.apache.org/phoenix_spark.html and updated the Spark config to include the spark-client.jar in both 'spark.executor.extraClassPath' and 'spark.driver.extraClassPath'. spark-shell then dies with a pretty long stack trace when attempting to set up the SQLContext. Same thing for pyspark.
It appears there are some incompatibilities between the Phoenix jars and the Spark jars as built in HDP. Does anyone have a workaround?
thanks!
