
[HDP-2.3.4] Write to Phoenix from Spark using Scala


Hi,

I used the code below to insert data from Spark 1.5.2 into Phoenix 4.4:

import org.apache.spark.sql.SaveMode

df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)
  .options(Map(
    "zkUrl" -> "localhost:2181/hbase-unsecure",
    "table" -> "TEST"))
  .save()

Spark job launch command:

spark-shell \
  --properties-file /TestDivya/Spark/Phoenix.properties \
  --jars /usr/hdp/2.3.4.0-3485/phoenix/lib/phoenix-spark-4.4.0.2.3.4.0-3485.jar,/usr/hdp/2.3.4.0-3485/phoenix/phoenix-client.jar \
  --driver-class-path /usr/hdp/2.3.4.0-3485/phoenix/lib/phoenix-spark-4.4.0.2.3.4.0-3485.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/phoenix-client-4.4.0.jar \
  --conf "spark.executor.extraClassPath=/usr/hdp/2.3.4.0-3485/phoenix/lib/phoenix-core-4.4.0.2.3.4.0-3485.jar" \
  --packages com.databricks:spark-csv_2.10:1.4.0 \
  --master yarn-client \
  -i /TestDivya/Spark/WriteToPheonix.scala

I am getting the error below when running the Spark job:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 411, ip-xxxxx-xx-xxx.ap-southeast-1.compute.internal): java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:phoenix:localhost:2181:/hbase-unsecure;
        at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:58)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1030)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1014)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

Has anybody faced this issue? I would really appreciate the help. Thanks.

1 REPLY

Re: [HDP-2.3.4] Write to Phoenix from Spark using Scala

@Divya Gehlot - It's much easier to build a working uber jar than to fight the class collisions that arise when using the --jars argument with spark-submit.
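
In case it helps, here is a minimal sketch of such an uber jar build using sbt-assembly instead of Maven; the plugin version and artifact coordinates below are assumptions for Spark 1.5.2 / Phoenix 4.4 and would need to match your cluster:

// build.sbt - sketch only; assumes project/plugins.sbt declares
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.0")
name := "phoenix-spark-job"

scalaVersion := "2.10.5"

// Spark itself comes from the cluster ("provided"); Phoenix is shaded into the jar.
libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-sql"     % "1.5.2" % "provided",
  "org.apache.phoenix" %  "phoenix-spark" % "4.4.0-HBase-1.0",
  "org.apache.phoenix" %  "phoenix-core"  % "4.4.0-HBase-1.0"
)

// Merge service registrations (this is what lets java.sql.DriverManager find
// the Phoenix driver); drop the other META-INF entries that tend to collide.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "services", _*) => MergeStrategy.filterDistinctLines
  case PathList("META-INF", _*)             => MergeStrategy.discard
  case _                                    => MergeStrategy.first
}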

See HiveToPhoenix for an example Scala Spark job with a pom file for packaging everything into a single uber jar for spark-submit.
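
I haven't reproduced HiveToPhoenix's exact contents here, but the class packaged into the uber jar would look roughly like the following sketch (the object name and the one-row stand-in DataFrame are placeholders; the zkUrl and TEST table come from the question):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object WriteToPhoenix {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WriteToPhoenix"))
    val sqlContext = new SQLContext(sc)

    // Stand-in for the real input; column names must match the Phoenix table.
    val df = sqlContext.createDataFrame(Seq((1, "foo"))).toDF("ID", "COL1")

    // Same write as in the question above.
    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .options(Map(
        "zkUrl" -> "localhost:2181/hbase-unsecure",
        "table" -> "TEST"))
      .save()

    sc.stop()
  }
}

With everything in one assembly jar, the launch reduces to something like: spark-submit --master yarn-client --class WriteToPhoenix your-assembly.jar (the jar name is a placeholder), with no --jars or extraClassPath juggling.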