New Contributor
Posts: 1
Registered: ‎09-28-2015

Save RDD data into Phoenix tables using Spark

I have an AWS cluster configured with CDH 5.4.3, with 1 NameNode and 2 DataNodes.

Spark version - 1.3.0
HBase version - 1.0.0
Phoenix version - 4.3.0-1.clabs_phoenix1.0.0.p0.78
Java version - jdk1.7.0_67-cloudera

I installed Apache Phoenix from the parcels, following this documentation: http://www.cloudera.com/content/cloudera/en/developers/home/cloudera-labs/apache-phoenix/install-apa...

I'm able to write a standalone Java JDBC application that creates a table, inserts records into it, and queries them back, by adding the following dependency:

 

<dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-core</artifactId>
    <version>4.3.0-clabs-phoenix-1.0.0</version>
</dependency>


Now I want to perform the same operations using Spark: I want to save the data that my Spark RDD holds into Phoenix tables.

This is the Cloudera repository that I have added to my pom.xml to get the Phoenix jars: https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/phoenix/

Within this repo, I don't see a phoenix-spark artifact for version 4.3.0-1.clabs_phoenix1.0.0. Can anyone tell me which dependency I need to add to do the same operations from Spark instead of a standalone JDBC application?

I am following https://phoenix.apache.org/phoenix_spark.html to do this, but without success. I'm a Java developer and a complete newbie in Scala, so if someone can provide sample code that does this in Java 7, it would be a great help.

Contributor
Posts: 46
Registered: ‎11-03-2014

Re: Save RDD data into Phoenix tables using Spark

As specified in https://phoenix.apache.org/phoenix_spark.html:

 

Prerequisites

  • Phoenix 4.4.0+
  • Spark 1.3.0+

 

You should probably not use a 4.3.0-* POM if you want the Phoenix-Spark integration, since it requires Phoenix 4.4.0 or later.

 

The closest matching POM available seems to be 4.5.2-cdh5.4.5. You could try modifying phoenix-spark-4.5.2-cdh5.4.5.pom to fit your CDH version, or just use it as-is and see whether anything breaks :P.

 

Another option is to use plain Phoenix JDBC; for that, 4.3.0-clabs-phoenix-1.0.0 should be OK.
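
For the plain JDBC route, here is a minimal Java 7 sketch of what the write side could look like. It is not tested against your cluster, and the class name PhoenixUpsertSketch, the method names, and the commitEvery parameter are all made up for illustration. It assumes each record has already been turned into an Object[] whose values line up with the column list, and that the JDBC URL has the usual jdbc:phoenix:<zookeeper quorum> form.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;

// Hypothetical helper, not part of Phoenix or Spark.
final class PhoenixUpsertSketch {

    // Builds a parameterized statement such as
    // "UPSERT INTO EVENTS (ID, NAME) VALUES (?, ?)".
    static String buildUpsertSql(String table, String... columns) {
        if (columns.length == 0) {
            throw new IllegalArgumentException("at least one column is required");
        }
        StringBuilder cols = new StringBuilder();
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                cols.append(", ");
                marks.append(", ");
            }
            cols.append(columns[i]);
            marks.append('?');
        }
        return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    // Writes all rows of one partition over a single connection and
    // returns the number of rows written. Phoenix buffers upserts on
    // the client until commit(), so we commit every commitEvery rows
    // and once more at the end.
    static long writePartition(String jdbcUrl, String upsertSql,
                               Iterator<Object[]> rows, int commitEvery)
            throws SQLException {
        long written = 0;
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement stmt = conn.prepareStatement(upsertSql)) {
            conn.setAutoCommit(false);
            while (rows.hasNext()) {
                Object[] row = rows.next();
                for (int i = 0; i < row.length; i++) {
                    stmt.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                stmt.executeUpdate();
                written++;
                if (written % commitEvery == 0) {
                    conn.commit();
                }
            }
            conn.commit(); // flush whatever is left
        }
        return written;
    }
}
```

From Spark you would call writePartition inside rdd.foreachPartition(...), passing an anonymous VoidFunction<Iterator<Object[]>> since Java 7 has no lambdas. The connection is opened inside the call because a JDBC connection cannot be serialized and shipped to the executors, so each partition opens its own. The Phoenix client jar also has to be on the executor classpath for the driver to be found.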

 

A final option: directly use the jars that Cloudera already installed, as if they were local jars. You would then need to update them manually whenever you upgrade CDH. But since the jars are already installed, you don't need Maven to package or distribute them for you.
