Difference between "Hortonworks Spark-HBase Connector" and "Apache Phoenix"

Expert Contributor

I'm trying to save data processed by Spark into HBase. Currently I'm using Phoenix to read and write my DataFrames:

// Read data from an HBase table, e.g. in a Zeppelin notebook
val readDF = sqlContext.read
  .format("org.apache.phoenix.spark")
  .option("table", targetTable)
  .option("zkUrl", zkUrl)
  .load()

// Write a DataFrame to HBase
myDF.write
  .format("org.apache.phoenix.spark")
  .mode("overwrite")
  .option("table", targetTable)
  .option("zkUrl", zkUrl)
  .save()

Now I saw that there is also a Hortonworks Spark-HBase Connector (SHC).

The connector also seems to use Phoenix for SQL-like data access. My question is: what are the differences between this connector and Phoenix? Why should I use the Spark-HBase Connector if it also uses Phoenix? Thank you!

1 REPLY

Re: Difference between "Hortonworks Spark-HBase Connector" and "Apache Phoenix"

Super Collaborator

Actually, SHC doesn't use Phoenix. It works directly with HBase, but it is able to use Phoenix-style encoding for storing data. It doesn't read any metadata from Phoenix and doesn't require Phoenix to be installed on the server. The Phoenix-Spark module, on the other hand, works directly with Phoenix, so it can obtain table schema and data type information automatically. So if you are mostly working with Phoenix, there is no reason to use SHC.
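For comparison, here is roughly what the same read and write look like through SHC. This is a minimal sketch, assuming shc-core is on the classpath; the table name "myTable" and the column layout are made up for illustration, and the "tableCoder":"Phoenix" field (available in recent SHC versions) is what selects the Phoenix-style encoding mentioned above. Note that you have to spell out the schema yourself in a catalog JSON, precisely because SHC reads no metadata from Phoenix:

import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Define the table layout by hand; SHC gets no schema information from Phoenix.
// Table name and columns here are hypothetical.
val catalog = s"""{
  "table":{"namespace":"default", "name":"myTable", "tableCoder":"Phoenix"},
  "rowkey":"key",
  "columns":{
    "id":{"cf":"rowkey", "col":"key", "type":"string"},
    "value":{"cf":"cf1", "col":"value", "type":"string"}
  }
}"""

// Write a DataFrame straight to HBase (no Phoenix involved);
// newTable -> "5" creates the table with 5 regions if it doesn't exist
myDF.write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog,
    HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()

// Read it back using the same catalog
val readDF = sqlContext.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()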
