
Difference between "Hortonworks Spark-HBase Connector" and "Apache Phoenix"

Expert Contributor

I'm trying to save data that is processed by Spark into HBase. Currently I'm using Phoenix to read and write my DataFrames:

// Read data from HBase table, e.g. in a Zeppelin notebook
val readDF = sqlContext.read
  .format("org.apache.phoenix.spark")
  .option("table", targetTable)
  .option("zkUrl", zkUrl)
  .load()

// Write DataFrame to HBase
myDF.write
  .format("org.apache.phoenix.spark")
  .mode("overwrite")
  .option("table", targetTable)
  .option("zkUrl", zkUrl)
  .save()

Now I've seen that there's also a Hortonworks Spark-HBase Connector (SHC).

The connector also seems to use Phoenix for SQL-like data access. My question is: what are the differences between this connector and Phoenix? Why should I use the Spark-HBase Connector if it also uses Phoenix? Thank you!

1 REPLY

Super Collaborator

Actually, SHC doesn't use Phoenix. It works directly with HBase, but it can use Phoenix-style encoding for storing data. It doesn't use any metadata from Phoenix and doesn't require Phoenix to be installed on the server. The phoenix-spark module, on the other hand, works directly with Phoenix, so it can obtain the table schema and data types automatically. So if you are mostly working with Phoenix, there is no reason to use SHC.
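
To make the difference concrete, here's a minimal sketch of how SHC is typically used, modeled on the examples in the hortonworks-spark/shc README (the table and column names here are illustrative, not from your setup). Notice that you have to declare the table layout yourself in a catalog JSON string, because SHC reads no schema metadata from Phoenix; the "tableCoder" field in the catalog is where Phoenix-style encoding can be selected.

import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Catalog describing the HBase table layout. SHC needs this explicit
// mapping because it doesn't read any schema information from Phoenix.
// "tableCoder":"Phoenix" selects Phoenix-style value encoding.
def catalog = s"""{
  |"table":{"namespace":"default", "name":"myTable", "tableCoder":"Phoenix"},
  |"rowkey":"key",
  |"columns":{
    |"id":{"cf":"rowkey", "col":"key", "type":"string"},
    |"value":{"cf":"cf1", "col":"value", "type":"double"}
  |}
|}""".stripMargin

// Write a DataFrame to HBase via SHC
// (newTable -> "5" creates the table with 5 regions if it doesn't exist)
myDF.write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog,
               HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()

// Read it back; the DataFrame schema comes from the catalog, not from Phoenix
val readDF = sqlContext.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

Compared to your phoenix-spark snippet above, where the schema and types are resolved automatically from Phoenix metadata, with SHC the catalog is the single source of truth, which is exactly why SHC works without a Phoenix installation.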
