SQL, JDBC, ODBC - Access on HBase - Best Practices, Ideas

New Contributor

Dear All,

Here is our scenario.

Our data sits in the NoSQL database HBase (on HDP, currently 2.3.x, potentially 2.4 soon). We need to access this data with SQL-type queries via JDBC and ODBC.

My question: are there best practices for how to do this? What worked well, and what did not? We have tried Apache Phoenix, but so far it is not ideal (e.g. it has no ODBC driver; our understanding is that one is only available from third parties).

Does anyone have experience with Apache Drill? Is it more suitable? https://drill.apache.org/team/

From the description it sounds promising.

Another alternative, especially as of HDP 2.4, seems to be Apache HAWQ (MPP).

http://de.hortonworks.com/press-releases/hortonworks-pivotal-expand-relationship-deliver-enterprise-...

I am unsure how it works in practice in combination with HBase. We definitely don't want to store the data twice (which might not be required when using PXF - http://hawq.docs.pivotal.io/docs-hawq/topics/PXFInstallationandAdministration.html#accessinghbasedat...). I am also unsure whether PXF on HBase files is still fast, or whether the data really needs to be in HAWQ directly, which we don't want because our store is HBase.

And finally, there is Apache Kylin - http://kylin.apache.org/.

This is the OLAP-cube-on-Hadoop approach: it keeps metadata in Hive and the actual data in HBase and, according to its description, delivers sub-second SQL results via JDBC, ODBC, etc.

If there are best practices, lessons learned, experiences with one of those in combination with HBase, I would be interested in the details.

5 REPLIES

Re: SQL, JDBC, ODBC - Access on HBase - Best Practices, Ideas

Super Collaborator

There is a customer who deploys Kylin alongside HBase in production.

Note that you need to check out the following branch of Kylin to work with HBase 1.x releases:

1.5.x-HBase1.x

Re: SQL, JDBC, ODBC - Access on HBase - Best Practices, Ideas

Super Collaborator

Re: SQL, JDBC, ODBC - Access on HBase - Best Practices, Ideas

New Contributor

Apache Phoenix can be accessed through a JDBC connection (https://phoenix.apache.org/) or through the Phoenix Query Server (https://phoenix.apache.org/server.html). To test the JDBC connection, try the sqlline.py script that ships with Phoenix. To test the Query Server, use sqlline-thin.py.
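To make the difference between the two access paths concrete, here is a minimal sketch of how the two kinds of Phoenix JDBC URLs are typically put together. The helper names and the host names (zk1, zk2, qs-host) are made up for illustration, and the /hbase-unsecure znode is only an assumption about a non-Kerberized HDP cluster; check your own hbase-site.xml. The serialization parameter is left optional because older Phoenix releases may not accept it.

```python
# Sketch only: builds Phoenix JDBC URL strings, it does not open a connection.
# Helper names and host names are hypothetical, not part of Phoenix itself.

def phoenix_thick_url(zk_quorum, zk_port=2181, znode_parent=None):
    """URL for the standard (thick) driver, which talks to ZooKeeper/HBase directly.

    zk_quorum is a comma-separated list of ZooKeeper hosts.
    znode_parent is the HBase root znode (assumed /hbase-unsecure on a
    non-secure HDP cluster; verify in hbase-site.xml).
    """
    url = f"jdbc:phoenix:{zk_quorum}:{zk_port}"
    if znode_parent:
        url += f":{znode_parent}"
    return url


def phoenix_thin_url(host, port=8765, serialization=None):
    """URL for the thin driver, which talks HTTP to the Phoenix Query Server.

    8765 is the Query Server's default port. The serialization option
    (e.g. PROTOBUF) is only appended when explicitly requested.
    """
    url = f"jdbc:phoenix:thin:url=http://{host}:{port}"
    if serialization:
        url += f";serialization={serialization}"
    return url


if __name__ == "__main__":
    print(phoenix_thick_url("zk1,zk2", znode_parent="/hbase-unsecure"))
    print(phoenix_thin_url("qs-host"))
```

The thick URL is what a JDBC tool with the full Phoenix client jar would use; the thin URL only needs the thin client jar and network access to the Query Server, which is also the path a third-party ODBC driver would most likely go through.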

Re: SQL, JDBC, ODBC - Access on HBase - Best Practices, Ideas

Super Collaborator

Gerald mentioned ODBC support, which is currently lacking in Phoenix.

FYI

Re: SQL, JDBC, ODBC - Access on HBase - Best Practices, Ideas

New Contributor

Thank you all for the answers so far. It seems there are quite a few projects in this area. I assume best practices and benchmarks will emerge over time. Another option will probably be available soon as well: http://de.hortonworks.com/blog/future-apache-hadoop/ Hive 2.0 - "The Hive community is working towards a 2.0 release of Hive that includes significant new features and performance improvements. These include: * Adding LLAP, a daemon layer that enables sub-second response time."
