Support Questions


List hbase tables Spark sql

Contributor

I would like to list HBase tables using Spark SQL.

I tried the code below, but it's not working. Do we need to set the HBase host, ZooKeeper quorum, and other details in the Spark SQL context options?

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName("test")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
// HiveContext is constructed from the SparkContext, not from a SQLContext
val hiveContext = new HiveContext(sc)

val listOfTables = hiveContext.sql("list")
listOfTables.show()

1 ACCEPTED SOLUTION

Guru

@Sankaraiah Narayanasamy

You can't list HBase tables using Spark SQL because HBase tables do not have a schema. Each row can have a different number of columns, and each column is stored as a byte array, not as a specific data type. HiveContext will only let you list tables in Hive, not HBase. If you have Apache Phoenix installed on top of HBase, it is possible to see a list of tables, but not through HiveContext.

If you are trying to see a list of Hive tables that Spark SQL can access, the command is "show tables", not "list". So your code should be:

val listOfTables = hiveContext.sql("show tables")

This will work assuming that you have Spark configured to point at the Hive Metastore.


6 REPLIES


Contributor

@Vadim Vaks:

Thanks for the answer. So we cannot list the HBase tables using the Spark SQL context.

Guru

@Sankaraiah Narayanasamy

Not unless you create a Hive table using an HBase storage handler:

https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

This will impose a schema on an HBase table through Hive and save the schema in the metastore. Once it's in the metastore, you can access it through HiveContext.
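As a sketch of that approach, the DDL below registers a hypothetical HBase table "users" (with a column family "cf" and a "name" column; these names are made up for illustration) in the Hive metastore via the HBase storage handler, after which it appears in "show tables" from Spark:

```scala
// Register an HBase-backed external table in the Hive metastore.
// Assumes the Hive-HBase integration jars are on the classpath and
// an HBase table named "users" already exists.
hiveContext.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS hbase_users (rowkey STRING, name STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
  TBLPROPERTIES ("hbase.table.name" = "users")
""")

// The HBase-backed table now shows up alongside regular Hive tables
hiveContext.sql("show tables").show()
```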

Or, if you have Phoenix installed and you create a table through Phoenix, it will create an HBase table as well as an entry in the schema catalog table. You can make a direct JDBC connection to Phoenix just as you would connect to MySQL or Postgres; you just need the Phoenix JDBC driver. You can then use the metadata getters on the JDBC connection object to list the tables in Phoenix.
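A minimal sketch of listing tables through the JDBC metadata getters, assuming the Phoenix JDBC driver is on the classpath and ZooKeeper runs at localhost:2181 (adjust the URL for your cluster):

```scala
import java.sql.DriverManager

// Open a JDBC connection to Phoenix
val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase-unsecure")

// getTables(catalog, schemaPattern, tableNamePattern, types) with nulls
// matches everything of type TABLE
val rs = conn.getMetaData.getTables(null, null, null, Array("TABLE"))
while (rs.next()) {
  println(rs.getString("TABLE_NAME"))
}
conn.close()
```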

Once you know the table you want to go after:

import org.apache.phoenix.spark._

val df = sqlContext.load("org.apache.phoenix.spark", Map("table"->"phoenix_table","zkUrl"->"localhost:2181:/hbase-unsecure"))

df.show

This way, Spark will load the data in parallel using executors. Now just use the DataFrame with the SQL context as normal.

Super Collaborator

Hive and HiveContext in Spark can only show the tables that are registered in the Hive metastore, and HBase tables are usually not there because the schema of most HBase tables is not easily defined in the metastore.

To read HBase tables from Spark using the DataFrame API, please consider the Spark HBase Connector.
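For reference, reading a single table with the Spark HBase Connector (SHC) looks roughly like the sketch below. The catalog JSON maps a hypothetical HBase table "users" (column family "cf") to DataFrame columns; note that you must already know the table name, which is why SHC does not help with listing:

```scala
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Catalog describing how HBase cells map to DataFrame columns.
// Table and column names here are illustrative assumptions.
val catalog = s"""{
  "table":{"namespace":"default", "name":"users"},
  "rowkey":"key",
  "columns":{
    "rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
    "name":{"cf":"cf", "col":"name", "type":"string"}
  }
}"""

val df = sqlContext.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

df.show()
```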

Contributor

@Bikas

We are actually using the Hortonworks HBase connector, but I cannot use that API to list tables. This is just for one POC in which we are trying to list HBase tables.

Super Collaborator

SHC does not have a notion of listing tables in HBase. It works on the table catalog provided to the data source in the program. Hive will also not list HBase tables because they are not present in the metastore. There is a rudimentary way to add HBase external tables in Hive, but I don't think that's really used. I could be wrong.

To list HBase tables, currently the only reliable way is to use the HBase APIs inside the Spark program to list the tables.
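A sketch of that approach using the HBase client's Admin API (assuming hbase-site.xml is on the classpath; otherwise set "hbase.zookeeper.quorum" on the configuration explicitly):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

// Build an HBase configuration and open a connection
val conf = HBaseConfiguration.create()
val connection = ConnectionFactory.createConnection(conf)
val admin = connection.getAdmin

// listTableNames returns the names of all tables in HBase
admin.listTableNames().foreach(t => println(t.getNameAsString))

admin.close()
connection.close()
```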