Created 01-22-2017 06:38 AM
I would like to list HBase tables using Spark SQL.
I tried the code below, but it is not working. Do we need to set the HBase host, ZooKeeper quorum, and other details in the Spark SQL context options?
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName("test")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
// HiveContext is constructed from the SparkContext, not the SQLContext
val hiveContext = new HiveContext(sc)
val listOfTables = hiveContext.sql("list")
listOfTables.show
Created 01-22-2017 11:51 PM
You can't list HBase tables using Spark SQL because HBase tables do not have a schema. Each row can have a different number of columns, and each column is stored as a byte array rather than a specific data type. HiveContext will only let you list tables in Hive, not HBase. If you have Apache Phoenix installed on top of HBase, it is possible to see a list of tables, but not through HiveContext.
If you are trying to see a list of Hive tables that Spark SQL can access, the command is "show tables", not "list". So your code should be:
val listOfTables = hiveContext.sql("show tables")
This will work, assuming you have Spark configured to point at the Hive metastore.
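For example, a minimal sketch of pointing the HiveContext at a metastore (the URI below is a placeholder; normally Spark picks this up from a hive-site.xml in its conf directory instead):

// Placeholder metastore URI; usually supplied via hive-site.xml.
hiveContext.setConf("hive.metastore.uris", "thrift://metastore-host:9083")
hiveContext.sql("show tables").show()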
Created 01-23-2017 03:55 AM
Thanks for the answer. So we cannot list HBase tables using the Spark SQL context.
Created 01-23-2017 06:28 AM
Not unless you create a Hive table using the HBase storage handler:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
This will impose a schema onto an HBase table through Hive and save the schema in the metastore. Once it's in the metastore, you can access it through HiveContext.
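For example, a sketch of such a mapping issued through HiveContext (the table, column family, and column names here are placeholders, and the hive-hbase-handler jar needs to be on the classpath):

// Maps an existing HBase table into the Hive metastore.
// "my_hbase_table", "cf1", and "val" are placeholder names.
hiveContext.sql("""
  CREATE EXTERNAL TABLE hbase_mapped (key string, value string)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
  TBLPROPERTIES ("hbase.table.name" = "my_hbase_table")
""")

After that, hiveContext.sql("show tables") will include hbase_mapped.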
Or if you have Phoenix installed and you create a table through Phoenix, it will create an HBase table as well as a schema catalog table. You can make a direct JDBC connection to Phoenix just like you would connect to MySQL or PostgreSQL; you just need to use the Phoenix JDBC driver. You can then use the metadata getters on the JDBC connection object to get the list of tables in Phoenix.
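For example, a minimal sketch of listing tables over Phoenix JDBC (the ZooKeeper quorum in the URL is a placeholder, and the Phoenix client jar must be on the classpath):

import java.sql.DriverManager

// Ensure the Phoenix driver is registered (usually automatic with JDBC 4).
Class.forName("org.apache.phoenix.jdbc.PhoenixDriver")
val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181:/hbase-unsecure")
// Standard JDBC metadata call: list every table Phoenix knows about.
val rs = conn.getMetaData.getTables(null, null, "%", null)
while (rs.next()) {
  println(rs.getString("TABLE_NAME"))
}
conn.close()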
Once you know the table you want to go after:
import org.apache.phoenix.spark._
val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "phoenix_table", "zkUrl" -> "localhost:2181:/hbase-unsecure"))
df.show
This way, Spark will load the data using executors in parallel. Now just use the DataFrame with the SQL context as normal.
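For example, register the DataFrame as a temporary table and query it (the table name is just the placeholder from above):

// Register the Phoenix-backed DataFrame for SQL access.
df.registerTempTable("phoenix_table")
sqlContext.sql("SELECT * FROM phoenix_table LIMIT 10").show()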
Created 01-23-2017 01:33 AM
Hive and HiveContext in Spark can only show tables that are registered in the Hive metastore, and HBase tables are usually not there because the schema of most HBase tables is not easily defined in the metastore.
To read HBase tables from Spark using the DataFrame API, please consider the Spark HBase Connector (SHC).
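For reference, a sketch of an SHC read (the namespace, table, column family, and column names in the catalog are placeholders):

import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// The catalog JSON imposes a schema on one specific HBase table.
val catalog = """{
  "table":{"namespace":"default", "name":"my_hbase_table"},
  "rowkey":"key",
  "columns":{
    "key":{"cf":"rowkey", "col":"key", "type":"string"},
    "col1":{"cf":"cf1", "col":"col1", "type":"string"}
  }
}"""
val df = sqlContext.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()
df.show()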
Created 01-23-2017 03:53 AM
We are actually using the Hortonworks HBase connector, but I cannot use that API to list tables. This is just for a POC in which we are trying to list HBase tables.
Created 01-23-2017 07:45 PM
SHC does not have a notion of listing tables in HBase; it works on the table catalog provided to the data source in the program. Hive will also not list HBase tables because they are not present in the metastore. There is a rudimentary way to add HBase external tables in Hive, but I don't think that is really used. I could be wrong.
To list HBase tables, currently the only reliable way is to use the HBase APIs inside the Spark program.
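For example, a minimal sketch using the HBase Admin API (assumes the HBase client jars and an hbase-site.xml are on the classpath; otherwise set the quorum explicitly as shown in the comment):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

val hbaseConf = HBaseConfiguration.create()
// If hbase-site.xml is not on the classpath, point at ZooKeeper directly:
// hbaseConf.set("hbase.zookeeper.quorum", "localhost")
val connection = ConnectionFactory.createConnection(hbaseConf)
val admin = connection.getAdmin
try {
  // listTableNames returns every table visible to this user.
  admin.listTableNames().foreach(t => println(t.getNameAsString))
} finally {
  admin.close()
  connection.close()
}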