Spark job returns empty rows from HBase
Labels: Apache HBase, Apache Spark
Created ‎07-03-2018 03:17 PM
Hi Community,
I'm running a basic Spark job which reads from an HBase table.
The job completes without any error, but the output contains no rows.
I would appreciate any help.
Below is my code:
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

object objectName {
  // Catalog mapping the HBase table and its column families to DataFrame columns
  def catalog = s"""{
    |"table":{"namespace":"namespaceName", "name":"tableName"},
    |"rowkey":"rowKeyAttribute",
    |"columns":{
    |"Key":{"cf":"rowkey", "col":"rowKeyAttribute", "type":"string"},
    |"col1":{"cf":"cfName", "col":"col1", "type":"bigint"},
    |"col2":{"cf":"cfName", "col":"col2", "type":"string"}
    |}
    |}""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dummyApplication")
      .getOrCreate()
    val sc = spark.sparkContext
    val sqlContext = spark.sqlContext
    import sqlContext.implicits._

    // Load the HBase table described by the catalog as a DataFrame
    def withCatalog(cat: String): DataFrame = {
      sqlContext
        .read
        .options(Map(HBaseTableCatalog.tableCatalog -> cat))
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .load()
    }
  }
}
Created ‎07-03-2018 05:35 PM
Did you check out the docs?
Did you look at this other HCC post on a similar topic:
https://community.hortonworks.com/questions/49743/read-hbase-table-by-using-sparkscala.html
Created ‎07-03-2018 04:52 PM
I don't see any code that calls the withCatalog function. If this function is not being used, what is the expected output?
As an example, you could try adding something like this to show some of the content of the HBase table:
val df = withCatalog(catalog)
df.show()
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created ‎07-03-2018 07:15 PM
Hi @Felix Albani, thanks for your response. My sincere apologies, I somehow failed to include that part of the code. I have updated it now.
This is the output I see (please note that I have changed the number of columns in the code above, hence the difference):
+----+----+----+----+----+----+----+----+----+
|col4|col7|col1|col3|col6|col0|col8|col2|col5|
+----+----+----+----+----+----+----+----+----+
+----+----+----+----+----+----+----+----+----+
18/07/03 16:16:27 INFO CodeGenerator: Code generated in 10.60842 ms
18/07/03 16:16:27 INFO CodeGenerator: Code generated in 8.990531 ms
+----+
|col0|
+----+
+----+
Created ‎07-03-2018 07:21 PM
Please run the following from HBase shell:
hbase> scan 'tableName', {'LIMIT' => 5}
Also check what describe prints for the table:
hbase> describe 'tableName'
Make sure the table name in your Spark code matches the HBase table name exactly, including case.
HTH
Created ‎07-03-2018 09:03 PM
Hi @Felix Albani I have checked these things already.
Created ‎07-03-2018 09:08 PM
@vivek jain Could you try running the following steps and see if that works:
including table creation?
Created ‎07-03-2018 09:22 PM
@Felix Albani I really wanted to try this too, but these libraries are not deployed on the cluster. Instead, I build a dependencies jar and pass it to spark-submit.
Created ‎07-03-2018 09:51 PM
@Felix Albani You know what, I tried it with a table in the default namespace and I'm able to view the data. It seems to work for tables without a namespace.
Created ‎07-03-2018 10:13 PM
I just found that if I specify the table as "table":{"name":"namespace:tablename"} in the catalog, then it works. Thanks for your time.
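For anyone landing here later, the catalog form reported to work in this thread can be sketched roughly as below. This is only a sketch: `CatalogSketch` and `catalogFor` are hypothetical names introduced for illustration (they are not part of the connector), the namespace, table, and column names are the placeholders from the original question, and the behavior has only been reported for the poster's setup, so other connector versions may treat the separate "namespace" key differently.

```scala
object CatalogSketch {
  // Hypothetical helper (not part of the connector): builds the catalog JSON
  // with the namespace folded into the table name as "namespace:table",
  // which is the form reported to work in this thread.
  def catalogFor(namespace: String, table: String): String =
    s"""{
       |"table":{"name":"$namespace:$table"},
       |"rowkey":"rowKeyAttribute",
       |"columns":{
       |"Key":{"cf":"rowkey", "col":"rowKeyAttribute", "type":"string"},
       |"col1":{"cf":"cfName", "col":"col1", "type":"bigint"},
       |"col2":{"cf":"cfName", "col":"col2", "type":"string"}
       |}
       |}""".stripMargin

  def main(args: Array[String]): Unit = {
    // Print the catalog so it can be inspected before passing it to withCatalog(...)
    println(catalogFor("namespaceName", "tableName"))
  }
}
```

The resulting string would replace the original catalog in the call, for example withCatalog(CatalogSketch.catalogFor("namespaceName", "tableName")), followed by df.show() as suggested earlier in the thread.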
Created ‎07-03-2018 10:17 PM
@vivek jain Good to hear that. If you think the answer and follow-ups have helped, please take a moment to log in and mark it as "Accepted".
