Improving performance on Spark for Hive
- Labels: Apache HBase, Apache Hive
Created 10-11-2017 10:36 AM
Created 10-11-2017 01:08 PM
Hi @Gayathri Devi,
You can use Spark SQL to read data from the Hive table and create a DataFrame.
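As a rough sketch of the Spark SQL route in PySpark: the database, table, and column names in the usage note are hypothetical, and `hive_select` and `load_hive_dataframe` are helper names made up for this illustration. It assumes Spark was built with Hive support and can reach the Hive metastore.

```python
def hive_select(database, table, columns, predicate=None):
    """Build a SELECT that names only the needed columns, plus an optional filter."""
    query = "SELECT {} FROM {}.{}".format(", ".join(columns), database, table)
    if predicate:
        query += " WHERE {}".format(predicate)
    return query


def load_hive_dataframe(database, table, columns, predicate=None):
    """Run the query through Spark SQL and return the result as a DataFrame."""
    # Imported lazily so the query helper above works without a Spark install.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-read-sketch")   # hypothetical application name
        .enableHiveSupport()           # lets spark.sql() resolve Hive metastore tables
        .getOrCreate()
    )
    return spark.sql(hive_select(database, table, columns, predicate))
```

For example, `load_hive_dataframe("sales_db", "orders", ["id", "amount"], "amount > 100")` would return a DataFrame with just those two columns. Selecting only the columns you need and filtering early usually reduces the amount of data Spark has to read.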
There is another, better way to read data from an HBase table: construct an HBaseRDD from scratch. This approach is more scalable and a better fit for Spark's Catalyst engine.
Refer to the links below for how to read data directly from HBase without going through a Hive table.
https://hortonworks.com/blog/spark-hbase-connector-a-year-in-review/
https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/
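The connector described in the posts above maps an HBase table to a DataFrame through a JSON catalog. A minimal PySpark sketch follows, assuming the connector jar is available to Spark and `spark` is an existing SparkSession; the table, column family, and column names in the usage note are hypothetical, and `build_catalog` and `load_hbase_dataframe` are helper names invented here, not part of the connector.

```python
import json

# Data source name used by the Hortonworks Spark-HBase connector (SHC).
SHC_FORMAT = "org.apache.spark.sql.execution.datasources.hbase"


def build_catalog(namespace, table, rowkey_column, columns):
    """Build the JSON catalog that maps HBase cells to DataFrame columns.

    rowkey_column: DataFrame column name that exposes the HBase row key.
    columns: dict of DataFrame column name -> (column family, qualifier, type).
    """
    # The row key is declared under the reserved column family "rowkey".
    cols = {rowkey_column: {"cf": "rowkey", "col": "key", "type": "string"}}
    for name, (family, qualifier, dtype) in columns.items():
        cols[name] = {"cf": family, "col": qualifier, "type": dtype}
    return json.dumps({
        "table": {"namespace": namespace, "name": table},
        "rowkey": "key",
        "columns": cols,
    })


def load_hbase_dataframe(spark, catalog):
    """Read the HBase table described by the catalog into a DataFrame."""
    return spark.read.options(catalog=catalog).format(SHC_FORMAT).load()
```

A call such as `load_hbase_dataframe(spark, build_catalog("default", "events", "event_id", {"payload": ("cf1", "payload", "string")}))` would then give a DataFrame that can be filtered or joined like any other, with no Hive table in between.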
