Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Improving performance on Spark for Hive

1 ACCEPTED SOLUTION


Hi @Gayathri Devi,

You can use Spark SQL to read the data from the Hive table and create a DataFrame.
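As a rough sketch of the Spark SQL route in PySpark: the table name `default.events`, the column names, and the helper names `hive_query` and `hive_table_to_df` are made up for illustration, and the session is assumed to be built with Hive support so that `spark.sql` can see the Hive metastore.

```python
def hive_query(table, columns=None, where=None):
    """Build a SELECT statement for a Hive table (plain string building)."""
    cols = ", ".join(columns) if columns else "*"
    sql = f"SELECT {cols} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql


def hive_table_to_df(spark, table, columns=None, where=None):
    """Run the query through Spark SQL; spark.sql returns a DataFrame."""
    return spark.sql(hive_query(table, columns, where))


def run_example():
    """Not called automatically; needs pyspark and a reachable Hive metastore."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-to-dataframe")
        .enableHiveSupport()  # lets spark.sql resolve Hive tables
        .getOrCreate()
    )
    # Hypothetical table and columns, for illustration only.
    df = hive_table_to_df(
        spark, "default.events", ["id", "ts"], "ts >= '2017-01-01'"
    )
    df.show(5)
```

Selecting only the columns and rows you need, rather than `SELECT *`, usually means less data is read from the underlying table.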

A better option is to read the data directly from the HBase table with the Spark HBase Connector (SHC). With this method the HBase RDD is constructed from scratch, which is more scalable and a better fit for Spark's Catalyst engine.

The links below explain how to read data directly from HBase without going through a Hive table.

https://hortonworks.com/blog/spark-hbase-connector-a-year-in-review/

https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/

https://github.com/hortonworks-spark/shc
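To give an idea of what the connector route looks like, here is a minimal PySpark sketch based on the catalog format described in the SHC README. The table `events`, the column family `cf1`, the column names, and the helper names `shc_catalog` and `read_hbase` are assumptions for illustration. The SHC jar and the HBase configuration are assumed to be on the Spark classpath already; check the README for the version that matches your cluster.

```python
import json

# Data source name used by the SHC connector.
SHC_FORMAT = "org.apache.spark.sql.execution.datasources.hbase"


def shc_catalog(table, columns, namespace="default", rowkey="key"):
    """Build the JSON catalog that maps HBase cells to DataFrame columns.

    columns: list of (dataframe_column, column_family, qualifier, type).
    The row key is mapped with the special column family "rowkey".
    """
    catalog = {
        "table": {"namespace": namespace, "name": table},
        "rowkey": rowkey,
        "columns": {
            name: {"cf": cf, "col": col, "type": typ}
            for name, cf, col, typ in columns
        },
    }
    return json.dumps(catalog)


def read_hbase(spark, catalog):
    """Load the HBase table described by the catalog as a DataFrame."""
    return spark.read.options(catalog=catalog).format(SHC_FORMAT).load()


# Hypothetical mapping, for illustration only.
EXAMPLE_CATALOG = shc_catalog(
    "events",
    [
        ("id", "rowkey", "key", "string"),
        ("amount", "cf1", "amt", "double"),
    ],
)
```

With a live session, `read_hbase(spark, EXAMPLE_CATALOG)` would return a DataFrame that can be filtered or registered as a temporary view and queried with Spark SQL, without defining a Hive table over HBase.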
