Improving performance on Spark for Hive

Accepted solution:

Hi @Gayathri Devi,

You can use Spark SQL to read the data from the Hive table and create a DataFrame.
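A minimal PySpark sketch of the Spark SQL route, assuming Spark 2.x or later with Hive support configured on the cluster. The helper names `get_hive_spark_session` and `read_hive_table` and the table name `mydb.mytable` are hypothetical placeholders, not part of the original answer.

```python
def get_hive_spark_session(app_name="read-hive-table"):
    """Hypothetical helper: build a SparkSession that can see the Hive metastore."""
    # Imported lazily so this module can be loaded where PySpark is not installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # lets Spark SQL resolve tables registered in Hive
        .getOrCreate()
    )


def read_hive_table(spark, table_name):
    """Hypothetical helper: return the Hive table as a DataFrame via Spark SQL."""
    # The DataFrame is lazy; no data is read until an action such as show() runs.
    return spark.sql(f"SELECT * FROM {table_name}")
```

For example, `df = read_hive_table(get_hive_spark_session(), "mydb.mytable")` followed by `df.show()`. Passing the session in as an argument keeps the query helper independent of how the session is created.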

A better way is to read the data straight from the HBase table. This method constructs the HBase RDD from scratch, which is more scalable and a better fit for Spark's Catalyst engine.

The links below describe how to read data directly from HBase without going through a Hive table:

https://hortonworks.com/blog/spark-hbase-connector-a-year-in-review/

https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/

https://github.com/hortonworks-spark/shc
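The SHC project linked above describes an HBase table to Spark with a JSON catalog and then exposes it as a DataFrame source. The sketch below follows the catalog layout shown in the SHC README, assuming the SHC jar and your HBase client configuration (hbase-site.xml) are already on the Spark classpath; check the README for the connector version that matches your Spark and HBase. The helper names, table name, and column names are hypothetical.

```python
import json

# Data source name used by the SHC connector.
HBASE_FORMAT = "org.apache.spark.sql.execution.datasources.hbase"


def build_shc_catalog(namespace, table, key_column, columns, key_type="string"):
    """Hypothetical helper: build an SHC catalog JSON string.

    columns maps a DataFrame column name to (column_family, qualifier, type).
    """
    catalog_columns = {
        # SHC maps the HBase row key through the special column family "rowkey".
        key_column: {"cf": "rowkey", "col": "key", "type": key_type},
    }
    for name, (family, qualifier, col_type) in columns.items():
        catalog_columns[name] = {"cf": family, "col": qualifier, "type": col_type}
    return json.dumps({
        "table": {"namespace": namespace, "name": table},
        "rowkey": "key",
        "columns": catalog_columns,
    })


def read_hbase_table(spark, catalog):
    """Hypothetical helper: load the HBase table through SHC, bypassing Hive."""
    return spark.read.options(catalog=catalog).format(HBASE_FORMAT).load()
```

For example, `catalog = build_shc_catalog("default", "table1", "col0", {"col1": ("cf1", "col1", "boolean")})` and then `df = read_hbase_table(spark, catalog)`, where `spark` is an existing SparkSession.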
