Improving performance on spark for hive

1 ACCEPTED SOLUTION

Master Guru

Hi @Gayathri Devi,

You can use Spark SQL to read the data from the Hive table and create a DataFrame.
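As a minimal PySpark sketch of this approach, assuming Hive support is available to the Spark session: the table name `default.sensor_readings` and its columns are hypothetical, and the helper names are for illustration only. Selecting only the columns you need and filtering in the query is a common way to cut down the data Spark has to move.

```python
def build_hive_query(table, columns, predicate=None):
    """Return a SELECT that names only the needed columns, with an optional filter."""
    query = "SELECT {} FROM {}".format(", ".join(columns), table)
    if predicate:
        query += " WHERE {}".format(predicate)
    return query


def load_hive_dataframe(table, columns, predicate=None):
    """Run the query through Spark SQL and return the result as a DataFrame."""
    # Imported here so build_hive_query can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-to-dataframe")
        .enableHiveSupport()  # lets spark.sql() resolve tables in the Hive metastore
        .getOrCreate()
    )
    return spark.sql(build_hive_query(table, columns, predicate))


# Hypothetical table and column names, used only to show the generated SQL.
query = build_hive_query(
    "default.sensor_readings", ["device_id", "reading"], "reading > 10"
)
```

Calling `load_hive_dataframe("default.sensor_readings", ["device_id", "reading"], "reading > 10")` on a cluster would then return the filtered DataFrame.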

A better way is to read the data directly from the HBase table. This method constructs the HBase RDD from scratch, which is more scalable and a better fit for the Spark Catalyst engine.
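A rough PySpark sketch of the direct-read approach with the Spark HBase Connector (SHC), assuming the connector package is already on the Spark classpath. The catalog layout and the data source name follow the SHC README; the table, column family, and column names below are hypothetical, as are the helper function names.

```python
import json

# Data source name used by SHC.
SHC_FORMAT = "org.apache.spark.sql.execution.datasources.hbase"


def build_shc_catalog(namespace, table, row_key, columns):
    """Build the JSON catalog that maps HBase cells to DataFrame columns.

    columns: dict of DataFrame column name -> (column family, qualifier, type).
    The row key column uses the special column family "rowkey".
    """
    catalog = {
        "table": {"namespace": namespace, "name": table},
        "rowkey": row_key,
        "columns": {
            name: {"cf": cf, "col": col, "type": dtype}
            for name, (cf, col, dtype) in columns.items()
        },
    }
    return json.dumps(catalog)


def load_hbase_dataframe(spark, catalog):
    """Read the HBase table described by the catalog into a DataFrame."""
    return spark.read.options(catalog=catalog).format(SHC_FORMAT).load()


# Hypothetical HBase table with one column family, cf1.
catalog = build_shc_catalog(
    "default",
    "sensor_readings",
    "key",
    {
        "device_id": ("rowkey", "key", "string"),
        "reading": ("cf1", "reading", "double"),
    },
)
```

With an active `SparkSession` named `spark`, `load_hbase_dataframe(spark, catalog)` would return a DataFrame that can be filtered or registered as a temporary view like any other.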

You can refer to the links below to see how to read data directly from HBase without going through a Hive table.

https://hortonworks.com/blog/spark-hbase-connector-a-year-in-review/

https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/

https://github.com/hortonworks-spark/shc
