Improving performance on spark for hive

Gayathridevi — Wed, 11 Oct 2017 17:36:11 GMT

Re: Improving performance on spark for hive

Shu_ashu — Wed, 11 Oct 2017 20:08:45 GMT

You can use Spark SQL to read the data from the Hive table into a DataFrame.
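A minimal PySpark sketch of that approach follows. The table name, column names, and filter in the usage note are hypothetical, and `build_hive_query`, `load_hive_table`, and `make_session` are helper names made up for this example, not part of Spark. It assumes Spark was built with Hive support and can reach the Hive metastore.

```python
def build_hive_query(table, columns, predicate=None):
    """Build a SELECT that names only the needed columns, plus an optional filter."""
    query = "SELECT {} FROM {}".format(", ".join(columns), table)
    if predicate:
        query += " WHERE {}".format(predicate)
    return query


def load_hive_table(spark, table, columns, predicate=None):
    """Run the query through Spark SQL and return the result as a DataFrame."""
    return spark.sql(build_hive_query(table, columns, predicate))


def make_session(app_name="hive-read-sketch"):
    """Create a SparkSession that can see Hive metastore tables."""
    # Imported lazily so the query helper works even where pyspark is not installed.
    from pyspark.sql import SparkSession
    return (SparkSession.builder
            .appName(app_name)
            .enableHiveSupport()  # lets spark.sql() resolve Hive tables
            .getOrCreate())
```

For example, `load_hive_table(make_session(), "default.events", ["id", "ts"], "ts >= '2017-10-01'")` would return a DataFrame with two columns. Selecting only the columns you need, instead of `SELECT *`, usually reduces the amount of data Spark has to read.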

A better option is to read the data from the HBase table directly. This method constructs an HBaseRDD from scratch, which is more scalable and a better fit for the Spark Catalyst engine.

The links below explain how to get data directly from HBase without going through a Hive table.

https://hortonworks.com/blog/spark-hbase-connector-a-year-in-review/

https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/

https://github.com/hortonworks-spark/shc
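As a rough idea of what reading through the connector linked above (SHC) can look like from PySpark, here is a sketch based on the catalog format described in the SHC README. The namespace, table, column family, and column names are hypothetical, `build_catalog` and `load_hbase_table` are helper names made up for this example, and it assumes the SHC package and the HBase configuration are available to your Spark job (for example via `--packages`). Check the README for the exact version that matches your cluster.

```python
import json

# Data source name used by SHC, as given in its README.
SHC_FORMAT = "org.apache.spark.sql.execution.datasources.hbase"


def build_catalog(namespace, table, key_column, columns):
    """Build the JSON catalog that maps HBase cells to DataFrame columns.

    columns maps a DataFrame column name to a tuple of
    (column family, column qualifier, type).
    """
    # The row key is declared as a column in the special "rowkey" family.
    mapping = {key_column: {"cf": "rowkey", "col": "key", "type": "string"}}
    for name, (family, qualifier, col_type) in columns.items():
        mapping[name] = {"cf": family, "col": qualifier, "type": col_type}
    return json.dumps({
        "table": {"namespace": namespace, "name": table},
        "rowkey": "key",
        "columns": mapping,
    })


def load_hbase_table(spark, catalog):
    """Read the HBase table described by the catalog into a DataFrame."""
    return (spark.read
            .options(catalog=catalog)
            .format(SHC_FORMAT)
            .load())
```

With a catalog such as `build_catalog("default", "events", "id", {"ts": ("cf1", "ts", "string")})`, the resulting DataFrame can be filtered and queried like any other.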
