Created 05-24-2016 04:11 PM
My Hive queries are taking a long time to return data. We have around 2 million records, and I only get the result after a long wait. I was looking for an alternative, and Spark came to mind first. I went through some Hortonworks links that illustrate querying a Hive table using Spark SQL (SSQL), but they were quite generic. Here is my requirement.
I have Hive tables already created and I need to query them using SSQL. What is the best way to do that?
I would also like to create new Hive tables using SSQL. Would such a table be the same as a Hive table or different? If different, in what ways? Would I still be able to query the tables created by SSQL using Hive or Beeline?
Created 05-25-2016 05:29 AM
These decks compare Spark SQL, Hive on Tez, and Hive on Spark:
http://www.slideshare.net/hortonworks/hive-on-spark-is-blazing-fast-or-is-it-final
Hive on Spark (HIVE-7292) is still in the beta phase. With earlier versions of Spark you use a HiveContext object to query Hive tables, but starting with Spark 1.4 you can query Hive tables using the sqlContext object.
Created 05-24-2016 04:13 PM