Created 05-27-2016 12:37 PM
We need to compare two SQL-on-HDFS tools, HAWQ and Hive, for a data lake of 400+ TB (before replication) of semi-structured machine data. Which is preferable? How will Hive perform? What are the pros and cons of using Hive compared to HAWQ?
Note: Right now we are on an older version of HAWQ. It is very unstable, so we are evaluating options.
Created 05-27-2016 12:42 PM
This is a nice post comparing Hive on MapReduce and Tez with HAWQ:
https://www.pivotalguru.com/?p=956
Hope it helps.
Created 05-30-2016 02:15 PM
Thank you Raj. How has your experience with HAWQ been? Do you recommend HAWQ or Hive? Are you using the latest version of HAWQ, and how is it working?
Created 05-27-2016 02:04 PM
Below link might help you more
Created 05-30-2016 02:19 PM
Thank you Sri. How has your experience with HAWQ been? Are you using the latest version of HAWQ, and how is it working? Which do you recommend, HAWQ or Hive?
Created 05-27-2016 04:20 PM
Whenever you are evaluating Hive on Tez against any other tool for data analytics (with row-level update/access patterns), my suggestion is to start with Hive: apply the right tunings at the OS, cluster, and Hive levels, use ORC with bloom filters, organize your data, and see whether query times meet your SLA.
We have seen at a lot of sites that once Hive is tuned correctly and the data is moved off text files, the SLAs are met. You can then look at other tools if and when your SLAs are not met.
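To make the ORC and bloom filter advice a bit more concrete, here is a rough sketch of the kind of table definition I mean. It is a small Python helper that only renders the Hive DDL text; it does not connect to a cluster. The table name, columns, partition column, and the 0.05 false-positive rate are made up for illustration, so adjust them to your data. The `orc.compress`, `orc.bloom.filter.columns`, and `orc.bloom.filter.fpp` keys are standard ORC table properties.

```python
def orc_table_ddl(table, columns, partition_col, bloom_cols, fpp=0.05):
    """Render a Hive CREATE TABLE statement for a partitioned ORC table.

    columns: list of (name, hive_type) pairs; must not include partition_col.
    bloom_cols: columns that are frequently used in equality filters.
    """
    col_defs = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    props = {
        "orc.compress": "ZLIB",
        "orc.bloom.filter.columns": ",".join(bloom_cols),
        "orc.bloom.filter.fpp": str(fpp),
    }
    prop_defs = ",\n  ".join(f"'{key}'='{value}'" for key, value in props.items())
    return (
        f"CREATE TABLE {table} (\n  {col_defs}\n)\n"
        f"PARTITIONED BY ({partition_col} STRING)\n"
        f"STORED AS ORC\n"
        f"TBLPROPERTIES (\n  {prop_defs}\n);"
    )


# Hypothetical machine-data table, partitioned by day.
ddl = orc_table_ddl(
    table="machine_events",
    columns=[("host", "STRING"), ("event_type", "STRING"), ("payload", "STRING")],
    partition_col="event_date",
    bloom_cols=["host", "event_type"],
)
print(ddl)
```

You would load this table from the existing text-format data with an INSERT ... SELECT, then rerun your slowest queries against it before deciding whether another tool is needed.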
Created 05-30-2016 02:12 PM
Thank you Ravi. Just checking:
Have you ever seen Hive and HAWQ in a single environment?
Did any of the customers you worked with move away from Hive to HAWQ, and what were their reasons for doing so?
Created 07-11-2016 05:26 PM
For folks interested in data science, HAWQ supports Apache MADlib (incubating) http://madlib.incubator.apache.org/ which is a mature, distributed, in-database machine learning library. Hive does not have an integrated machine learning capability at the current time.