If we have to compare SQL tools on HDFS, HAWQ vs Hive, for a data lake of 400+ TB (without replication) of semi-structured machine data, which is preferable? How will Hive perform? What are the pros and cons of using Hive compared to HAWQ?
Note: Right now we are on an older version of HAWQ. It is very unstable, and we are evaluating options.
Thank you, Raj. How has your experience with HAWQ been? Do you recommend HAWQ or Hive? Are you using the latest version of HAWQ, and how is it working?
Thank you, Sri. How has your experience with HAWQ been? Do you recommend HAWQ or Hive? Are you using the latest version of HAWQ, and how is it working? What do you recommend?
Whenever you are evaluating Hive on Tez against any other tool for data analytics (with row-level update/access patterns), my suggestion is to start with Hive: apply the right tunings at the OS, cluster, and Hive levels, use ORC and bloom filters, organize your data, and see whether query times hit your SLA.
We have seen at a lot of places that once they tune Hive correctly and move away from text files, they hit their SLAs. You can then look at other tools if and when your SLAs are not met.
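To make the "ORC plus bloom filters plus organized data" advice concrete, here is a minimal sketch in Python that only builds the HiveQL text for a partitioned ORC table. The table and column names (machine_events_orc, device_id, event_date, and so on) are made up for illustration, and the helper orc_table_ddl is hypothetical; the table properties orc.compress and orc.bloom.filter.columns are standard ORC settings in Hive. Which columns to partition on or put bloom filters on depends on your own query patterns.

```python
def orc_table_ddl(table, columns, partition_cols, bloom_cols, compress="ZLIB"):
    """Build a HiveQL CREATE TABLE statement for a partitioned ORC table.

    All names passed in are illustrative; nothing here talks to a cluster.
    """
    col_defs = ",\n  ".join(f"{name} {typ}" for name, typ in columns)
    part_defs = ", ".join(f"{name} {typ}" for name, typ in partition_cols)
    props = {
        "orc.compress": compress,                           # ORC compression codec
        "orc.bloom.filter.columns": ",".join(bloom_cols),  # columns that get bloom filters
    }
    prop_defs = ", ".join(f"'{k}'='{v}'" for k, v in props.items())
    return (
        f"CREATE TABLE {table} (\n  {col_defs}\n)\n"
        f"PARTITIONED BY ({part_defs})\n"
        f"STORED AS ORC\n"
        f"TBLPROPERTIES ({prop_defs})"
    )


# Hypothetical schema for semi-structured machine data.
ddl = orc_table_ddl(
    table="machine_events_orc",
    columns=[("device_id", "STRING"), ("metric", "STRING"), ("value", "DOUBLE")],
    partition_cols=[("event_date", "STRING")],
    bloom_cols=["device_id"],
)
print(ddl)
```

You would run the generated statement in Hive and then load the table from the existing text-format data with an INSERT ... SELECT, so that later queries filtering on the partition column and on device_id can skip most of the data.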
Thank you, Ravi. Just checking:
Have you ever seen Hive and HAWQ in a single environment?
Did any of the customers you worked with move away from Hive to HAWQ, and what were their reasons for doing so?
For folks interested in data science, HAWQ supports Apache MADlib (incubating) http://madlib.incubator.apache.org/ which is a mature, distributed, in-database machine learning library. Hive does not have an integrated machine learning capability at the current time.