HAWQ VS HIVE

If we have to compare SQL tools on HDFS, HAWQ vs. Hive, for a data lake of 400+ TB (without replication) of semi-structured machine data, which is preferable? How will Hive perform? What are the pros and cons of using Hive compared to HAWQ?

Note: We are currently on an older version of HAWQ. It is very unstable, and we are evaluating options.

1 ACCEPTED SOLUTION

Super Guru

This is a nice post comparing Hive on MapReduce and Tez with HAWQ:

https://www.pivotalguru.com/?p=956

Hope it helps.

7 REPLIES

Thank you, Raj. How has your experience with HAWQ been? Do you recommend HAWQ or Hive? Are you using the latest version of HAWQ, and how is it working?

Thank you, Sri. How has your experience with HAWQ been? Do you recommend HAWQ or Hive? Are you using the latest version of HAWQ, and how is it working? What do you recommend?

Guru

Whenever you are evaluating Hive on Tez against any other tool for data analytics (with row-level update/access patterns), my suggestion is to start with Hive: apply the right tunings at the OS, cluster, and Hive levels, use ORC and bloom filters, organize your data, and see whether query times meet your SLA.

We have seen at a lot of places that once they tune Hive correctly and move away from text files, they hit their SLAs. You can then look at other tools if and when your SLAs are not met.
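
To make the ORC and bloom filter suggestion a bit more concrete, here is a minimal sketch that builds a HiveQL `CREATE TABLE` statement for a partitioned ORC table. The table name, column names, and partition column are hypothetical placeholders, not anything from the original poster's environment; `orc.compress` and `orc.bloom.filter.columns` are standard ORC table properties, but the right values depend on your data and queries.

```python
def orc_table_ddl(table, columns, partition_col, bloom_cols, compress="ZLIB"):
    """Build a HiveQL CREATE TABLE statement for a partitioned ORC table.

    All names passed in are illustrative; adapt them to your own schema.
    """
    col_defs = ",\n  ".join(f"{name} {typ}" for name, typ in columns)
    props = {
        # Compression codec used for ORC stripes.
        "orc.compress": compress,
        # Columns that get bloom filters (useful for selective equality lookups).
        "orc.bloom.filter.columns": ",".join(bloom_cols),
    }
    prop_defs = ",\n  ".join(f"'{k}'='{v}'" for k, v in props.items())
    part_name, part_type = partition_col
    return (
        f"CREATE TABLE {table} (\n  {col_defs}\n)\n"
        f"PARTITIONED BY ({part_name} {part_type})\n"
        f"STORED AS ORC\n"
        f"TBLPROPERTIES (\n  {prop_defs}\n)"
    )


# Hypothetical machine-data table, partitioned by day.
ddl = orc_table_ddl(
    table="machine_events",
    columns=[
        ("device_id", "STRING"),
        ("event_time", "TIMESTAMP"),
        ("payload", "STRING"),
    ],
    partition_col=("event_date", "STRING"),
    bloom_cols=["device_id"],
)
print(ddl)
```

Loading the existing text files into a table like this (for example with an `INSERT ... SELECT` from an external text table) is one common way to "move away from text files" before measuring query times against the SLA.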

Thank you, Ravi. Just checking:

Have you ever seen Hive and HAWQ in a single environment?

Did any of the customers you worked with move away from Hive to HAWQ, and what were their reasons for doing so?

New Contributor

For folks interested in data science, HAWQ supports Apache MADlib (incubating), http://madlib.incubator.apache.org/, a mature, distributed, in-database machine learning library. Hive does not currently have an integrated machine learning capability.