


What is the difference between Pig, Hive, and HBase?




Pig and Hive are both high-level languages whose scripts and queries compile to MapReduce jobs. HBase is a completely different game: it is a key/value store that lets Hadoop support lookups and transactions on key/value pairs. HBase allows you to (1) do quick random lookups instead of scanning all of the data sequentially, and (2) insert, update, or delete records in the middle of a data set, not just append to the end.
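To make the contrast between sequential scans and keyed random access concrete, here is a minimal toy sketch in plain Python. It does not use HBase or Hadoop at all; `scan_lookup` and `KeyedStore` are hypothetical names invented for this illustration, and the sorted in-memory list is only an assumption standing in for a store that keeps rows ordered by key.

```python
from bisect import bisect_left


def scan_lookup(records, key):
    """Find a key in an append-only list by reading records in order.

    Returns (value, number_of_records_read).
    """
    reads = 0
    for k, v in records:
        reads += 1
        if k == key:
            return v, reads
    return None, reads


class KeyedStore:
    """Toy keyed store: rows are kept sorted by key (illustration only)."""

    def __init__(self):
        self._keys = []
        self._values = []

    def _find(self, key):
        # Binary search: locate the key without reading every row.
        i = bisect_left(self._keys, key)
        found = i < len(self._keys) and self._keys[i] == key
        return i, found

    def put(self, key, value):
        i, found = self._find(key)
        if found:
            self._values[i] = value          # update in place
        else:
            self._keys.insert(i, key)        # insert in the middle
            self._values.insert(i, value)

    def get(self, key):
        i, found = self._find(key)
        return self._values[i] if found else None

    def delete(self, key):
        i, found = self._find(key)
        if found:
            del self._keys[i]
            del self._values[i]
        return found


if __name__ == "__main__":
    records = [(f"row{i:04d}", i) for i in range(1000)]
    print(scan_lookup(records, "row0999"))   # reads every record to reach the last key

    store = KeyedStore()
    for k, v in records:
        store.put(k, v)
    store.put("row0500", -1)                 # update a row in the middle
    print(store.get("row0500"))
```

The scan has to touch every record before the one it wants, while the keyed store jumps to the row and can update or delete it in place. That is the access pattern the paragraph above attributes to HBase, shown here only by analogy.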

The differences between Pig and Hive are significant. Specifically:

  • Pig doesn't require any underlying structure in the data, whereas Hive imposes structure through a metastore. Each approach has its pros and cons. The lack of required structure makes Pig better suited to ETL tasks where the input data is still a mishmash that you want to convert into structured form. On the other hand, Hive's metastore provides a dictionary that lets you easily see which columns exist in which tables, which can be very handy.
  • Pig is its own language (Pig Latin), easy to learn if you know procedural scripting languages such as Perl. Hive's query language, HiveQL, is a SQL dialect with a few simple extensions that enable map-reduce-style computation. If you come from a SQL background you will find HiveQL extremely easy to pick up (many of your SQL queries will run as-is), while if you come from a procedural programming background (without SQL knowledge), Pig will be a much better fit. Furthermore, Hive is somewhat easier to integrate with other systems and tools, since it speaks the language they already speak: SQL.
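The two styles described in the list above can be sketched by analogy in plain Python. This is not Pig Latin or HiveQL: the step-by-step function only mimics a procedural dataflow over unstructured input, and the standard-library `sqlite3` module is an assumed stand-in for a SQL engine with a table catalog. `RAW_LINES`, `parse`, `dataflow_style`, and `sql_style` are hypothetical names made up for this example.

```python
import sqlite3

# Messy input: some lines do not fit the expected name,age,city shape.
RAW_LINES = [
    "alice,34,NYC",
    "bob,,SF",
    "malformed line",
    "carol,29,NYC",
]


def parse(line):
    """Return (name, age, city), or None if the line does not fit."""
    parts = line.split(",")
    if len(parts) != 3 or not parts[1].isdigit():
        return None
    return parts[0], int(parts[1]), parts[2]


def dataflow_style(lines):
    """Procedural pipeline: structure is imposed step by step while reading."""
    parsed = [parse(line) for line in lines]       # load raw lines
    clean = [r for r in parsed if r is not None]   # filter out bad records
    counts = {}
    for _name, _age, city in clean:                # group by city and count
        counts[city] = counts.get(city, 0) + 1
    return counts


def sql_style(lines):
    """Declarative style: declare the table first, then ask in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER, city TEXT)")
    rows = [r for r in map(parse, lines) if r is not None]
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    # The catalog can be asked which columns the table has.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
    counts = dict(conn.execute("SELECT city, COUNT(*) FROM users GROUP BY city"))
    conn.close()
    return columns, counts


if __name__ == "__main__":
    print(dataflow_style(RAW_LINES))
    print(sql_style(RAW_LINES))
```

Both functions compute the same per-city counts. The first never declares a schema and simply shapes the data as it flows through; the second declares the columns up front, which is what lets the catalog answer "which columns exist in this table" and lets any SQL-speaking tool issue the query.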


NOTE: This article was taken from our internal Knowledge Base.  To access the original article please use the following link (customer login required):


What is the Difference between Pig, Hive, and HBase?



Version history: revision 1 of 1, last updated 08-20-2015 01:39 PM