What is the difference between Pig, Hive, and HBase?
Both Pig and Hive are high-level languages that compile to MapReduce jobs. HBase is a different kind of tool altogether: it lets Hadoop support fast lookups and updates on individual key/value records. With HBase you can (1) do quick random lookups by key instead of sequentially scanning all of the data, and (2) insert, update, or delete records anywhere in the data set, not just add or append new data.
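To make the two HBase capabilities concrete, here is a minimal in-memory sketch. It is a toy model, not the HBase client API: the class and method names are hypothetical, and real HBase spreads sorted rows across regions and files rather than one Python list. It contrasts an append-only record list, where finding a key means scanning everything, with a store kept sorted by key, where rows are located by binary search and can be updated or deleted in place.

```python
import bisect
from typing import List, Optional, Tuple


class AppendOnlyLog:
    """Append-only records: finding a key requires a full sequential scan."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, str]] = []

    def append(self, key: str, value: str) -> None:
        # New data can only be added at the end.
        self.records.append((key, value))

    def scan_for(self, key: str) -> Optional[str]:
        # Read every record; the most recently appended value for the key wins.
        result = None
        for k, v in self.records:
            if k == key:
                result = v
        return result


class SortedKeyValueStore:
    """Toy stand-in for HBase-style access: rows kept sorted by key."""

    def __init__(self) -> None:
        self.keys: List[str] = []
        self.values: List[str] = []

    def _find(self, key: str) -> int:
        # Binary search (O(log n)); returns the row's index, or -1 if absent.
        i = bisect.bisect_left(self.keys, key)
        return i if i < len(self.keys) and self.keys[i] == key else -1

    def get(self, key: str) -> Optional[str]:
        i = self._find(key)
        return self.values[i] if i >= 0 else None

    def put(self, key: str, value: str) -> None:
        i = self._find(key)
        if i >= 0:
            self.values[i] = value  # update an existing row in place
        else:
            j = bisect.bisect_left(self.keys, key)
            self.keys.insert(j, key)  # insert in the middle, keeping key order
            self.values.insert(j, value)

    def delete(self, key: str) -> bool:
        i = self._find(key)
        if i < 0:
            return False
        del self.keys[i]
        del self.values[i]
        return True


if __name__ == "__main__":
    store = SortedKeyValueStore()
    for k, v in [("row3", "c"), ("row1", "a"), ("row2", "b")]:
        store.put(k, v)
    store.put("row2", "B")    # update
    store.delete("row1")      # delete
    print(store.keys)         # ['row2', 'row3']
    print(store.get("row2"))  # B

    log = AppendOnlyLog()
    log.append("row1", "a")
    log.append("row1", "A")   # "updating" here means appending a newer copy
    print(log.scan_for("row1"))  # A
```

The point of the contrast is the access pattern: the sorted store answers `get("row2")` without touching unrelated rows, while the append-only log must read every record and can only "update" by appending.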
The differences between Pig and Hive are significant. Specifically:
Pig doesn't require the data to have an underlying structure, whereas Hive imposes structure through a metastore. This has pros and cons. It makes Pig better suited to ETL tasks in which the input data is still a mishmash that you want to convert into a structured form. On the other hand, Hive's metastore acts as a data dictionary that lets you easily see which columns exist in which tables, which can be very handy.
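The contrast can be sketched in plain Python, assuming tab-separated input. This is an illustration, not Pig or Hive code: the sample lines, `load_with_schema`, and the `METASTORE` dictionary are hypothetical. The loader is loosely modeled on Pig attaching a schema at load time (missing fields and failed casts become null rather than failing the job), while the dictionary plays the role of Hive's metastore, where table definitions are registered once and can be inspected before any data is read.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical raw input: tab-separated, but not every line is well formed.
RAW_LINES = [
    "alice\t34\tlondon",
    "bob\tunknown",             # missing city, non-numeric age
    "carol\t29\tparis\textra",  # one field too many
]

# Pig-style: the schema is supplied by the script, at load time.
PIG_SCHEMA: List[Tuple[str, Callable[[str], Any]]] = [
    ("name", str), ("age", int), ("city", str),
]


def load_with_schema(
    lines: List[str], schema: List[Tuple[str, Callable[[str], Any]]]
) -> List[Dict[str, Optional[Any]]]:
    """Impose structure while reading: missing fields and failed casts become
    None and extra fields are dropped, instead of rejecting the line."""
    rows = []
    for line in lines:
        fields = line.split("\t")
        row: Dict[str, Optional[Any]] = {}
        for i, (name, cast) in enumerate(schema):
            try:
                row[name] = cast(fields[i]) if i < len(fields) else None
            except ValueError:
                row[name] = None
        rows.append(row)
    return rows


# Hive-style: table definitions live in a central metastore, so anyone can
# look up a table's columns without opening the data files.
METASTORE: Dict[str, List[Tuple[str, str]]] = {
    "users": [("name", "STRING"), ("age", "INT"), ("city", "STRING")],
}


def describe(table: str) -> List[str]:
    """Rough analogue of Hive's DESCRIBE: list a table's column names."""
    return [column for column, _type in METASTORE[table]]


if __name__ == "__main__":
    for row in load_with_schema(RAW_LINES, PIG_SCHEMA):
        print(row)
    print(describe("users"))  # ['name', 'age', 'city']
```

The loader tolerates the malformed `bob` and `carol` lines, which is what makes the Pig approach convenient for ETL; the metastore lookup is what makes the Hive approach convenient for discovering what data exists.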
Pig's language, Pig Latin, is a new language, but it is easy to learn if you know languages similar to Perl. Hive's query language, HiveQL, is largely a subset of SQL with a few simple variations to support MapReduce-style computation. If you come from a SQL background, you will find HiveQL extremely easy to pick up (many of your SQL queries will run as-is); if you come from a procedural programming background without SQL knowledge, Pig will likely suit you better. Furthermore, Hive is somewhat easier to integrate with other systems and tools because it speaks the language they already speak: SQL.
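The difference in style can be illustrated by computing the same per-user page-view count two ways. This is a hedged sketch: the data and function names are hypothetical, the "Pig-style" function only mimics Pig's step-by-step dataflow in Python (the corresponding Pig Latin appears in a comment), and SQLite stands in for Hive to show the declarative, single-query form that HiveQL shares with SQL.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample data: (username, url) page views.
PAGE_VIEWS = [
    ("alice", "/home"), ("bob", "/home"), ("alice", "/cart"), ("carol", "/help"),
]


def pig_style(views: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    # Procedural dataflow: each step names an intermediate result, much like
    # the Pig Latin script
    #   page_views = LOAD 'page_views' AS (username:chararray, url:chararray);
    #   grouped    = GROUP page_views BY username;
    #   counts     = FOREACH grouped GENERATE group, COUNT(page_views);
    grouped: Dict[str, List[str]] = defaultdict(list)
    for username, url in views:
        grouped[username].append(url)
    counts = [(username, len(urls)) for username, urls in grouped.items()]
    return sorted(counts)  # sorted only so the two outputs can be compared


def hive_style(views: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    # Declarative: one SQL query. SQLite stands in for Hive here; the HiveQL
    # version of this SELECT would read essentially the same.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE page_views (username TEXT, url TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", views)
        return conn.execute(
            "SELECT username, COUNT(*) FROM page_views "
            "GROUP BY username ORDER BY username"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(pig_style(PAGE_VIEWS))   # [('alice', 2), ('bob', 1), ('carol', 1)]
    print(hive_style(PAGE_VIEWS))  # [('alice', 2), ('bob', 1), ('carol', 1)]
```

Both functions produce the same result; the difference is that the first spells out how to group and count, while the second only states what result is wanted, which is why SQL users tend to find Hive more familiar.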
NOTE: This article was taken from our internal Knowledge Base.