Let's start with Hive and then HCatalog.
Hive
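- Layer for analyzing, querying and managing large datasets that reside in HDFS and other Hadoop-compatible file systems
⇢ uses HiveQL (HQL) as its query language; Hive compiles HiveQL queries into jobs for an execution engine such as MapReduce, Tez or Spark
⇢ uses SerDes (Serializer/Deserializer) to read and write table rows in different storage formats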
⇢ works best with huge volumes of data
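To make the SerDe idea concrete, here is a toy sketch in Python (not Hive's actual Java SerDe interface). It mimics only the basic behaviour of Hive's default text SerDe (LazySimpleSerDe), which separates fields with the \x01 (Ctrl-A) character. The class name `ToyDelimitedSerDe` is hypothetical, and escaping, NULL handling and complex types are deliberately ignored.

```python
from typing import Dict, List

FIELD_DELIM = "\x01"  # Hive's default field delimiter for text tables


class ToyDelimitedSerDe:
    """Hypothetical, minimal SerDe for a table with only int/string columns."""

    def __init__(self, columns: List[str], types: List[str]) -> None:
        if len(columns) != len(types):
            raise ValueError("columns and types must have the same length")
        self.columns = columns
        self.types = types

    def serialize(self, row: Dict[str, object]) -> str:
        # Write the fields in declared column order, joined by the delimiter.
        return FIELD_DELIM.join(str(row[c]) for c in self.columns)

    def deserialize(self, line: str) -> Dict[str, object]:
        # Split the stored line and convert each field to its declared type.
        fields = line.split(FIELD_DELIM)
        row: Dict[str, object] = {}
        for name, typ, raw in zip(self.columns, self.types, fields):
            row[name] = int(raw) if typ == "int" else raw
        return row


serde = ToyDelimitedSerDe(["id", "name"], ["int", "string"])
stored = serde.serialize({"id": 7, "name": "alice"})  # "7\x01alice"
print(serde.deserialize(stored))  # {'id': 7, 'name': 'alice'}
```

The point is the separation of concerns: the query layer works with typed rows, while the SerDe alone decides how those rows map to bytes on disk, which is why Hive can query many file formats.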
HCatalog
- Table and storage management layer for Hadoop
⇢ provides a shared table abstraction, so users do not need to know where or in what format the data is stored (it supports any format for which a SerDe exists, e.g. RCFile, CSV, JSON, SequenceFile, ORC)
⇢ is a sub-component of Hive, which simplifies ETL processes that span Pig, MapReduce and Hive
⇢ gives other tools access to the metadata stored in the Hive Metastore
⇢ exposes the metastore to external tools such as Pig (through HCatLoader/HCatStorer) and MapReduce (through HCatInputFormat/HCatOutputFormat)
⇢ uses WebHCat, a web server that offers a REST interface for interacting with the Hive Metastore
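As a rough sketch of what talking to WebHCat looks like, the Python below builds the URL for WebHCat's "describe table" DDL resource (`GET /templeton/v1/ddl/database/:db/table/:table`). It assumes a WebHCat server on its default port 50111 with simple (pseudo) authentication via the `user.name` parameter; the host, database, table and user names are hypothetical placeholders. `describe_table` needs a live server and is not exercised here.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhcat_table_url(host: str, db: str, table: str, user: str,
                      port: int = 50111) -> str:
    # Build the WebHCat DDL URL that describes one table in the metastore.
    path = f"/templeton/v1/ddl/database/{quote(db)}/table/{quote(table)}"
    query = urlencode({"user.name": user})  # pseudo-auth user (assumption)
    return f"http://{host}:{port}{path}?{query}"


def describe_table(host: str, db: str, table: str, user: str) -> dict:
    # Requires a reachable WebHCat server; returns the parsed JSON response.
    with urlopen(webhcat_table_url(host, db, table, user)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical host/table/user, shown only to illustrate the URL shape.
print(webhcat_table_url("hcat.example.com", "default", "web_logs", "hive"))
```

Because the metadata comes back as plain JSON over HTTP, tools outside the JVM can inspect Hive tables without linking against Hive or HCatalog libraries.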
I think the focus should be on how they complement each other rather than on their differences.
Documentation
- This answer from @Scott Shaw is worth checking
- This SlideShare presents the use cases and features of Hive and HCatalog
- This directed graph from IBM shows how both layers are used in a batch job
I hope this helps! 🙂