How is HCatalog different from Hive?
Created on 10-24-2018 11:47 AM - edited 09-16-2022 06:49 AM
How does HCatalog differ from Hive?
Created 10-24-2018 11:53 AM
HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools (such as Pig and MapReduce) to more easily read and write data. HCatalog's table abstraction presents users with a relational view of data in the Hadoop Distributed File System (HDFS) and ensures that users need not worry about where or in what format their data is stored. HCatalog supports reading and writing files in any format for which a SerDe (serializer-deserializer) can be written. By default, HCatalog supports the RCFile, CSV, JSON, SequenceFile, and ORC file formats. To use a custom format, you must provide the InputFormat, OutputFormat, and SerDe.
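To make the custom-format point concrete, here is a minimal sketch in Python that only builds the Hive DDL text for a table stored in a custom format. The table name, columns, and the com.example.* class names are hypothetical placeholders, not real classes; the point is that the SerDe, InputFormat, and OutputFormat are all named in the table definition.

```python
def custom_format_ddl(table, columns, serde, input_format, output_format):
    """Build a Hive CREATE TABLE statement for a custom storage format.

    `columns` is a list of (name, hive_type) pairs. The class names passed in
    are assumed to be on the cluster's classpath; nothing is executed here.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {table} ({cols})\n"
        f"ROW FORMAT SERDE '{serde}'\n"
        f"STORED AS INPUTFORMAT '{input_format}'\n"
        f"OUTPUTFORMAT '{output_format}'"
    )


# Hypothetical table and class names, for illustration only.
ddl = custom_format_ddl(
    "web_logs",
    [("ip", "STRING"), ("hits", "INT")],
    serde="com.example.hive.MyLogSerDe",
    input_format="com.example.hive.MyLogInputFormat",
    output_format="com.example.hive.MyLogOutputFormat",
)
print(ddl)
```

Once a table like this is registered in the metastore, tools that read it through HCatalog should not need to know which classes sit behind it.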
Created 12-10-2018 01:58 AM
Let's start with Hive and then HCatalog.
Hive
- Layer for analyzing, querying and managing large datasets that reside in Hadoop's various file systems
⇢ uses HiveQL (HQL) as its query language
⇢ uses SerDes for serialization and deserialization
⇢ works best with huge volumes of data
HCatalog
- Table and storage management layer for Hadoop
⇢ basically, the catalog layer of an EDW-style system for Hadoop (it supports several file formats, such as RCFile, CSV, JSON, SequenceFile, and ORC)
⇢ is a sub-component of Hive that helps enable ETL processes
⇢ tool for accessing metadata that resides in the Hive Metastore
⇢ acts as an API that exposes the metastore to external tools such as Pig and MapReduce
⇢ comes with WebHCat, a web server that provides a REST interface to HCatalog and the Hive Metastore
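As a rough illustration of the REST side, the sketch below builds the WebHCat URL for describing a table. It assumes the default WebHCat port (50111) and the /templeton/v1 DDL resource layout; the host, database, table, and user names are made-up placeholders, and no request is actually sent.

```python
from urllib.parse import quote, urlencode


def webhcat_table_url(host, database, table, user, port=50111):
    """Return the WebHCat URL that describes one table.

    Assumes the standard /templeton/v1/ddl/database/<db>/table/<table> layout
    and the user.name query parameter used on non-Kerberized clusters.
    """
    path = f"/templeton/v1/ddl/database/{quote(database)}/table/{quote(table)}"
    return f"http://{host}:{port}{path}?{urlencode({'user.name': user})}"


# Hypothetical host and names, for illustration only.
url = webhcat_table_url("webhcat.example.com", "default", "web_logs", "hive")
print(url)
```

An HTTP GET on a URL like this is expected to return a JSON description of the table, which is how tools outside the Hadoop cluster can look up metadata without talking to the metastore directly.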
I think the focus should be on how they complement each other rather than on their differences.
Documentation (3)
- This answer from @Scott Shaw is worth checking
- This SlideShare presents the use cases and features of Hive and HCatalog
- This direct graph from IBM shows how they use both layers in a batch job
I hope this helps! 🙂