Member since: 02-24-2019 · Posts: 9 · Kudos Received: 3 · Solutions: 0
12-22-2018 03:57 AM
Pretty good insight
12-10-2018 01:58 AM
2 Kudos
Let's start with Hive and then HCatalog.

Hive
⇢ a layer for analyzing, querying and managing large datasets that reside in Hadoop's various file systems
⇢ uses HiveQL (HQL), a SQL-like query language, as its interface
⇢ uses SerDes for serialization and deserialization
⇢ works best with huge volumes of data
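To make the Hive side concrete, here is a minimal HiveQL sketch (the table names, columns and HDFS path are made up for illustration):

-- A managed table stored as ORC; Hive keeps its definition in the Metastore.
CREATE TABLE IF NOT EXISTS page_views (
  user_id   STRING,
  url       STRING,
  view_date DATE
)
STORED AS ORC;

-- A SerDe at work: the same engine can also read delimited text files in place.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_views (
  user_id   STRING,
  url       STRING,
  view_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/tmp/raw_views';

-- A typical HiveQL aggregation over a large dataset.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url;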
HCatalog
⇢ the table and storage management layer for Hadoop
⇢ basically, the EDW system for Hadoop (it supports several file formats such as RCFile, CSV, JSON, SequenceFile and ORC)
⇢ a sub-component of Hive, which enables ETL processes
⇢ a tool for accessing the metadata that resides in the Hive Metastore
⇢ acts as an API that exposes the metastore, including as a REST interface, to external tools such as Pig
⇢ uses WebHCat, a web server for engaging with the Hive Metastore

I think the focus should be on how they complement each other rather than on their differences.
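And a minimal Pig sketch of the HCatalog side, assuming the page_views table from the HiveQL example above already exists; run it with pig -useHCatalog so HCatLoader is on the classpath:

-- Load the Hive-managed table through HCatalog; the schema comes from the
-- Hive Metastore, so no column definitions are repeated here.
-- (On older releases the loader class is org.apache.hcatalog.pig.HCatLoader.)
page_views = LOAD 'default.page_views' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- From here it is plain Pig: count views per URL.
by_url = GROUP page_views BY url;
counts = FOREACH by_url GENERATE group AS url, COUNT(page_views) AS views;
DUMP counts;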
Documentation
⇢ This answer from @Scott Shaw is worth checking.
⇢ This SlideShare deck presents the use cases and features of Hive and HCatalog.
⇢ This diagram from IBM shows how they use both layers in a batch job.

I hope this helps! 🙂
11-09-2018 11:28 AM
Hello Amir, please check the resources below:
⇢ Article
⇢ Thread
Also check this answer from @Sean Roberts, which is worth reading. Could you please let me know if that helped?