Created 04-27-2024 12:31 AM
I've created a cluster in Cloudera and now I want to add Hive, but I need it to utilize the Tez execution engine. So, I'm adding Hive on Tez and the Tez service. However, I'm confused about assigning server IPs to roles.
In the Hive service section, I see options for gateway, hiveserver2, metastore, and WebHCat.
In the Hive on Tez section, I see options for gateway and hiveserver2.
In the Tez section, only gateway is listed.
Should I assign the same IP to all gateways and hiveserver2, or should I assign different IPs? Where should I assign hiveserver2, in Hive or in Hive on Tez? And how many servers can I assign to hiveserver2, metastore, and gateway?
Moreover, when the Hive section already covers the gateway and hiveserver2 options, why do the Hive on Tez and Tez sections also show these options?
Additionally, I would appreciate clarification on how to configure Hive high availability in this setup. I'm seeking guidance to proceed and would appreciate any help from the Cloudera community.
Created 04-29-2024 10:48 PM
@omkar_gaikwad Assigning IPs to roles in Cloudera Manager can indeed be a bit confusing at first. Here's a breakdown to clarify:
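Gateway: A gateway role does not run a process of its own; it deploys the service's client configuration to the host it is assigned to. Assign it to the host from which client applications will interact with the Hive and Tez services. Typically, a single host is enough.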
HiveServer2: This is the main interface for clients to submit queries to Hive. You can assign the same IP as the gateway for simplicity, or you can assign a separate IP if you have specific requirements for load balancing or isolation.
Metastore: This role manages the metadata for Hive tables. It's common to assign one IP for the metastore, but you can configure high availability for it if needed by assigning multiple IPs and enabling HA.
WebHCat: This role provides a REST API for submitting Hive and Pig jobs. Assign one IP for WebHCat. Note that WebHCat is deprecated.
For Hive on Tez:
Gateway: Assign the same IP as the gateway in the Hive service section. This ensures consistent access for clients.
HiveServer2: Assign the same IP as the HiveServer2 in the Hive service section. This keeps the HiveServer2 configuration centralized.
Regarding high availability (HA) for Hive:
Metastore: Configure high availability for the metastore by assigning multiple IPs and enabling HA in Cloudera Manager. This ensures that metadata remains available even if one metastore instance fails.
HiveServer2: Similarly, you can configure high availability for HiveServer2 by assigning multiple IPs and enabling HA.
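To make the client side of HA a little more concrete, here is a minimal sketch of what the connection strings might look like once several instances are assigned. It assumes HiveServer2 HA is done through ZooKeeper service discovery (one common approach, not something confirmed for this particular cluster) and that the default ports and namespace are in use. The host names and the helper functions `hs2_zookeeper_jdbc_url` and `metastore_uris` are hypothetical, for illustration only.

```python
def hs2_zookeeper_jdbc_url(zk_hosts, zk_port=2181, namespace="hiveserver2"):
    """Build a HiveServer2 JDBC URL that resolves a live instance via ZooKeeper.

    Assumption: ZooKeeper service discovery is enabled for HiveServer2 and the
    namespace matches the cluster's hive.server2.zookeeper.namespace setting.
    """
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    quorum = ",".join(f"{host}:{zk_port}" for host in zk_hosts)
    return (
        f"jdbc:hive2://{quorum}/;"
        f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


def metastore_uris(hms_hosts, port=9083):
    """Build the comma-separated value used for hive.metastore.uris.

    Assumption: every metastore instance listens on the same Thrift port.
    """
    if not hms_hosts:
        raise ValueError("at least one metastore host is required")
    return ",".join(f"thrift://{host}:{port}" for host in hms_hosts)


if __name__ == "__main__":
    # Hypothetical host names; replace with the hosts in your cluster.
    print(hs2_zookeeper_jdbc_url(["zk1.example.com", "zk2.example.com", "zk3.example.com"]))
    print(metastore_uris(["hms1.example.com", "hms2.example.com"]))
```

With a URL of this shape, clients do not point at a single HiveServer2 host, so an instance can fail without the connection string having to change. Cloudera Manager normally generates the metastore URI list itself; the second helper only shows the format.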
Finally, regarding why options like gateway and HiveServer2 appear in multiple sections: Cloudera Manager organizes roles based on services, but some roles (like gateway and HiveServer2) are common across multiple services like Hive and Tez. This redundancy allows for flexibility in configuring and managing services.
In summary, assign IPs based on your specific requirements for load balancing, isolation, and high availability. Configure HA for critical components like the metastore and HiveServer2 to ensure uninterrupted operation.