When to install Hadoop clients
- Labels: Apache Hadoop
Created on ‎08-17-2016 05:36 PM - edited ‎09-16-2022 03:35 AM
Is there a chart or other summary documentation on when it is necessary to install Hadoop clients on a specific host? What exactly does installing a client do, other than make sure that the config files are installed?
Created ‎08-17-2016 08:53 PM
Client application binaries are also installed. They need to be able to reach the server processes remotely. Some client processes may also be started.
For example, you may need to run ETL jobs, and for that you need access to the Hive or Pig client. You access these client applications from the edge nodes where they are installed. As a good practice for a production environment, clients should not be installed on data nodes or name nodes. Ideally, you would use edge nodes. It is a matter of security and workload distribution: you don't want various users directly accessing data nodes, or running scripts on data nodes or name nodes. Your access model will also be much easier to set up and manage. Keep a good separation of concerns: access, management, processing, data.
Here is a real-life scenario: https://community.hortonworks.com/questions/39568/how-to-create-edge-node-for-kerberized-cluster.htm...
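To make the "config files" part concrete, here is a minimal sketch of how a client-side tool could find the remote service it should talk to. It assumes the usual Hadoop XML configuration layout (`<configuration>` containing `<property>` entries) and the `fs.defaultFS` property from `core-site.xml`. The sample content, the host name, and the helper `read_hadoop_properties` are made up for illustration; this is not how the Hadoop client itself is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical client-side core-site.xml content; the host name is invented.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""


def read_hadoop_properties(xml_text):
    """Return the name/value pairs found in a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # A missing <value> element is treated as an empty string.
            props[name.strip()] = (value or "").strip()
    return props


props = read_hadoop_properties(SAMPLE_CORE_SITE)
# The client binaries on an edge node use settings like this one to reach
# the server processes, which run on other hosts.
print(props["fs.defaultFS"])
```

On a real edge node the same settings would come from the configuration files that the client installation puts in place, rather than from an inline string.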
Created ‎08-17-2016 09:02 PM
@jbarnett You install the clients when you need to interface with a service (HBase, Hive, YARN, etc.). Typically, in cluster setups, you dedicate one node, called an "edge node", where you install all your client libraries. This then becomes your single entry point for running your services. You can add more edge nodes to scale out accordingly. As @Constantin Stanca explained, installing a client simply installs the client libraries for your specific version of Hadoop and its services, which makes things very easy for the end user. Hope that helps.
