When to install Hadoop clients

Expert Contributor

Is there a chart or other summary documentation on when it is necessary to install Hadoop clients on a specific host? What exactly does installing a client do, other than make sure that the config files are installed?

1 ACCEPTED SOLUTION

Super Guru

@jbarnett

Client application binaries are also installed. They need to be able to reach the server processes remotely. Some client processes may also be started.

For example, to run ETL jobs you may need the Hive or Pig client. You run these client applications from the edge nodes where they are installed. As a good practice in a production environment, clients should not be installed on data nodes or name nodes; ideally, you would use edge nodes. This is a matter of security and workload distribution: you don't want various users accessing data nodes directly or running scripts on data nodes or name nodes. With dedicated edge nodes, your access model is much easier to set up and manage. Keep a good separation of concerns: access, management, processing, data.
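To make "what a client install adds" concrete, here is a minimal sketch that reports which client launchers and config directories are visible on a host. It assumes the usual launcher names (`hdfs`, `yarn`, `hive`, `pig`, `hbase`) and the conventional `*_CONF_DIR` environment variables; your distribution may lay things out differently, so treat the lists as placeholders rather than a definitive inventory.

```python
import os
import shutil

# Assumption: usual launcher names for each client; adjust for your distribution.
CLIENT_COMMANDS = ["hdfs", "yarn", "hive", "pig", "hbase"]

# Assumption: conventional environment variables pointing at client config directories.
CONF_ENV_VARS = ["HADOOP_CONF_DIR", "HIVE_CONF_DIR", "HBASE_CONF_DIR"]


def find_client_binaries(commands=CLIENT_COMMANDS):
    """Map each command name to its path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in commands}


def find_config_dirs(env_vars=CONF_ENV_VARS, environ=os.environ):
    """Map each variable to its directory if it is set and exists, else None."""
    result = {}
    for var in env_vars:
        path = environ.get(var)
        result[var] = path if path and os.path.isdir(path) else None
    return result


if __name__ == "__main__":
    for name, path in find_client_binaries().items():
        print(f"{name:6} binary: {path or 'not found'}")
    for var, path in find_config_dirs().items():
        print(f"{var:16} -> {path or 'unset or missing'}")
```

Run on an edge node, this should list both the launchers and the config directories; on a host without clients, most entries would come back empty.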

Here is a real-life scenario: https://community.hortonworks.com/questions/39568/how-to-create-edge-node-for-kerberized-cluster.htm...

2 REPLIES

Master Guru

@jbarnett When you need to interface with a service (HBase, Hive, YARN, etc.), that is when you install its client on a node. Typically, in cluster setups you dedicate one node, called an "edge node", where you install all your client libraries. This then becomes your single entry point for running your services. You can add more edge nodes to scale out accordingly. As @Constantin Stanca explained, it simply installs the client libraries for your specific version of Hadoop and its services. That makes things very easy for the end user. Hope that helps.
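In place of a chart, the rule of thumb from this thread can be written down as a small check. This is only a sketch: the role names (`edge`, `namenode`, `datanode`) and the host names are hypothetical labels for illustration and are not taken from Ambari or any other management tool.

```python
# Hypothetical role labels; not taken from any cluster management tool.
SERVER_ROLES = {"namenode", "datanode"}
EDGE_ROLE = "edge"


def should_install_clients(host_roles):
    """Return True when a host is an edge node that runs no NameNode or DataNode."""
    roles = {role.lower() for role in host_roles}
    return EDGE_ROLE in roles and not (roles & SERVER_ROLES)


def client_hosts(cluster):
    """Given {hostname: roles}, list the hosts that should get client installs."""
    return sorted(host for host, roles in cluster.items() if should_install_clients(roles))


if __name__ == "__main__":
    # Hypothetical cluster layout with two edge nodes for scale-out.
    cluster = {
        "edge01": ["edge"],
        "edge02": ["edge"],
        "master01": ["namenode"],
        "worker01": ["datanode"],
    }
    print(client_hosts(cluster))
```

Adding another edge node to scale out is then just one more entry with the `edge` role.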