
install hadoop client on unmanaged host

Explorer

Hello,

 

I am trying to install the Hadoop client components on a host that is not managed by Cloudera Manager. After doing some digging, some people suggested that simply installing the hadoop-client package and adding the site configuration files should do the trick.

 

But I can't find hadoop-client!

 

Here is my yum repo:

 

 

[cloudera-manager]
name = Cloudera Manager, Version 5.12.1
baseurl = https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.12.1/
gpgkey = https://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1

 

 

And the output of yum:

 

 

$ sudo yum install hadoop-client
Loaded plugins: fastestmirror
...
cloudera-manager                               |  951 B  00:00:00     
...
...
cloudera-manager/primary                       | 4.3 kB  00:00:00     
...
...
cloudera-manager                               7/7
No package hadoop-client available.
Error: Nothing to do

Your help is appreciated.

10 REPLIES

Master Guru

Hi @ramin,

 

Here are some general instructions I found internally. Note: You can change the path to match the OS release and CDH version of the client you need.

 

  1. On the external host download the CDH repo file to the /etc/yum.repos.d/ directory:
    curl -O https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
  2. Edit the baseurl in the cloudera-cdh5.repo file to pin the CDH version you want (otherwise the latest will be installed). For example, to install the 5.7.1 hadoop-client, update the baseurl to:
    baseurl=https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.7.1/
  3. Install the hadoop-client rpm:
    $ yum clean all
    $ yum install hadoop-client

       (See http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.7.1/RPMS/x86_64/)

  4. In Cloudera Manager, navigate to HDFS -> "Actions" drop-down -> "Download Client Configuration" (this downloads a zip file called hdfs-clientconfig.zip).
  5. Move the zip file over to the external host and unzip it.
  6. Copy all the unzipped configuration files to /etc/hadoop/conf. Example:
    $ cp *  /etc/hadoop/conf
  7. Run hadoop commands. Example:
    $ sudo -u hdfs hadoop fs -ls

Note: You can also download the RPM file and install it locally if desired.
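
Putting those steps together, here is a rough sketch for the unmanaged host (assuming RHEL/CentOS 7 and CDH 5.12.1 to match the Cloudera Manager version in this thread; adjust the OS release and CDH version in the URLs to your environment, and note the directory created by unzip may be named differently):

# 1) Add the CDH repo and pin the version (edit baseurl as described in step 2 above)
$ sudo curl -o /etc/yum.repos.d/cloudera-cdh5.repo https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/cloudera-cdh5.repo
$ sudo vi /etc/yum.repos.d/cloudera-cdh5.repo    # set baseurl=https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5.12.1/

# 2) Install the client package
$ sudo yum clean all
$ sudo yum install hadoop-client

# 3) Deploy the client configuration downloaded from Cloudera Manager
$ unzip hdfs-clientconfig.zip
$ cd hdfs-clientconfig*    # cd into whatever directory unzip created
$ sudo cp * /etc/hadoop/conf

# 4) Test
$ sudo -u hdfs hadoop fs -ls /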

 

Contributor

Question on this.

 

Does the edge node need to be configured for passwordless ssh to the NameNode?

Master Guru

@AKB,

 

No.

Your client will communicate with the NameNode itself over the network.  It does not need to authenticate to the host.
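
All the client needs is plain TCP access to the NameNode's RPC port (8020 by default), plus the DataNode ports for actual reads and writes. A quick sanity check from the edge host (a sketch; substitute your NameNode hostname, and use telnet if your nc variant lacks -z):

$ nc -vz <namenode-host> 8020    # or: telnet <namenode-host> 8020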

Contributor

I did the setup on a CentOS 7 host.

 

I get this error when I try to run a command. I am using an AWS Elastic IP for the single-node cluster, so the public IP is in the hosts file on the edge node.

 

[root@edgenode ~]# sudo -u hdfs hadoop fs -ls /ds-datalake
-ls: java.net.UnknownHostException: ip-172-31-26-58.ec2.internal
Usage: hadoop fs [generic options] -ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]

 

core-site.xml has this set (private IP):

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ip-172-31-26-58.ec2.internal:8020</value>
</property>

 

@bgooley

Contributor

OK, I have fixed this issue by replacing the IP address in the core-site.xml file with the public one. That allows me to list Hadoop directories on the cluster.

 

But read/write operations give errors like the following. Any ideas what config changes are needed in the client-side files to allow this to work?

 

[root@edgenode bin]# hadoop fs -put hdfs-clientconfig-aws.zip /ds-datalake/misc
18/08/22 13:00:28 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:2008)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1715)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1668)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:790)
18/08/22 13:00:28 WARN hdfs.DFSClient: Abandoning BP-49600184-172.31.26.58-1534798007391:blk_1073745239_4416
18/08/22 13:00:28 WARN hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[172.31.26.58:50010,DS-0c88ebaf-aa0b-407c-8b64-e02a02eeac3c,DISK]
18/08/22 13:00:28 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /ds-datalake/misc/hdfs-clientconfig-aws.zip._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3505)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:694)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:219)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:507)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)

at org.apache.hadoop.ipc.Client.call(Client.java:1504)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:425)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1860)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1656)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:790)
put: File /ds-datalake/misc/hdfs-clientconfig-aws.zip._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

Master Guru

@AKB,

 

Looks like your client cannot access the DataNodes to write out blocks.

You could edit the /etc/hosts file on the client to map the cluster's private hostnames to their publicly reachable IPs (for all hosts in the cluster).  That might work.

 

There may be a more elegant solution, but that should get you by as long as the client can reach those IPs OK.
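
For example (a sketch; the private hostname below is the one from your earlier error message, and the public IP is whatever address the client can actually reach for that host), each cluster host would get a line like this in the client's /etc/hosts:

# /etc/hosts on the client: make the cluster's internal hostname resolve to a reachable address
<public-ip-of-that-host>   ip-172-31-26-58.ec2.internal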

 

 

Contributor

Thanks for the comment. I'm not sure how to do this private-to-public mapping. Any help is appreciated. Thanks.

 

This is what the hosts file on client looks like right now:

127.0.0.1 localhost.localdomain localhost
192.168.157.154 edgenode.local.com edgenode
18.215.25.118 ec2-18-215-25-118.compute-1.amazonaws.com ec2-18-215-25-118    <-- Public IP of server

Master Guru

@AKB,

 

Find what IP address you can use to access a DataNode host.

Map that IP to the hostname returned by "hostname -f" on that host.

Since the NameNode returns the DataNode hostname, you need to be sure your edge host can resolve that hostname to an IP that is reachable by your client.
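
A quick way to verify both points from the edge node (a sketch; the hostname is the one from your error output and 50010 is the DataNode port shown in the "Excluding datanode" warning; use telnet if your nc variant lacks -z):

$ getent hosts ip-172-31-26-58.ec2.internal    # should print the reachable (public) IP
$ nc -vz ip-172-31-26-58.ec2.internal 50010    # the DataNode data-transfer port must be open too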

 

 

 

Contributor

@bgooley

 

Thanks for the tip. That worked.

 

So, putting the public IP and private hostname in the hosts file on the client did the trick. 🙂