Created on 03-31-2018 04:25 AM - edited 09-16-2022 06:02 AM
Hello,
I am trying to install the Hadoop client components on a host that is not managed by Cloudera Manager. After doing some digging, I found suggestions that simply installing the hadoop-client package and adding the site configuration files should do the trick.
But I can't find hadoop-client!
Here is my yum repo:
[cloudera-manager]
name = Cloudera Manager, Version 5.12.1
baseurl = https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.12.1/
gpgkey = https://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1
And the output of yum:
$ sudo yum install hadoop-client
Loaded plugins: fastestmirror
...
cloudera-manager            | 951 B   00:00:00
...
cloudera-manager/primary    | 4.3 kB  00:00:00
...
cloudera-manager                          7/7
No package hadoop-client available.
Error: Nothing to do
Your help is appreciated.
Created 03-31-2018 01:37 PM
Hi @ramin,
Here are some general instructions I found internally. Note: You can change the path to match the OS release and CDH version of the client you need.
Download the CDH repo file (and place it under /etc/yum.repos.d/):
curl -O https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
Then edit cloudera-cdh5.repo so that its baseurl points at the CDH version you need, e.g.:
baseurl=https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.7.1/
$ yum clean all
$ yum install hadoop-client
(See http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.7.1/RPMS/x86_64/)
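The site configuration files to copy in the next step can be obtained from Cloudera Manager via the HDFS service's "Download Client Configuration" action. A minimal sketch, assuming the downloaded archive is named hdfs-clientconfig.zip and unpacks into a hadoop-conf directory (names may differ across CM versions):
$ unzip hdfs-clientconfig.zip
$ cd hadoop-conf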
$ cp * /etc/hadoop/conf
$ sudo -u hdfs hadoop fs -ls
Note: You can also download the RPM file and install locally if desired.
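A minimal sketch of that alternative (the RPM filename below is a placeholder; pick the actual one from the RPMS listing linked above, and note that yum localinstall still resolves any dependencies from your configured repos):
$ curl -O https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.7.1/RPMS/x86_64/hadoop-client-<version>.rpm
$ sudo yum localinstall hadoop-client-<version>.rpm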
Created 08-22-2018 08:29 AM
Question on this.
Does the edge node need to be configured for passwordless SSH to the NameNode?
Created 08-22-2018 09:02 AM
@AKB,
No.
Your client communicates with the NameNode over the network; it does not need to authenticate to the host itself, so no SSH setup is required.
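If you want a quick sanity check that the edge host can reach the NameNode's RPC endpoint (8020 is the default port; the hostname below is a placeholder), a plain bash TCP probe is enough and needs no extra tools:
$ timeout 3 bash -c '</dev/tcp/<namenode-host>/8020' && echo "NameNode RPC port reachable"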
Created on 08-22-2018 09:12 AM - edited 08-22-2018 09:26 AM
I did the setup on a CentOS 7 host.
I get this error when I try to run a command. I'm using an AWS Elastic IP for the single-node cluster, so the public IP is in the hosts file on the edge node.
[root@edgenode ~]# sudo -u hdfs hadoop fs -ls /ds-datalake
-ls: java.net.UnknownHostException: ip-172-31-26-58.ec2.internal
Usage: hadoop fs [generic options] -ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]
core-site.xml has this set (the private/internal hostname):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ip-172-31-26-58.ec2.internal:8020</value>
  </property>
</configuration>
Created 08-22-2018 11:01 AM
OK, I have fixed this issue by replacing the address in the core-site.xml file with the public one. That allows me to list HDFS directories on the cluster.
But read/write operations give errors like the following. Any ideas what config changes are needed in the client-side files to make this work?
[root@edgenode bin]# hadoop fs -put hdfs-clientconfig-aws.zip /ds-datalake/misc
18/08/22 13:00:28 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:2008)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1715)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1668)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:790)
18/08/22 13:00:28 WARN hdfs.DFSClient: Abandoning BP-49600184-172.31.26.58-1534798007391:blk_1073745239_4416
18/08/22 13:00:28 WARN hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[172.31.26.58:50010,DS-0c88ebaf-aa0b-407c-8b64-e02a02eeac3c,DISK]
18/08/22 13:00:28 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /ds-datalake/misc/hdfs-clientconfig-aws.zip._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3505)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:694)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:219)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:507)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
at org.apache.hadoop.ipc.Client.call(Client.java:1504)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:425)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1860)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1656)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:790)
put: File /ds-datalake/misc/hdfs-clientconfig-aws.zip._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
Created 08-22-2018 04:35 PM
@AKB,
Looks like your client cannot reach the DataNodes to write out blocks.
You could edit your /etc/hosts file on the client to map the cluster's private hostnames to their public IPs (for all hosts in the cluster). That might work.
There may be a more elegant solution, but that should get you by as long as the client can reach those IPs.
Created 08-22-2018 04:40 PM
Thanks for the comment. I'm not sure how to do this private-to-public mapping, though; any help is appreciated.
This is what the hosts file on client looks like right now:
127.0.0.1 localhost.localdomain localhost
192.168.157.154 edgenode.local.com edgenode
18.215.25.118 ec2-18-215-25-118.compute-1.amazonaws.com ec2-18-215-25-118 <-- Public IP of server
Created 08-22-2018 04:43 PM
@AKB,
Find the IP address your client can use to reach each DataNode host.
Map that IP to the hostname returned by "hostname -f" on that host.
Since the NameNode hands the client the DataNode's hostname, you need to be sure your edge host can resolve that hostname to an IP that is reachable from your client.
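For example, on your edge node you could add a line like this to /etc/hosts, using the Elastic IP you already listed and the internal hostname from your core-site.xml (adjust if "hostname -f" on the cluster host returns something different):
18.215.25.118   ip-172-31-26-58.ec2.internal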
Created 08-22-2018 08:08 PM
Thanks for the tip. That worked.
So, putting the public IP and private hostname in the hosts file on the client did the trick. 🙂