install hadoop client on unmanaged host
Labels: Manual Installation
Created on 03-31-2018 04:25 AM - edited 09-16-2022 06:02 AM
Hello,
I am trying to install the Hadoop client components on a host that is not managed by Cloudera Manager. After some digging, I found suggestions that simply installing the hadoop-client package and adding the site configuration files should do the trick.
But I can't find hadoop-client!
Here is my yum repo file:
[cloudera-manager]
name = Cloudera Manager, Version 5.12.1
baseurl = https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.12.1/
gpgkey = https://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1
And the output of yum:
$ sudo yum install hadoop-client
Loaded plugins: fastestmirror
...
cloudera-manager              | 951 B  00:00:00 ...
...
cloudera-manager/primary      | 4.3 kB 00:00:00 ...
...
cloudera-manager                             7/7
No package hadoop-client available.
Error: Nothing to do
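For reference, these standard yum commands show which enabled repos are being searched and whether any of them actually provides the package:

$ yum repolist enabled                      # list the repos yum will search
$ yum provides hadoop-client                # show which repo, if any, supplies the package
$ yum --showduplicates list hadoop-client   # list every available version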
Your help is appreciated.
Created 03-31-2018 01:37 PM
Hi @ramin,
Here are some general instructions I found internally. Note: you can change the paths to match the OS release and CDH version of the client you need. (A consolidated sketch of the whole sequence follows the steps below.)
- On the external host, download the CDH repo file into the /etc/yum.repos.d/ directory (run the command from that directory, or move the downloaded file there):
curl -O https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
- Edit the baseurl in the cloudera-cdh5.repo file to pin the CDH version you want (otherwise it will install the latest). For example, to install the 5.7.1 hadoop-client, set the baseurl to:
baseurl=https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.7.1/
- Install the hadoop-client rpm:
$ yum clean all
$ yum install hadoop-client
(See http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.7.1/RPMS/x86_64/)
- In Cloudera Manager, navigate to HDFS -> Actions -> Download Client Configuration (this downloads a zip file called hdfs-clientconfig.zip).
- Move the zip file over to the external host and unzip it.
- Copy all the unzipped configuration files to /etc/hadoop/conf. Example (run from inside the unzipped directory):
$ cp * /etc/hadoop/conf
- Run hadoop commands. Example:
$ sudo -u hdfs hadoop fs -ls
Note: You can also download the RPM file directly and install it locally if desired.
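Putting the steps above together, here is a rough end-to-end sketch (not tested here). It assumes the 5.7.1 / RHEL 6 example paths from above, that the stock repo file's baseurl ends in .../cdh/5/, and that hdfs-clientconfig.zip has already been copied into the working directory:

# 1. Add the CDH repo and pin it to the desired release
cd /etc/yum.repos.d
curl -O https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
sed -i 's|cdh/5/|cdh/5.7.1/|' cloudera-cdh5.repo   # adjust if your stock baseurl differs

# 2. Install the client packages
yum clean all
yum install -y hadoop-client

# 3. Unpack the client configuration downloaded from Cloudera Manager
#    (the directory layout inside the zip can vary; adjust the cp accordingly)
unzip hdfs-clientconfig.zip -d /tmp/hdfs-clientconfig
cp /tmp/hdfs-clientconfig/*/* /etc/hadoop/conf/

# 4. Smoke test
sudo -u hdfs hadoop fs -ls /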
Created 08-22-2018 08:29 AM
Question on this: does the edge node need to be configured for passwordless SSH to the NameNode?
Created 08-22-2018 09:02 AM
@AKB,
No.
Your client communicates with the NameNode directly over the network. It does not need to authenticate to the host via SSH.
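If you want a quick sanity check from the edge node, confirming that the NameNode RPC port (8020 by default; replace the placeholder hostname with your NameNode) is reachable is enough, with no SSH involved:

nc -vz <namenode-host> 8020      # plain TCP check of the NameNode RPC port
sudo -u hdfs hadoop fs -ls /     # end-to-end check through the HDFS client, once the client config is in place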
Created on 08-22-2018 09:12 AM - edited 08-22-2018 09:26 AM
I did the setup on a CentOS 7 host.
I get this error when I try to run a command. I'm using an AWS Elastic IP for the single-node cluster, so the public IP is in the hosts file on the edge node.
[root@edgenode ~]# sudo -u hdfs hadoop fs -ls /ds-datalake
-ls: java.net.UnknownHostException: ip-172-31-26-58.ec2.internal
Usage: hadoop fs [generic options] -ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]
core-site.xml has this set (the private hostname):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ip-172-31-26-58.ec2.internal:8020</value>
</property>
</configuration>
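For reference, resolution of that hostname can be checked from the edge node with standard tools (nothing Hadoop-specific):

getent hosts ip-172-31-26-58.ec2.internal   # consults /etc/hosts and then DNS, per nsswitch.conf
nslookup ip-172-31-26-58.ec2.internal       # DNS only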
Created 08-22-2018 11:01 AM
OK, I have fixed this issue by replacing the address in core-site.xml with the public one. That allows me to list Hadoop directories on the cluster.
But read/write operations give errors like the following. Any ideas what changes are needed in the client-side config files to make this work?
[root@edgenode bin]# hadoop fs -put hdfs-clientconfig-aws.zip /ds-datalake/misc
18/08/22 13:00:28 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:2008)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1715)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1668)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:790)
18/08/22 13:00:28 WARN hdfs.DFSClient: Abandoning BP-49600184-172.31.26.58-1534798007391:blk_1073745239_4416
18/08/22 13:00:28 WARN hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[172.31.26.58:50010,DS-0c88ebaf-aa0b-407c-8b64-e02a02eeac3c,DISK]
18/08/22 13:00:28 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /ds-datalake/misc/hdfs-clientconfig-aws.zip._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1719)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3505)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:694)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:219)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:507)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
at org.apache.hadoop.ipc.Client.call(Client.java:1504)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:425)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1860)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1656)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:790)
put: File /ds-datalake/misc/hdfs-clientconfig-aws.zip._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
Created 08-22-2018 04:35 PM
@AKB,
Looks like your client cannot access DataNodes to write out blocks.
You could edit the /etc/hosts file on the client to map the cluster's private hostnames to IP addresses the client can actually reach (for all hosts in the cluster). That might work.
There may be a more elegant solution, but that should get you by as long as the client can reach those IPs.
Created 08-22-2018 04:40 PM
Thanks for the comment. I'm not sure how to do this private-to-public mapping; any help is appreciated. Thanks.
This is what the hosts file on the client looks like right now:
127.0.0.1 localhost.localdomain localhost
192.168.157.154 edgenode.local.com edgenode
18.215.25.118 ec2-18-215-25-118.compute-1.amazonaws.com ec2-18-215-25-118 <-- Public IP of server
Created 08-22-2018 04:43 PM
@AKB,
Find out which IP address your client can use to reach each DataNode host.
Map that IP to the hostname returned by "hostname -f" on that host.
Since the NameNode hands DataNode hostnames back to the client, you need to be sure your edge host can resolve each of those hostnames to an IP that is reachable from your client.
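Concretely, for the single-node cluster in this thread (using the private hostname and public IP already posted above), the /etc/hosts entry on the edge node would look something like the line below; with more hosts you would add one such line per cluster node:

18.215.25.118   ip-172-31-26-58.ec2.internal    # private cluster hostname -> IP the edge node can reach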
Created 08-22-2018 08:08 PM
Thanks for the tip. That worked.
So, putting the public IP and the private hostname in the hosts file on the client did the trick. 🙂
