Created on 02-10-2016 06:27 AM - edited 09-16-2022 03:03 AM
Hello,
I am trying to install Cloudera's distribution on 4 nodes on AWS EC2, and the installation fails while uploading the Oozie ShareLib.
Please find the role log below:
Error: File /user/oozie/share/lib/lib_20160210140440/spark/mesos-0.21.1-shaded-protobuf.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1557)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3286)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:676)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
Stack trace for the error was (for debug purposes):
--------------------------------------
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/oozie/share/lib/lib_20160210140440/spark/mesos-0.21.1-shaded-protobuf.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1557)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3286)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:676)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1403)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1674)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1471)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:668)
Wed Feb 10 14:04:39 UTC 2016 JAVA_HOME=/usr/lib/jvm/java-7-oracle-cloudera using 5 as CDH_VERSION the destination path for sharelib is: /user/oozie/share/lib/lib_20160210140440
For info: these errors occurred during the first run. Before this step, during the host inspection stage, I got the following message for the datanodes:
IOException thrown while collecting data from host: connect timed out
Please find my hosts file below; maybe it can help to resolve my problem:
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
#aws
10.0.0.33 ip-10-0-0-33.us-west-2.compute.internal
10.0.0.240 ip-10-0-0-240.us-west-2.compute.internal
10.0.0.241 ip-10-0-0-241.us-west-2.compute.internal
10.0.0.242 ip-10-0-0-242.us-west-2.compute.internal
(For AWS, the structure is: private IP and private DNS.)
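For context, one generic way to confirm what the error above reports (zero datanodes registered with the NameNode) is to run standard HDFS commands on the namenode host; the commands below are an illustrative sketch, not output taken from this cluster:
# Summarize the NameNode's view of the cluster, including live/dead datanodes.
sudo -u hdfs hdfs dfsadmin -report
# List the sharelib path that the Oozie upload was writing to.
sudo -u hdfs hdfs dfs -ls /user/oozie/share/lib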
Could you please help me resolve the errors I got during my installation?
I hope to get an answer. I am new to installing Cloudera; I have terminated my instances several times in order to get a clean install, but... it's difficult :).
I look forward to your reply.
Thank you in advance.
Best regards,
Created 02-11-2016 01:33 AM
Hi,
One quick question: was HDFS accessible at the time this issue was reported?
Created 02-11-2016 05:28 AM
Hi,
I would like to thank you in advance for your help.
To answer your question, I prepared some screenshots showing:
- Access to HDFS from the namenode
- Ping from the namenode to the datanodes
- Status of the datanodes in Cloudera Manager (health/integrity problem)
Regarding HDFS access, I can run hdfs commands from the namenode but not from the datanodes, and I don't know whether the datanodes respond to the namenode for hdfs commands. Please see my screenshot (from the namenode):
As you can see, ping from the namenode to one of the datanodes is not working. You can also see that the hdfs command returns a result and the oozie directory is created.
You can also notice, in my browser tab, that I cannot access the namenode's web user interface.
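As a side note, a quick way to test whether the NameNode web UI is reachable from another host is the check below; it assumes the default CDH 5 NameNode web UI port 50070, which may differ if it was customized:
# Probe the NameNode web UI from a datanode (or any other host).
# 50070 is the default NameNode HTTP port in CDH 5 (an assumption here).
curl -I http://ip-10-0-0-33.us-west-2.compute.internal:50070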
Please find a second screenshot, this time concerning the status of my hosts (especially the datanodes):
The datanodes are showing bad health. They have the same status in the Hosts menu of Cloudera Manager.
I hope I have described my problem clearly. Thank you for your help.
Please tell me if you would like me to run other check commands, so you can better understand what is happening in my system.
I look forward to your reply. Thank you in advance.
Best regards,
Adil
Created 02-11-2016 06:09 AM
I would also like to add something that may help:
The hosts are not detected by the inspector. Please find the error:
IOException thrown while collecting data from host: connect timed out
This error appears only for the datanodes in my console.
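As a hedged illustration of how this can be checked (port 9000 is the Cloudera Manager Agent default and is an assumption here), connectivity from the Cloudera Manager server to a datanode could be probed like this:
# Run from the Cloudera Manager server host.
# 10.0.0.240 is one of the datanodes listed in the hosts file;
# 9000 is the default cloudera-scm-agent listening port (assumption).
nc -vz 10.0.0.240 9000
# Plain reachability check against the private DNS name.
ping -c 3 ip-10-0-0-240.us-west-2.compute.internal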
Thank you for help.
Best regards,
Adil
Created 02-11-2016 08:26 AM
Hello Adil,
Just to understand your scenario better: you mean that host inspection is fine for the master but failing for the data nodes? If that is correct, then please share the hosts file from the working master and another one from a data node.
Thanks.
Created 02-11-2016 08:46 AM
Hi,
Please find the hosts file on the namenode:
ubuntu@ip-10-0-0-33:/etc$ more hosts
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
#aws
10.0.0.33 ip-10-0-0-33.us-west-2.compute.internal
10.0.0.240 ip-10-0-0-240.us-west-2.compute.internal
10.0.0.241 ip-10-0-0-241.us-west-2.compute.internal
10.0.0.242 ip-10-0-0-242.us-west-2.compute.internal
ubuntu@ip-10-0-0-33:/etc$
and find my hosts file on one of the datanodes:
ubuntu@ip-10-0-0-240:/etc$ more hosts
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
#aws
10.0.0.33 ip-10-0-0-33.us-west-2.compute.internal
10.0.0.240 ip-10-0-0-240.us-west-2.compute.internal
10.0.0.241 ip-10-0-0-241.us-west-2.compute.internal
10.0.0.242 ip-10-0-0-242.us-west-2.compute.internal
ubuntu@ip-10-0-0-240:/etc$
For info: the installation did not complete. It stopped during the first run because of the Oozie sharelib, but I can see the cluster. Concerning HDFS, I think my namenode cannot communicate with the datanodes, as you noticed in the inspection results.
Thank you for help.
Best regards,
Adil
Created 02-11-2016 09:54 AM
It seems your data nodes have some issue; that is why you are not able to write the sharelib directory in HDFS. Please note that HDFS file listings will still work because you currently have a working NameNode in your cluster.
Try to write a file to HDFS or create a directory, and make sure you have a working HDFS.
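A minimal sketch of such a check (the paths are arbitrary examples) could be:
# Creating a directory only touches NameNode metadata, so it can succeed
# even when no datanodes are available...
sudo -u hdfs hdfs dfs -mkdir -p /tmp/hdfs_write_test
# ...whereas writing an actual file needs at least one live datanode, so a put
# will fail with the same "replicated to 0 nodes" error if none are running.
sudo -u hdfs hdfs dfs -put /etc/hosts /tmp/hdfs_write_test/
sudo -u hdfs hdfs dfs -cat /tmp/hdfs_write_test/hosts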
Created 02-11-2016 10:32 AM
I can create a directory, but I cannot do a put; it is a problem of connection with the datanodes.
Best regards,
Adil
Created 02-11-2016 10:34 AM
Hello Adil,
Are the network flows open?
Thanks.
Created 02-11-2016 01:49 PM
Hi,
Network flows, yes, good question 🙂 The hosts file is a good question as well!! 🙂 I terminated the instances and created new ones because I had some problems with the cluster (I deleted the hosts and the cluster). Since you mentioned the network, on my new instances I enabled all TCP and ICMP traffic (before, only the ports indicated in the documentation were open: 7180, 7183, 7432, ICMP, SSH, HTTP, HTTPS). For the moment, I have disabled IPv6 and changed the hostname file to use my private DNS name instead of my IP address on each machine. I put the private IPs and their private DNS names in the hosts file.
For info: I created a new VPC (Amazon) with a public subnet and attached it to my instances. I kept the public IP only for SSH.
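In AWS terms, one hedged way to express this kind of opening (all traffic allowed between the cluster instances, while public exposure stays limited to SSH) is a self-referencing security-group rule; the group ID below is a placeholder and assumes the AWS CLI is configured:
# Hypothetical security group ID; replace it with the group attached to the cluster instances.
SG_ID=sg-0123456789abcdef0
# Allow all protocols and ports between members of the same security group.
aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" \
    --ip-permissions "IpProtocol=-1,UserIdGroupPairs=[{GroupId=$SG_ID}]"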
So yes, it was indeed a network problem!!
Now Cloudera is installed on the 4 nodes and the inspector displays good health. My heart is also in good health now. My head as well 🙂 🙂
Please find below the actions I took, in case another person runs into the same problem:
- Edit the hostname via sudo nano /etc/hostname (Ubuntu) or sudo nano /etc/sysconfig/network (RHEL), replacing the IP address with the private DNS name.
- Edit the hosts file via sudo nano /etc/hosts (both Ubuntu and RHEL), adding the private IP addresses of all the instances you chose. If you chose 3 instances, enter the private IPs and private DNS names of all 3, but not the public IP address. The public IP or hostname is only for connecting to the instance over SSH and for WinSCP connections.
- Disable IPv6 by adding the following lines via sudo nano /etc/sysctl.conf:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Then issue the following command to make sure it returns 1 (after a reboot):
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
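(A reboot is not strictly required; a small sketch, assuming the lines above were added to /etc/sysctl.conf, is:)
# Reload /etc/sysctl.conf so the IPv6 settings take effect immediately.
sudo sysctl -p
# Should print 1 once IPv6 is disabled.
cat /proc/sys/net/ipv6/conf/all/disable_ipv6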
Thank you very much for your help. I will surely come back with other questions 🙂
Could you please tell me which ports I should open for security? That part is managed in AWS.
Best regards,
Adil