
CDH 5 fails to upload Oozie ShareLib on Ubuntu 14.04

Explorer

Hello,

 

I am trying to install Cloudera's distribution on 4 nodes on AWS EC2, and the installation fails while uploading the Oozie ShareLib.

 

Please find the role log below:

Error: File /user/oozie/share/lib/lib_20160210140440/spark/mesos-0.21.1-shaded-protobuf.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1557)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3286)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:676)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)


Stack trace for the error was (for debug purposes):
--------------------------------------
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/oozie/share/lib/lib_20160210140440/spark/mesos-0.21.1-shaded-protobuf.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1557)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3286)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:676)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1403)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1674)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1471)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:668)

 

Wed Feb 10 14:04:39 UTC 2016
JAVA_HOME=/usr/lib/jvm/java-7-oracle-cloudera
using 5 as CDH_VERSION
the destination path for sharelib is: /user/oozie/share/lib/lib_20160210140440

 

For info: these errors occurred during the First Run step. Before that, during the host inspection step, I got this message for the datanodes:

IOException thrown while collecting data from host: connect timed out

Please find my hosts file below; maybe it can help to resolve my problem:

 

127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
#aws
10.0.0.33 ip-10-0-0-33.us-west-2.compute.internal
10.0.0.240 ip-10-0-0-240.us-west-2.compute.internal
10.0.0.241 ip-10-0-0-241.us-west-2.compute.internal
10.0.0.242 ip-10-0-0-242.us-west-2.compute.internal

 

 

(for AWS, the format is: private IP followed by private DNS)

 

Could you please help me resolve the errors I got during my installation?

 

I hope to get an answer. I am new to installing Cloudera; I have terminated my instances several times so that I could do a clean install, but... it's difficult :).

 

I look forward to your reply.

 

Thank you in advance.

 

Best regards,

 

9 REPLIES

Expert Contributor

Hi,

 

One quick question: was HDFS accessible at the time this issue was reported?
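For instance, a report run from the NameNode host will show how many datanodes are live. A quick sketch (run as the hdfs superuser, or any user with HDFS access):

# live/dead datanode counts and capacity, as seen by the NameNode
sudo -u hdfs hdfs dfsadmin -report
# a plain listing only needs the NameNode, so it can still work even with zero datanodes
sudo -u hdfs hdfs dfs -ls /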

 

Thanks,
Sathish (Satz)

Explorer

Hi,

 

Thank you in advance for your help.

 

To answer your question, I prepared some screenshots showing:

 

- Access to HDFS from the namenode

- Ping from the namenode to the datanodes

- Status of the datanodes in Cloudera Manager (health problem)

 

Regarding HDFS access: I can run hdfs commands from the namenode but not from the datanodes, and I don't know whether the datanodes respond to the namenode for those hdfs commands. Please see my screenshot (taken from the namenode):

 

 

access_hdfs.jpg

 

As you can see, ping from the namenode to one of the datanodes is not working. You can also see that the hdfs command returns a result and that the oozie directory was created.

You can also notice, in my browser tab, that I cannot access the namenode's web UI.

 

Please find a second screenshot, this time showing the status of my hosts (especially the datanodes):

 

status_hdfs.jpg

 

The datanodes show bad health; they have the same status in the Hosts menu of Cloudera Manager.

 

I hope I have described my problem clearly. Thank you for your help.

 

Please tell me if you would like me to run any other commands so that you can better understand what is happening on my system.

 

I look forward to your reply. Thank you in advance.

 

Best regards,

Adil

Explorer

I would also like to add something that might help:

 

The hosts are not being detected by the inspector. Please find the error below:

 

Runs the host inspector on a single host. The inspector limits checks to those against other hosts in the same cluster as this host.
IOException thrown while collecting data from host: connect timed out

 

This error appears only for the datanodes in my console.

 

Thank you for your help.

 

Best regards,

Adil

Expert Contributor

Hello Adil,

 

Just to understand your scenario better: you mean that host inspection is fine for the master but fails for the data nodes? If that is correct, could you please share the hosts file from the working master and from one of the data nodes?
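Also, the DataNode log on one of the failing hosts may show why it cannot reach the NameNode. On a CM-managed CDH install the HDFS role logs usually live under /var/log/hadoop-hdfs/ (the file name pattern below is an assumption; adjust it to whatever you find there):

# look for "connection refused" / retry messages towards the NameNode
sudo tail -n 100 /var/log/hadoop-hdfs/*DATANODE*.log.out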

 

Thanks.

Explorer

Hi,

 

Please find the hosts file from the namenode:

ubuntu@ip-10-0-0-33:/etc$ more hosts
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
#aws
10.0.0.33 ip-10-0-0-33.us-west-2.compute.internal
10.0.0.240 ip-10-0-0-240.us-west-2.compute.internal
10.0.0.241 ip-10-0-0-241.us-west-2.compute.internal
10.0.0.242 ip-10-0-0-242.us-west-2.compute.internal
ubuntu@ip-10-0-0-33:/etc$

 

 

 

And here is the hosts file from one of the datanodes:

 

ubuntu@ip-10-0-0-240:/etc$ more hosts
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
#aws
10.0.0.33 ip-10-0-0-33.us-west-2.compute.internal
10.0.0.240 ip-10-0-0-240.us-west-2.compute.internal
10.0.0.241 ip-10-0-0-241.us-west-2.compute.internal
10.0.0.242 ip-10-0-0-242.us-west-2.compute.internal
ubuntu@ip-10-0-0-240:/etc$

 

For info: the installation was not completed; it stopped during the First Run because of the Oozie ShareLib, but I can see the cluster. Concerning HDFS, I think my namenode cannot communicate with the datanodes, as you noticed in the inspection results.

 

Thank you for your help.

 

 

Best regards,

Adil

Expert Contributor

It seems your datanodes have some issue; that's why you are not able to write the sharelib directory in HDFS. Please note that HDFS file listings will still work because you currently have a working NameNode in your cluster.

 

Try to write a file to HDFS or create a directory, and make sure HDFS is working.
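For example, something along these lines exercises a full block write. This is just a sketch; run it as a user with HDFS access and treat the paths as placeholders:

echo "hello" > /tmp/hdfs_write_test.txt
sudo -u hdfs hdfs dfs -mkdir -p /tmp/hdfs_write_test
# the put only succeeds if at least one datanode accepts the block
sudo -u hdfs hdfs dfs -put /tmp/hdfs_write_test.txt /tmp/hdfs_write_test/
sudo -u hdfs hdfs dfs -cat /tmp/hdfs_write_test/hdfs_write_test.txt
# clean up afterwards
sudo -u hdfs hdfs dfs -rm -r -skipTrash /tmp/hdfs_write_test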

Thanks,
Sathish (Satz)

Explorer

I can create a directory, but I cannot do a put; it is a connection problem with the datanodes.

Best regards,

Adil

Expert Contributor

Hello Adil,

 

Are the network flows open?
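For example (assuming the default CDH 5 ports; adjust if you changed them), you could check from a datanode whether the NameNode ports are reachable, and from the namenode whether the datanode ports are reachable:

# run from a datanode: NameNode RPC (8020) and web UI (50070)
nc -vz ip-10-0-0-33.us-west-2.compute.internal 8020
nc -vz ip-10-0-0-33.us-west-2.compute.internal 50070
# run from the namenode: DataNode data transfer (50010) and web (50075) ports
nc -vz ip-10-0-0-240.us-west-2.compute.internal 50010
nc -vz ip-10-0-0-240.us-west-2.compute.internal 50075
# every host also needs to reach the Cloudera Manager Server on 7182 (agent heartbeat)
nc -vz <cm-server-host> 7182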

 

Thanks.

Explorer

Hi,

 

Network flows: yes, good question 🙂 The hosts file is a good question too!! 🙂 I terminated the instances and created new ones because I had some problems with the cluster (I deleted the hosts and the cluster). Since you mentioned the network, on my new instances I enabled all TCP and ICMP traffic (before, only the ports indicated in the documentation were open: 7180, 7183, 7432, ICMP, SSH, HTTP, HTTPS). For the moment I disabled IPv6 and changed the hostname file to use the private DNS name instead of the IP address on each machine. I also put the private IPs and their private DNS names in the hosts file.

 

For info: I created a new VPC (Amazon) with a public subnet and attached my instances to it. I kept the public IP only for SSH.

 

So yes, it was indeed a network problem!!

 

Now Cloudera is installed on 4 nodes, and the inspector shows good health. My heart is also in good health now. My head too 🙂 🙂

 

Please find below the actions I took, in case someone else runs into the same problem:

 

- Edit the hostname with sudo nano /etc/hostname (Ubuntu) or sudo nano /etc/sysconfig/network (RHEL), replacing the IP address with the private DNS name (a quick check is sketched after this list).

- Edit the hosts file with sudo nano /etc/hosts (both Ubuntu and RHEL), adding the private IP addresses of however many instances you chose. If you chose 3 instances, enter the private IPs and private DNS names of all 3, but not the public IP addresses. The public IP or hostname is only for connecting to the remote host over SSH and for WinSCP connections.

- Disable IPv6 by adding the following lines via sudo nano /etc/sysctl.conf:

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Then issue the following command (after a reboot) to make sure it returns 1:

cat /proc/sys/net/ipv6/conf/all/disable_ipv6
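
(Two small follow-ups, as a sketch: the sysctl settings can also be applied without rebooting, and hostname -f is a quick way to confirm that the hostname change from the first step took effect; it should print the node's private DNS name.)

# apply /etc/sysctl.conf immediately, without a reboot
sudo sysctl -p
# should print something like ip-10-0-0-33.us-west-2.compute.internal
hostname -f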

 

Thank you very much for your help. I will surely come back with other questions 🙂

 

Could you please tell me which ports I should open for security? This part is managed in AWS.
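(For reference, the rules I described above could be expressed with the AWS CLI roughly as below. The group ID and the external address are placeholders; the idea is to let the security group reference itself for intra-cluster traffic while only SSH and the Cloudera Manager UI are opened externally. Please double-check the exact flags against the current AWS CLI documentation.)

SG=sg-0123456789abcdef0   # placeholder: the cluster's security group ID
# all TCP between cluster members (the group references itself as the source)
aws ec2 authorize-security-group-ingress --group-id "$SG" --protocol tcp --port 0-65535 --source-group "$SG"
# an ICMP rule between members can be added the same way with --protocol icmp
# SSH and the Cloudera Manager UI (7180) from my own address only (placeholder CIDR)
aws ec2 authorize-security-group-ingress --group-id "$SG" --protocol tcp --port 22 --cidr 203.0.113.10/32
aws ec2 authorize-security-group-ingress --group-id "$SG" --protocol tcp --port 7180 --cidr 203.0.113.10/32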

 

Best regards,

Adil