Created 05-03-2016 12:52 PM
Hello,
I am running the command below from the MapReduce examples to estimate pi. It fails, and I can see a socket timeout exception in the logs.
I have not been able to find a solution anywhere so far, and would be glad if someone could help.
Command: yarn jar hadoop-mapreduce-examples.jar pi 5 10
(From the directory: /usr/hdp/2.3.0.0-2557/hadoop-mapreduce)
Below is the log trace:
2016-04-20 06:12:48,333 WARN [RMCommunicator Allocator] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.17.0.2:53751 remote=node1/172.17.0.2:8030]
2016-04-20 06:12:51,884 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM.
java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.17.0.2:53751 remote=node1/172.17.0.2:8030]; Host Details : local host is: "node1/172.17.0.2"; destination host is: "node1":8030;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773
I can see this property under Advanced yarn-site: yarn.resourcemanager.scheduler.address = node1:8030
Hosts file entry:
[root@node1 ~]# cat /etc/hosts
172.17.0.2 node1
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
[root@node1 ~]#
I am not sure what the problem is. I can ping localhost, node1, and 127.0.0.1 from the node1 terminal.
Regards,
Vinay MP
Created 06-03-2016 06:49 AM
I finally managed to get a new 16GB machine on which the VM runs with good performance.
For my initial practice I had been using an 8GB machine.
I used the same VM: the command went through fine on the 16GB machine and failed on the 8GB machine.
I am not exactly sure whether memory was insufficient to run these tests on the 8GB machine (I didn't see any OOM or related exceptions there), but I am glad the problem is solved.
@Ian Roberts, @Predrag Minovic Thanks for taking the time to reply.
Regards,
Vinay MP
Created 05-03-2016 01:01 PM
Hi @Vinay MP , It seems you are not able to contact the ResourceManager. What port is RM listening on? You should be able to do a ps -ef | grep resourcemanager and then do a netstat -tulpn | grep <PID> to find out.
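The check above can be wrapped into a small helper. This is only a rough sketch: ports_for_pid is a made-up name, not a Hadoop or netstat feature, and it assumes the usual Linux net-tools layout where the local address is the fourth column and "PID/Program name" is the last one.

```shell
# Hypothetical helper (not part of Hadoop): reads `netstat -tulpn` output on
# stdin and prints the local address:port of every socket owned by PID $1.
ports_for_pid() {
    # netstat puts "PID/Program name" in the last column, e.g. "1234/java";
    # the local address is the fourth column.
    awk -v pid="$1" '$NF ~ ("^" pid "/") { print $4 }'
}

# Example use on the RM host (run as root so netstat can show PIDs):
#   rm_pid=$(pgrep -f resourcemanager | head -n 1)
#   netstat -tulpn 2>/dev/null | ports_for_pid "$rm_pid"
```

If the scheduler port from yarn-site (8030 in this thread) does not show up in the output, the RM is probably not listening where the ApplicationMaster expects it.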
Created 05-03-2016 01:46 PM
In your /etc/hosts, move the line "172.17.0.2 node1" from the top to line number 2:
127.0.0.1 localhost
172.17.0.2 node1
Then run "hostname"; it should print "node1". If it does not, run "hostname node1". Also check the hostname in the /etc/sysconfig/network file. Finally, as Ian suggested, check whether the RM is up and running and listening on ports 8030 and 8050 (and a few others).
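To double-check the hosts file after editing it, something like the following could help. It is a minimal sketch: lookup_host is a hypothetical helper name, and it only reads the file itself, so it does not reflect DNS or other resolver settings.

```shell
# Hypothetical helper: print the first IP address that a hosts-format file
# maps to the given name. $1 = name, $2 = file (defaults to /etc/hosts).
lookup_host() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                      # drop comments
        { for (i = 2; i <= NF; i++)             # fields after the IP are names
              if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Example use on node1 (expected to print 172.17.0.2 with the file above):
#   lookup_host "$(hostname)"
```

If this prints 127.0.0.1 or nothing for the machine's own hostname, the hosts file is worth fixing before looking further.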
Created 05-04-2016 07:26 AM
Hi @Ian Roberts , @Predrag Minovic
Thanks for the suggestions. I will try them and update.
So far I have checked netstat and could see that the resourcemanager was up and listening on 8030, 8050, and a few more ports.
All of a sudden I am not able to open a terminal session to node1 (one of the hosts in my VM). I will fix that and then verify the mapreduce example.
Regards,
Vinay MP