Yarn jar mapreduce-examples.jar pi 5 10 fails with socket timeout exception

Contributor

Hello,

I am running the command below from the MapReduce examples to estimate pi. It fails, and I can see a socket timeout exception in the logs.

I have not been able to find a solution anywhere so far, and I would be glad if someone could help.

Command: yarn jar hadoop-mapreduce-examples.jar pi 5 10

(From the directory: /usr/hdp/2.3.0.0-2557/hadoop-mapreduce)

Below is the log trace:

2016-04-20 06:12:48,333 WARN [RMCommunicator Allocator] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.17.0.2:53751 remote=node1/172.17.0.2:8030]
2016-04-20 06:12:51,884 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.17.0.2:53751 remote=node1/172.17.0.2:8030]; Host Details : local host is: "node1/172.17.0.2"; destination host is: "node1":8030; 
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773

I can see this property in the Advanced yarn-site configuration: yarn.resourcemanager.scheduler.address = node1:8030

Hosts file entry:

[root@node1 ~]# cat /etc/hosts
172.17.0.2 node1
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
[root@node1 ~]#

I am not sure what the problem is. I can ping localhost, node1, and 127.0.0.1 from the node1 terminal.

Regards,

Vinay MP

1 ACCEPTED SOLUTION

Contributor

I finally managed to get a new 16GB machine on which the VM runs with good performance.

Initially, I had been practicing on an 8GB machine.

Using the same VM, the command ran fine on the 16GB machine and failed on the 8GB machine.

I am not sure whether insufficient memory was the cause (I did not see any OOM or related exceptions on the 8GB machine), but I am glad the problem is solved.
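To compare the two machines, it may help to look at how much memory the VM actually sees. Below is a minimal sketch, assuming bash, awk, and a Linux /proc/meminfo; the function name mem_summary_mb is made up for illustration. The thread never confirmed that memory was the cause, so treat this only as a way to gather evidence. MemAvailable may be missing on older kernels, in which case only the total is printed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): print MemTotal and MemAvailable
# in MiB from a /proc/meminfo-style file. Values in that file are in kB.
mem_summary_mb() {
  local file="${1:-/proc/meminfo}"
  awk '
    $1 == "MemTotal:"     { printf "total_mb=%d\n", $2 / 1024 }
    $1 == "MemAvailable:" { printf "available_mb=%d\n", $2 / 1024 }
  ' "$file"
}

# Usage (commented out so that sourcing this file runs nothing):
#   mem_summary_mb
```

Running it inside the VM on both hosts would show whether the guest on the 8GB machine was left with noticeably less memory.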

@Ian Roberts, @Predrag Minovic, thanks for taking the time to reply.

Regards,

Vinay MP


4 REPLIES

Expert Contributor

Hi @Vinay MP, it seems you are not able to contact the ResourceManager. Which port is the RM listening on? You should be able to run ps -ef | grep resourcemanager to find its PID, and then netstat -tulpn | grep <PID> to list the ports it is listening on.
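The PID lookup and the netstat filter can be combined into one small helper. This is a minimal sketch, assuming bash, awk, and netstat from net-tools; the function name listening_ports_for_pid is made up for illustration. It only filters text in the netstat -tulpn layout, so netstat has to be run as root for the PID/Program column to be filled in.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): read `netstat -tulpn` output on
# stdin and print the local ports owned by the given PID, one per line.
listening_ports_for_pid() {
  local pid="$1"
  awk -v pid="$pid" '
    {
      # Look for the "PID/Program" column, e.g. "4242/java".
      for (i = 5; i <= NF; i++) {
        if (index($i, pid "/") == 1) {
          # Column 4 is the local address, e.g. "0.0.0.0:8030" or ":::8030".
          # The port is whatever follows the final colon.
          n = split($4, addr, ":")
          print addr[n]
          next
        }
      }
    }
  ' | sort -n -u
}

# Usage (commented out so that sourcing this file runs nothing):
#   pid=$(pgrep -f resourcemanager | head -n 1)
#   netstat -tulpn | listening_ports_for_pid "$pid"
```

If 8030 is missing from the output, the scheduler address in yarn-site and the running ResourceManager disagree.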

Master Guru

In your /etc/hosts, move the line "172.17.0.2 node1" from the top to line number 2:

127.0.0.1	localhost
172.17.0.2	node1

Then run "hostname"; it should print "node1". If it does not, run "hostname node1". Also check the hostname in the /etc/sysconfig/network file. Finally, as Ian suggested, check whether the RM is up and running and listening on ports 8030 and 8050 (and a few others).
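The hosts-file part of these checks can be scripted. Here is a minimal sketch, assuming bash and awk; the function name hosts_lookup is made up for illustration. It prints every address that a hosts-style file maps to a name, which makes it easy to confirm that node1 points at 172.17.0.2 and not at a loopback address.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): print every address that a
# hosts-style file maps to the given name, in file order.
hosts_lookup() {
  local name="$1" file="${2:-/etc/hosts}"
  awk -v name="$name" '
    { sub(/#.*/, "") }   # drop comments
    NF < 2 { next }      # skip blank or address-only lines
    {
      for (i = 2; i <= NF; i++) {
        if ($i == name) { print $1; break }
      }
    }
  ' "$file"
}

# Usage (commented out so that sourcing this file runs nothing):
#   hosts_lookup node1
#   [ "$(hostname)" = "node1" ] || echo "hostname is $(hostname), not node1"
```

In this thread, the expected answer for node1 would be the single address 172.17.0.2.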

Contributor

Hi @Ian Roberts, @Predrag Minovic

Thanks for the suggestions. I will try them and update.

So far I have checked netstat, and I could see that the ResourceManager was up and listening on 8030, 8050, and a few more ports.

All of a sudden I am not able to open a terminal session to node1 (one of the hosts in my VM). I will fix that and then verify the MapReduce example.

Regards,

Vinay MP
