"client.RMProxy: Connecting to ResourceManager" error

Expert Contributor

Okay. So now I have another issue while executing MapReduce jobs. This same MapReduce job was running fine yesterday, but now I am getting a message that mentions the ResourceManager, and the client keeps retrying forever. I did a lot of googling before posting this. I have also restarted the ResourceManager service several times, but to no effect. ResourceManager is shown as green on the dashboard. The output is as follows.

16/02/08 20:33:19 INFO client.RMProxy: Connecting to ResourceManager at master.mydomain.com/172.26.180.6:8050
16/02/08 20:33:20 INFO ipc.Client: Retrying connect to server: master.mydomain.com/172.26.180.6:8050. Already tried 0 time(s);

1 ACCEPTED SOLUTION

Master Mentor
@Pradeep kumar

I have seen this in my environment. As you can see, it's an INFO message.

You have to wait and let the job get to the main processing.

Is the job failing?

Expert Contributor

The console has been showing this message for the last 4 minutes, with no sign of the job either failing or succeeding. But one thing I noticed is that the "ResourceManager Service" is now red :(. Also, I do not see any log files under the /hadoop/yarn/og folder; it is empty. Where should I look for logs to find out what is going wrong?

Expert Contributor

@Neeraj Sabharwal I am running out of options. Can you suggest something? I now see that the ResourceManager service is running fine and is green, but when I run the jobs, I get the same message about retrying the ResourceManager connection. I have been waiting for the last 10 minutes, and I am still seeing the same message: "INFO ipc.Client: Retrying connect to server: mycomputer.mydomain.com/172.26.180.6:8050."

Master Mentor

@Pradeep kumar What are the CPU and memory in your cluster? As mentioned earlier, you have to wait and see.

In the meantime, you should check the NameNode and YARN logs:

ls -lrt /var/log/hadoop/hdfs/

ls -l /var/log/hadoop-yarn/yarn/

Look for errors in the most recently updated file in each directory.
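
A rough sketch of that step as a shell helper, in case it saves some typing. The function name newest_log_errors is made up for illustration, and the ERROR/FATAL pattern is an assumption about which log levels matter; the directories are the ones listed above and may differ on your install.

```shell
# Print the last ERROR/FATAL lines from the most recently modified
# regular file in a log directory. Hypothetical helper, not a Hadoop tool.
newest_log_errors() {
  local dir="$1"
  local newest
  # ls -t sorts newest first; -p appends "/" to directories so they can be filtered out.
  newest=$(ls -tp "$dir" 2>/dev/null | grep -v '/$' | head -n 1) || true
  if [ -z "$newest" ]; then
    echo "no log files found in $dir" >&2
    return 1
  fi
  echo "== $dir/$newest"
  # Show at most the last 20 matching lines; no match is not treated as a failure.
  grep -E 'ERROR|FATAL' "$dir/$newest" | tail -n 20 || true
}

# Example calls (paths from the reply above, adjust as needed):
#   newest_log_errors /var/log/hadoop/hdfs
#   newest_log_errors /var/log/hadoop-yarn/yarn
```

If nothing is printed under the file name, the newest file simply has no ERROR or FATAL lines, and it may be worth reading the tail of the file by hand.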

Expert Contributor

@Neeraj Sabharwal Those log files were like best friends to me! I found that the ResourceManager log was showing ZooKeeper-related issues. When I went to the dashboard, I saw that the ZooKeeper service was not up. I started the ZooKeeper service, and then my jobs started running fine! I had deliberately not started ZooKeeper because I thought it was not required for running a simple MapReduce job. Maybe I was wrong. Do you have a technical explanation for this, Neeraj? Thanks again!
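
For anyone hitting the same symptom, here is a minimal sketch of a ZooKeeper liveness check from the shell. It uses ZooKeeper's "ruok" four-letter command, to which a healthy server replies "imok". The function name zk_ruok is made up, port 2181 is only the usual default client port, and on newer ZooKeeper versions the four-letter commands may need to be whitelisted in the server config, so treat a failure as a hint rather than proof.

```shell
# Return 0 if a ZooKeeper server answers "imok" to the "ruok" command.
# Hypothetical helper; relies on bash's /dev/tcp and coreutils' timeout.
zk_ruok() {
  local host="$1" port="${2:-2181}" reply
  reply=$(timeout 5 bash -c '
    # $0 is the host and $1 the port passed in after the script text.
    exec 3<>"/dev/tcp/$0/$1" || exit 1
    printf ruok >&3
    cat <&3
  ' "$host" "$port" 2>/dev/null) || return 1
  [ "$reply" = "imok" ]
}

# Example call (hostname from the thread, adjust as needed):
#   zk_ruok master.mydomain.com && echo "ZooKeeper is up" || echo "ZooKeeper did not answer"
```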

Master Mentor

@Pradeep kumar

Add me on LinkedIn or on Twitter (allaboutbdata).

Master Mentor
@Pradeep kumar

I suggest you enable RM HA: if you hit a similar issue, you can force a failover to the other RM node, and if the active RM fails, it will fail over automatically. Run netstat -tunlp | grep 8088 and ps aux | grep 8088. Also investigate whether you have heavy memory utilization, and look at tuning the RM.
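
Along the same lines, a small sketch that tests from the client machine whether a host and port accept TCP connections, for example the ResourceManager address shown in the "Retrying connect to server" message. The function name port_open and the 3-second timeout are assumptions for illustration; it relies on bash's /dev/tcp and the coreutils timeout command.

```shell
# Return 0 if a TCP connection to host:port can be opened within 3 seconds.
# Hypothetical helper, not part of Hadoop.
port_open() {
  local host="$1" port="$2"
  # $0 is the host and $1 the port inside the bash -c script.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example call (address from the log in the question):
#   port_open master.mydomain.com 8050 && echo "reachable" || echo "not reachable"
```

If the port is not reachable while the dashboard shows the service as green, the ResourceManager log is the next place to look.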

Expert Contributor

I will try RM HA, but as an alternative, is it a good idea to move the ResourceManager to another node? I have most of the services installed and running on the master node, so maybe the master node is overloaded in terms of memory utilization? That said, the graph on the dashboard shows the master node's memory utilization at only around 50%.