CDH 5.1 host IP address change

Rising Star

I have a CDH 5.1 cluster with 3 nodes. We installed it using the Cloudera Manager automated installation. It was running perfectly until we moved the box to a different network and the IP addresses changed. I tried the following steps:

1. Stopped the cloudera-scm-server service.
2. Stopped the cloudera-scm-agent service.
3. Edited /etc/cloudera-scm-agent/config.ini.
4. Changed the server host to the new IP.
5. Restarted the cloudera-scm-agent and cloudera-scm-server services.
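
Steps 3 and 4 can be sketched as a small shell helper. This is only a minimal sketch, assuming config.ini contains a single line of the form server_host=<value>; the function names set_server_host and get_server_host and the address 10.0.0.5 are hypothetical placeholders, not part of Cloudera Manager.

```shell
#!/bin/sh
# Sketch: rewrite the server_host line in a Cloudera Manager agent config.
# Assumes the file has one line of the form "server_host=<value>" and that
# NEW_HOST is a plain IP address or hostname (no "/" or "&" characters).

# set_server_host FILE NEW_HOST  (hypothetical helper)
set_server_host() {
    cfg_file=$1
    new_host=$2
    cfg_tmp=$(mktemp) || return 1
    # Replace the whole server_host line; all other lines pass through unchanged.
    sed "s/^server_host[[:space:]]*=.*/server_host=${new_host}/" "$cfg_file" > "$cfg_tmp" \
        || { rm -f "$cfg_tmp"; return 1; }
    # Write the result back in place so the file keeps its owner and permissions.
    cat "$cfg_tmp" > "$cfg_file"
    rm -f "$cfg_tmp"
}

# get_server_host FILE  (hypothetical helper): print the configured value.
get_server_host() {
    sed -n 's/^server_host[[:space:]]*=[[:space:]]*//p' "$1"
}
```

With the agent stopped, this could be run on each host as `set_server_host /etc/cloudera-scm-agent/config.ini 10.0.0.5`, followed by `get_server_host /etc/cloudera-scm-agent/config.ini` to confirm the value before restarting the services.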

This did not work.

Then I followed http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-latest/Cloudera-Manage... That did not help, even after changing the IPs directly in PostgreSQL.

I found the following blog: http://www.geovanie.me/changing-ip-of-node-in-cdh-cluster/

I am getting the following error in the scm-agent log file:

ProtocolError: <ProtocolError for 127.0.0.1/RPC2: 401 Unauthorized>

That did not help either. Can anyone please explain how to safely change all IP addresses in a CDH 5.1 cluster?

Thanks, Amit

1 ACCEPTED SOLUTION

Rising Star

I found the solution to the above issue:

1) For the ProtocolError, we need to use kill -9 to kill the supervisor.

2) The steps mentioned above are correct. I had made a typo in the IP and hostnames in config.ini on one of the hosts.

Hope that helps.

Thanks,

Amit


6 REPLIES

Expert Contributor
Hi,

I recently returned my Comcast router and installed my own. The IP addresses changed and my cluster is down 😞
I am running CDH 5.4.7 (starving developers version) on Ubuntu 12.04 LTS
1 NN + 3 DN

I followed these instructions and managed to get the IPs of the nodes corrected on the cluster (by updating the HOSTS table in the local Cloudera database).

I made sure that the UUID of each node as displayed in Cloudera Manager is the same as the one in
/var/lib/cloudera-scm-agent/uuid

Now my Cloudera agents don't start on the nodes.
 
/var/log/cloudera-scm-agent/cloudera-scm-agent.out
[24/Jan/2016 13:37:38 +0000] 14519 MainThread agent        INFO     SCM Agent Version: 5.4.7
[24/Jan/2016 13:37:38 +0000] 14519 MainThread agent        INFO     Logging to /var/log/cloudera-scm-agent/cloudera-scm-agent.log
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/src/cmf/agent.py", line 3233, in <module>
    main()
  File "/usr/lib/cmf/agent/src/cmf/agent.py", line 3216, in main
    agent.configure()
  File "/usr/lib/cmf/agent/src/cmf/agent.py", line 428, in configure
    raise Exception("No server_host line was found in agent's configuration. One of --standalone or --master must be specified.")
Exception: No server_host line was found in agent's configuration. One of --standalone or --master must be specified.
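
The exception text suggests the agent could not read a server_host entry from its configuration. A quick check like the one below may help rule out a commented-out or misspelled line. It is a minimal sketch, assuming the agent reads /etc/cloudera-scm-agent/config.ini as in the original post; has_server_host is a hypothetical helper name.

```shell
#!/bin/sh
# has_server_host FILE  (hypothetical helper)
# Succeeds only if FILE contains an uncommented server_host line
# with a non-empty value; fails otherwise.
has_server_host() {
    grep -Eq '^[[:space:]]*server_host[[:space:]]*=[[:space:]]*[^[:space:]]' "$1"
}
```

For example, `has_server_host /etc/cloudera-scm-agent/config.ini || echo "server_host is missing or commented out"` prints the message only when no usable line is found.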

New Contributor

 

Steps to find and kill the process:

1) Find the process ID (PID) of the supervisor:
> ps aux | grep supervisor

2) Kill the process:

> sudo kill -9 <PID>
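
The lookup in step 1 can also be scripted so that the grep process itself is not picked up by mistake. This is a minimal sketch, assuming the supervisor shows up in the process list with "supervisord" somewhere in its command line; supervisor_pids is a hypothetical helper name.

```shell
#!/bin/sh
# supervisor_pids  (hypothetical helper)
# Reads "PID COMMAND..." lines on stdin and prints the PID of every line
# whose command mentions supervisord. The [s] bracket keeps this awk
# process from matching its own command line in the ps output.
supervisor_pids() {
    awk '/[s]upervisord/ { print $1 }'
}
```

Usage would be `ps -eo pid,args | supervisor_pids`, then `sudo kill <PID>` for each PID printed. A plain kill (SIGTERM) gives the process a chance to exit cleanly; `kill -9`, as in the accepted solution, is the fallback if it does not.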



New Contributor

Thank you for this excellent solution.

In my situation, I created a Cloudera cluster with 1 master and 4 slaves on EC2 using private IPs, and then stored the nodes as AMIs (images). I then created new instances from these 5 AMIs, with all the Cloudera installations on them.

Then I followed all your steps to change the IPs to the current private IPs.

However, when I finally check the state of Cloudera:

sudo service cloudera-scm-agent status
sudo service cloudera-scm-master status

both return active (exited). Then I checked /var/log/cloudera-scm-agent/cloudera-scm-agent.log

and there are errors like this:

ERROR Failed to connect to previous supervisor.

ERROR Failed rack peer update: [Errno 111] Connection refused

ERROR 14794 Monitor-HostMonitor throttling_logger

ERROR Failed to collect NTP metrics

ERROR Unexpected exception during download

 

Do you know why this happens? (I followed all your steps.)

Master Guru

@jjjjjjhao,

 

The bits of errors provided don't tell enough of the story to indicate what may be wrong.

 

I would run: service cloudera-scm-agent restart  and then see what happens in the agent log. 

 

Also, what is the actual problem?  What is wrong in Cloudera Manager, etc.  It is unclear what you are trying to do or see and what actually happens.  Once that is clarified, the community can help.

 

Ben

New Contributor

Maybe I need to reinstall Cloudera Manager?