Created 06-09-2017 04:05 PM
Hello,
After I restarted (or reinstalled) cloudera-scm-agent, it generated a different UUID and created a duplicate hostname in Cloudera Manager: one entry has the existing roles but no heartbeat, and the other duplicate has a heartbeat but no roles.
cahive-master02 | 10.21.33.47 | 5.67s ago | 0.00 0.00 0.00 | 35 GiB / 94 GiB | 248.5 MiB / 9.3 GiB | 0 B / 5.9 GiB |
cahive-master02 | Cluster 1 | 10.21.33.47 | 10 Role(s) |
I noticed the two entries have different machine IDs. I deleted the first host shown above, stopped cloudera-scm-agent, and changed /var/lib/cloudera-scm-agent/uuid to the machine ID of the other entry. After restarting cloudera-scm-agent, I still see two duplicate hostnames with the same machine ID. Is there a way to fix this?
I tried to delete all of them and re-join, but I had no luck. I ended up with the following error when removing them from the cluster:
"Hosts Decommission
java.lang.IllegalStateException"
Thanks
Created 06-12-2017 08:15 AM
Duplicate hosts are usually caused when the /var/lib/cloudera-scm-agent/uuid file has been edited or removed.
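Before editing anything, it may help to confirm which hostname Cloudera Manager has registered under more than one host ID. The sketch below is not part of the original answer and rests on assumptions: jq is installed, the Cloudera Manager REST API is reachable at a placeholder address (CM_URL, hypothetical), API version v12 matches your Cloudera Manager release, and the hosts endpoint returns items with "hostname" and "hostId" fields.

```shell
#!/bin/sh
# Sketch: list hostnames that Cloudera Manager knows under more than one host ID.
# Assumptions: CM_URL is a placeholder, the API version may need adjusting,
# and jq is installed.
set -eu

CM_URL="${CM_URL:-http://cm-host.example.com:7180}"  # hypothetical CM address
CM_API_VERSION="${CM_API_VERSION:-v12}"              # assumption: use a version your CM supports

# Read "hostname hostId" lines on stdin and print every hostname that
# appears more than once, followed by all of its host IDs.
find_duplicate_hosts() {
  awk '{ ids[$1] = ids[$1] " " $2; n[$1]++ }
       END { for (h in n) if (n[h] > 1) print h ":" ids[h] }'
}

# With one argument (a CM admin user name), query CM; curl prompts for the password.
if [ "$#" -eq 1 ]; then
  curl -s -u "$1" "$CM_URL/api/$CM_API_VERSION/hosts" \
    | jq -r '.items[] | "\(.hostname) \(.hostId)"' \
    | find_duplicate_hosts
fi
```

For example (hypothetical file name): CM_URL=http://your-cm-host:7180 sh find_dup_hosts.sh admin. The host ID printed next to the entry that still carries the roles is the one to restore on the host.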
To fix this issue, you can try the following procedure:
Let’s say:
HostA is the original/correct host with Roles on it.
HostB is the new duplicate/incorrect host.
From the Cloudera Manager UI, navigate to the Hosts page (CM -> Hosts -> “All Hosts”).
1. Identify original UUID
Click HostA, and under “Details” record the “Host ID”. This is the UUID; let’s call it “HostA-UUID” (it is usually a long alphanumeric string).
2. Log in to the host from the command line and navigate to /var/lib/cloudera-scm-agent/
3. Stop the agent
service cloudera-scm-agent stop
4. Make a backup of the existing uuid file.
5. Replace the contents of the uuid file with the original UUID (i.e., HostA-UUID from step 1):
echo -n "HostA-UUID" > /var/lib/cloudera-scm-agent/uuid
For example:
In my local environment, the UUID/Host ID is f7b2231c-dbd0-47ff-bf09-7961080cc065:
echo -n "f7b2231c-dbd0-47ff-bf09-7961080cc065" > /var/lib/cloudera-scm-agent/uuid
6. Make sure you have not introduced a trailing newline by running cat on the uuid file:
cat uuid
Your command-line prompt should appear on the same line as the restored UUID.
For example:
[mytestenv] cat uuid
f7b2231c-dbd0-47ff-bf09-7961080cc065[mytestenv]
7. Start the agent:
service cloudera-scm-agent start
8. From the Hosts page in Cloudera Manager, select the check box for HostB (the bad duplicate host with no roles), open the “Actions for Selected” drop-down, and choose Delete.
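Steps 3 to 7 can also be scripted. This is a minimal sketch, not part of the original answer, assuming it runs as root on the affected host, that the agent is controlled with service as in the steps above, and that the file name restore_uuid.sh is hypothetical. It writes the UUID with printf '%s', which never appends a newline and avoids the curly quotes and en-dashes that can slip in when commands are copied from a web page.

```shell
#!/bin/sh
# Sketch of steps 3-7: stop agent, back up uuid, restore HostA-UUID,
# check for a trailing newline, start agent.
# Assumptions: run as root; agent managed with `service`.
set -eu

UUID_FILE="${UUID_FILE:-/var/lib/cloudera-scm-agent/uuid}"

# Step 4: keep a timestamped copy of the current uuid file ($1).
backup_uuid() {
  cp -p "$1" "$1.bak.$(date +%Y%m%d%H%M%S)"
}

# Step 5: write UUID $2 into file $1 without a trailing newline.
write_uuid() {
  printf '%s' "$2" > "$1"
}

# Step 6: succeed only if file $1 is non-empty and its last byte is not a newline.
# $(...) strips trailing newlines, so a file ending in a newline yields "".
has_no_trailing_newline() {
  [ -n "$(tail -c 1 "$1")" ]
}

main() {
  service cloudera-scm-agent stop
  backup_uuid "$UUID_FILE"
  write_uuid "$UUID_FILE" "$1"
  if ! has_no_trailing_newline "$UUID_FILE"; then
    echo "error: $UUID_FILE is empty or ends with a newline" >&2
    exit 1
  fi
  service cloudera-scm-agent start
}

# Run only when given exactly one argument (the HostA-UUID).
if [ "$#" -eq 1 ]; then
  main "$1"
fi
```

For example: sh restore_uuid.sh f7b2231c-dbd0-47ff-bf09-7961080cc065. Step 8 (deleting HostB) is still done in the Cloudera Manager UI.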
Created 06-11-2017 03:51 PM
I saw a similar issue posted last year.
There was no solution for that issue.
What I found:
1) I cannot decommission the hosts as the user above said, because I get the error: java.lang.IllegalStateException
2) I fixed the wrong host ID, but even though the host IDs are now the same, Cloudera Manager still shows two duplicate hostnames, each in a different cluster.
Does anybody have any suggestions? Thanks.
Created 06-12-2017 02:57 PM
Elias,
It now works with your solution; echo -n did the trick. Thanks so much.
I have another issue with Oozie: I may have deleted the Oozie role from Hadoop when decommissioning the server.
Now I need to add the role back, and it asks me for the database username and password.
Where do I find my previous Oozie username and password? Or does it not matter if I create a new set of database credentials for Oozie?
Created 06-12-2017 03:36 PM
That is great news! Thank you for the update.
For your Oozie question:
If you do not know the password, you can create a new Oozie database as described in our documentation (MySQL, for example [1]).
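For MySQL, the database step might look like the following sketch. It is not taken from the linked page; the names (database and user both "oozie"), the MySQL 5.x GRANT ... IDENTIFIED BY syntax (rejected by MySQL 8.0), and a password without single quotes are all assumptions, so check [1] for your version. If an oozie database already exists, CREATE DATABASE will fail, and resetting the existing user's password is the alternative.

```shell
#!/bin/sh
# Sketch: create an Oozie database and user on MySQL 5.x.
# Assumptions: names "oozie"/"oozie"; password contains no single quotes.
set -eu

# Print the SQL statements for database $1, user $2, password $3.
oozie_db_sql() {
  printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8;\n" "$1"
  printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'localhost' IDENTIFIED BY '%s';\n" "$1" "$2" "$3"
  printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'%%' IDENTIFIED BY '%s';\n" "$1" "$2" "$3"
}

# With one argument (the new Oozie password), run the statements as MySQL root.
# mysql prompts for the root password; the Oozie password passed here is
# visible in the process list while the command runs.
if [ "$#" -eq 1 ]; then
  mysql -u root -p -e "$(oozie_db_sql oozie oozie "$1")"
fi
```

The same database name, user, and password are then entered in the Add Service wizard.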
Then add the oozie service:
From Cloudera Manager's home screen, click the cluster drop-down, select "Add Service", and follow the wizard.
[1] https://www.cloudera.com/documentation/enterprise/5-7-x/topics/cm_ig_mysql.html#id_ijy_cwt_g5
Created 06-12-2017 05:32 PM
Hi Elias,
I just reset the MySQL password to get Oozie working. Thanks for the link.
There is one more issue since then: I cannot start the ZooKeeper servers. I got the following error when I tried to start the three ZooKeeper servers:
"Starting these new ZooKeeper Servers may cause the existing ZooKeeper Datastore to be lost. Try again after restarting any existing ZooKeeper Servers with outdated configurations. If you do not want to preserve the existing Datastore, you can start each ZooKeeper Server from its respective Status page. "
It is OK not to preserve the existing datastore, but when I go to the host's Status page, I can't find a link or option that lets me start the ZooKeeper server manually. Do you have any suggestions?
Thanks
Created 06-12-2017 07:06 PM
On the machine's Status page, clicking Actions shows only two clickable options: "Initialize" and "Enter Maintenance Mode"; the "Start"/"Stop" options cannot be clicked. I clicked "Initialize", but I am still unable to start ZooKeeper.
Thanks