I had a CDH 5.4.4 Cluster with 1 cloudera manager, 2 name nodes and 7 data nodes.
The cloudera manager and 2 name nodes are dead and lost (no backup).
The 7 data nodes are intact.
Is there a procedure to recover/remount the HDFS and HBase from the existing data nodes by building a new cluster with new name nodes and joining the existing data nodes with their existing data?
If you don't have an fsimage stored somewhere, there is no way to recover, because the NameNodes hold the metadata that maps each block to the file it belongs to.
If you have no fsimage backups and all the NameNodes are dead, the blocks are useless.
Hopefully you have a way of ingesting the data again.
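Before giving up, it may be worth confirming that no fsimage survives on any disk. The following is a minimal sketch, not an official recovery procedure: the function name find_fsimage and the example paths are hypothetical, and the directories to search should come from your own dfs.namenode.name.dir and dfs.namenode.checkpoint.dir settings.

```shell
#!/bin/sh
# List candidate NameNode metadata files under the given directories.
# The directories passed in are placeholders chosen by the caller.
find_fsimage() {
    for dir in "$@"; do
        # Skip directories that do not exist on this host.
        [ -d "$dir" ] || continue
        # fsimage_<txid> files are checkpoints; edits_* files are edit log segments.
        find "$dir" -type f \( -name 'fsimage_*' -o -name 'edits_*' \) 2>/dev/null
    done
}

# Example (hypothetical paths, adjust to your configuration):
# find_fsimage /dfs/nn /dfs/snn
```

If this prints nothing on any host, the block-to-file mapping is most likely gone, as described above.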
We have a similar problem but in our case we have the fsimage in the datanodes.
The thing is that after a reboot, the server where Cloudera Manager, the primary namenode and other roles were running (the master01 node) lost its configuration. The cmserver database was also running there, and after this reboot it is empty. When we try to access the GUI of this node, the installation wizard appears...
The cluster consisted of 2 masters and 5 nodes, and we were running the HDFS, YARN, HBase and ZooKeeper services. We removed the /var/lib/cloudera-scm-agent/cm_guid file on all the nodes and restarted the cloudera-scm-agent service, and now the master01 node sees the rest of the hosts, but without any roles assigned.
We have a lot of data already imported into this cluster and we would like to recover it if possible...
How can we re-create the cluster without losing the already imported data? The datanodes and the namenodes seem to have the data stored, but we are afraid to install the HDFS service from scratch in case this data gets removed.
Many thanks in advance.
Actually, the issue you describe is completely different from the one in the original post.
In your case, you describe a situation where you rebooted your host and now when you access Cloudera Manager, the install wizard appears.
The CM database is on a real disk, so rebooting would not erase the data. It is possible, though, that the reboot left CM pointing at a different database from the one it was using before. To check, please run the following and share the output:
# ls -lrt /etc/cloudera-scm-server
If we see multiple db.properties files (with varying suffixes) that could indicate the situation I mentioned above.
For now, do not touch anything in CDH until we can identify what happened. Also, when you said the database was blank after the reboot, can you clarify how you determined that? What database were you using for Cloudera Manager? Is it running? Can you connect to it with a command-line tool and view the databases and tables?
To me, it sounds like CDH hasn't changed and that CM is pointing to a blank database. If you were using your own database server, but then accidentally installed the cloudera-manager-server-db-2 package and you are on an earlier release of CM, this could be the cause.
Rebuilding your cluster configuration in CM is time consuming, so it is worth the effort to find the cause.
There are several db.properties files:
[root@sciclouderamaster01 ~]# ls -lrt /etc/cloudera-scm-server
-rw-r--r-- 1 root root 2229 Nov 20 2018 log4j.properties
-rw------- 1 cloudera-scm cloudera-scm 714 Nov 20 2018 db.properties.~1~
-rw------- 1 cloudera-scm cloudera-scm 445 Mar 21 15:06 db.properties.~2~
-rw------- 1 cloudera-scm cloudera-scm 435 Mar 21 15:07 db.properties.~3~
-rw------- 1 cloudera-scm cloudera-scm 438 Jun 13 09:43 db.properties.~4~
-rw------- 1 cloudera-scm cloudera-scm 438 Jun 13 10:41 db.properties.~5~
-rw------- 1 cloudera-scm cloudera-scm 440 Jun 13 11:28 db.properties.~6~
-rw------- 1 cloudera-scm cloudera-scm 446 Jun 28 16:38 db.properties
The database is a MySQL instance installed on the same server. When I tried to show the tables of the cmserver database, this was the output:
mysql> show tables in cmserver;
Empty set (0.00 sec)
So it seems that, as you mentioned, CM is pointing to a blank database. cloudera-manager-server-db-2 is not installed on this server.
I agree with you that rebuilding the CM configuration is time consuming, and we can't afford to have this same issue in the future, so let's try to find the cause of this failure.
Many thanks for your help!
Assuming your server was working fine until the reboot (on June 28?), we should compare the configuration in the db.properties.* files.
If you can cat each one (and maybe redact the password) so we can identify what has changed over time, that should help give us some clues.
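A small helper along these lines could print every file with the password masked. It is only a sketch: the function name show_db_properties is made up for illustration, and it assumes the password sits on a line whose key contains "password" (for example com.cloudera.cmf.db.password), so check the output before posting it.

```shell
#!/bin/sh
# Print each db.properties* file in a directory with password values masked.
# Assumption: password keys contain the word "password"; adjust the pattern if not.
show_db_properties() {
    dir=${1:-/etc/cloudera-scm-server}
    for f in "$dir"/db.properties*; do
        # Skip the literal pattern when nothing matches.
        [ -f "$f" ] || continue
        echo "==== $f"
        # Keep the key, replace everything after the first "=" with REDACTED.
        sed 's/^\([^#=]*[Pp]assword[^=]*\)=.*/\1=REDACTED/' "$f"
    done
}

# Example (run as root, since the files are readable only by cloudera-scm):
# show_db_properties /etc/cloudera-scm-server
```

Comparing the host, database name and user across the files should show whether CM was ever pointed somewhere else.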
CM won't wipe out your CM database, so if your database is actually gone or missing, that would indicate that something outside the control of CM affected the database.
I have checked the current db.properties file and it is the same as db.properties.~2~ (which was written in March, when the cluster was installed).
The server was working fine before the reboot, although we had some issues with ZooKeeper (java.lang.OutOfMemoryError: GC overhead limit exceeded), which was the reason for rebooting the master server.
Looking at the mysql folder, the folder where the cmserver database was stored was modified the day after the reboot and only contains the db.opt file:
[root@sciclouderamaster01 mysql]# ls -l
-rw-rw---- 1 mysql mysql 56 Mar 21 12:51 auto.cnf
drwx------ 2 mysql mysql 20 Jun 28 16:38 cmserver
drwx------ 2 mysql mysql 4096 Jun 13 10:51 hive
drwx------ 2 mysql mysql 12288 Jun 13 12:56 hue
-rw-rw---- 1 mysql mysql 79691776 Jul 11 09:17 ibdata1
-rw-rw---- 1 mysql mysql 536870912 Jul 11 09:16 ib_logfile0
-rw-rw---- 1 mysql mysql 536870912 Jul 11 09:17 ib_logfile1
drwx------ 2 mysql mysql 4096 Mar 21 12:51 mysql
-rw-rw---- 1 mysql mysql 2366 Mar 21 13:15 mysql_binary_log.000001
-rw-rw---- 1 mysql mysql 42403592 Mar 22 12:00 mysql_binary_log.000002
-rw-rw---- 1 mysql mysql 143 Mar 22 12:33 mysql_binary_log.000003
-rw-rw---- 1 mysql mysql 29153612 Apr 1 15:29 mysql_binary_log.000004
-rw-rw---- 1 mysql mysql 52766506 Apr 8 09:19 mysql_binary_log.000005
-rw-rw---- 1 mysql mysql 267606242 Jun 27 17:06 mysql_binary_log.000006
-rw-rw---- 1 mysql mysql 40202225 Jun 27 18:04 mysql_binary_log.000007
-rw-rw---- 1 mysql mysql 30937 Jun 27 18:18 mysql_binary_log.000008
-rw-rw---- 1 mysql mysql 32653871 Jul 11 09:17 mysql_binary_log.000009
-rw-rw---- 1 mysql mysql 351 Jun 27 18:20 mysql_binary_log.index
srwxrwxrwx 1 mysql mysql 0 Jun 27 18:20 mysql.sock
drwx------ 2 mysql mysql 8192 Jun 27 17:08 oozie
drwx------ 2 mysql mysql 4096 Mar 21 12:51 performance_schema
On the other hand, we installed hive, oozie and hue databases on July the 11th and their databases are fine. The only database that we have lost is cmserver.
I had a snapshot copy of the VM that was made the day after the reboot, and the files of the cmserver database in it were from March (when the cluster was created), so I did not understand where the CM configuration was stored:
[root@sciclouderamaster01 ~]# ls -l /var/lib/mysql/
drwx------ 2 mysql mysql 20 Mar 21 15:06 cmserver
Is there any way of checking the configuration files of CM, or is everything stored only in the cmserver db?
Many thanks in advance.
Sorry I missed your reply.
Everything is in the CM database.
Cloudera does not remove the contents of the database, so I don't know what would have happened to your database files for CM.
Perhaps check command history on that host.
Check auditd log if you have it enabled...
Perhaps MySQL's community might be able to help with forensics.
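Since binary logging appears to be enabled in the listing above, one possible starting point for that forensics is to decode the binary log covering the reboot and look for statements that could have emptied the schema. This is a rough sketch under assumptions: the function name find_destructive_statements is hypothetical, the pattern only catches statements logged as plain SQL, and the log file to read should be chosen from your own timestamps.

```shell
#!/bin/sh
# Read decoded binary-log text on stdin and print, with line numbers,
# statements that could have removed tables or a whole schema.
find_destructive_statements() {
    # "|| true" keeps the exit status zero when nothing matches.
    grep -n -i -E '^[[:space:]]*(drop[[:space:]]+(database|schema|table)|truncate)' || true
}

# Example (file name taken from the listing above; needs read access to the log):
# mysqlbinlog /var/lib/mysql/mysql_binary_log.000009 | find_destructive_statements
```

The timestamps that mysqlbinlog prints around any match can then be compared with the shell history and auditd records.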