We are facing a critical issue: we lost our Cloudera Manager configuration DB (a separate Oracle DB) and our Kerberos server, so we cannot start the Cloudera Manager server now. However, the HDFS service data should still be there, so we want to update the configuration to disable Kerberos and start the HDFS service to recover our data.
When I tried to start the HDFS cluster manually from the command line, I found that we do not have the "hadoop-hdfs-namenode", "hadoop-hdfs-datanode", or other "hadoop-hdfs-*" services in the init.d directory; we only have "cloudera-scm-agent" and "cloudera-scm-server" in init.d:
If Cloudera Manager is managing your cluster, it is required in order to start your services.
This is because Cloudera Manager creates all the configuration files for your services and then uses shell scripts to start the process. Without Cloudera Manager, you will not have an up-to-date set of configuration files for the process. In essence, there is no "easy" way to start your HDFS service.
If you lost your Cloudera Manager database and have no means of recovery, it is possible to create a new one, but it will require a great deal of work. You would need to add the same services and roles that are on each host. The agents should still be heartbeating to Cloudera Manager, so CM should allow you to select those Managed Hosts and add them to your cluster. You would need to make sure all your HDFS disk paths are correct and add back any configuration items as you had them before.
For a frame of reference, you can look at the existing process directories. When CM starts a process, it lays down the configuration files in a per-process directory on that host. You can look at all the configuration files for each Role (process) there to guide you.
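As a sketch of that inspection, assuming the default agent layout where CM-generated role configs live in numbered directories under /var/run/cloudera-scm-agent/process (verify the path on your own hosts):

```shell
# List a role's process directories, newest first; CM numbers them, so the
# most recently written one holds the last config CM generated for that role.
list_proc_dirs() {   # usage: list_proc_dirs <process-base-dir> <role-pattern>
  ls -1dt "$1"/*"$2"* 2>/dev/null || echo "no matching process dirs"
}

# Example: the NameNode's generated configs (hdfs-site.xml, core-site.xml, ...)
list_proc_dirs /var/run/cloudera-scm-agent/process hdfs-NAMENODE
```

Once you have a directory, the hdfs-site.xml and core-site.xml inside it show the exact settings (disk paths, ports, security mode) CM last used for that role.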
If you are installing Cloudera Manager on a new host, you will need to update the "server_host" configuration in the /etc/cloudera-scm-agent/config.ini file on your agents.
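For reference, the relevant lines in config.ini look like this (the host name below is a placeholder for your new CM server; 7182 is the default agent heartbeat port):

```ini
# /etc/cloudera-scm-agent/config.ini -- [General] section
server_host=cm-new.example.com
server_port=7182
```

After editing, restart the agent on each host so it heartbeats to the new CM.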
If you really need to get the cluster running right away, before you can start rebuilding the Cloudera Manager information, let us know. There is a way to do it from the command line, but it is a bit tricky and unproven.
1. I extract the hosts and roles configuration as a JSON file;
2. I update the host names in the JSON file to match the existing CDH hosts;
3. I import the JSON file into the new CM;
4. I update server_host in /etc/cloudera-scm-agent/config.ini on all hosts to point to the new CM;
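Steps 1-3 map onto the Cloudera Manager REST API's deployment endpoint. A sketch, assuming you have a deployment JSON available (from a backup, or exported earlier with GET /cm/deployment while the old CM was still alive); the host name, credentials, and API version below are placeholders, and you should use the version your CM reports at /api/version:

```shell
CM_NEW="http://cm-new.example.com:7180"

# Step 1 (for reference -- only works while a CM is running):
#   curl -u admin:admin "$CM_OLD/api/v19/cm/deployment" > deployment.json
# Here we assume deployment.json already exists.

# Step 2: edit the host name / ipAddress entries in deployment.json to match
# the existing CDH hosts, then:

# Step 3: import the deployment into the new CM.
# deleteCurrentDeployment=true replaces whatever the fresh CM already has.
curl -s -u admin:admin -X PUT -H "Content-Type: application/json" \
  --upload-file deployment.json \
  "$CM_NEW/api/v19/cm/deployment?deleteCurrentDeployment=true" || true
```

The import restores hosts, clusters, services, roles, and configuration in one call, which is why getting the host names in the JSON right (step 2) matters.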
One challenge for me is that we cannot clearly remember all the roles on each host, because we installed all the services (including HDFS, HBase, Hive, Impala, ZooKeeper, Spark, Hue, etc.) before. HDFS and HBase should be OK, though, since our host names follow the pattern namenode01, namenode02, datanode01, datanode02, datanode03, datanode04…
I would like to ask for help with the following:
1. In the newly installed CM, if some role configurations do not match the existing CDH cluster, is there any impact, especially to the data?
2. Can I install only the new CM and point it at the existing CDH hosts during the install process? I am afraid the new CM will re-install CDH and affect the existing deployment.
BTW, I would very much appreciate it if you could share the way to start HDFS without CM, as you mentioned before.
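One possible shape for that (a sketch of the tricky, unproven command-line approach mentioned above, not a confirmed procedure): reuse the newest config directory CM generated for each role. The paths below assume a parcel install under /opt/cloudera/parcels/CDH and the default agent process dir; both are assumptions to adjust for your install.

```shell
# Pick the newest (highest-numbered) process dir for a role; CM increments
# the number each time it (re)starts the role.
latest_proc_dir() {   # usage: latest_proc_dir <process-base-dir> <role-suffix>
  ls -d "$1"/*-"$2" 2>/dev/null | sort -V | tail -1
}

CONF=$(latest_proc_dir /var/run/cloudera-scm-agent/process hdfs-NAMENODE)

# With the Kerberos KDC gone, first set hadoop.security.authentication=simple
# (and hadoop.security.authorization=false) in $CONF/core-site.xml.

# Run the daemon in the foreground as the hdfs user; repeat on each datanode
# host with its hdfs-DATANODE process dir and the "datanode" subcommand.
if [ -n "$CONF" ]; then
  sudo -u hdfs /opt/cloudera/parcels/CDH/bin/hdfs --config "$CONF" namenode
else
  echo "no hdfs-NAMENODE process dir found on this host"
fi
```

Because these process directories are exactly what CM handed the daemons last time they ran, starting from them avoids guessing at disk paths or ports; but test read-only first and take backups before writing anything.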