<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question CDH Express 6.3.1 - Can't start HDFS after cluster crash and apparently not clean namenode failover in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/CDH-Express-6-3-1-Can-t-start-HDFS-after-cluster-crash-and/m-p/395525#M248964</link>
    <description>&lt;P&gt;Hi, here is a summary of the blocking situation on our CDH Express 6.3.1 installation:&lt;/P&gt;&lt;P&gt;- A crash occurred on the storage backing all of the virtual nodes that make up our cluster&lt;/P&gt;&lt;P&gt;- After the storage issue was resolved, our Cloudera cluster started working again, but with a critical error on the HDFS NameNode service&lt;/P&gt;&lt;P&gt;- We have High Availability configured across 2 nodes, and on inspection we saw that NameNode 2 was active while NameNode 1 was DOWN&lt;/P&gt;&lt;P&gt;- The cluster nevertheless kept working&lt;/P&gt;&lt;P&gt;- To fix the issue, we tried several times to start NameNode 1 manually from "HDFS --&amp;gt; Instances --&amp;gt; federation --&amp;gt; Namenode1 --&amp;gt; Start", but it would not start&lt;/P&gt;&lt;P&gt;- We then decided to disable High Availability, but the new configuration never completed and is currently stuck in a pending state&lt;/P&gt;&lt;P&gt;- It is now impossible to start the HDFS service at all, because an error pops up saying "&lt;SPAN&gt;&lt;STRONG&gt;Nameservice nameservice1 has no SecondaryNameNode or High-Availability partner&lt;/STRONG&gt;"&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Interestingly (see the attached picture), High Availability no longer appears to be configured, yet the cluster still does not start&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;As the second screenshot shows, when we try to start the HDFS service manually a configuration issue is reported: the service still believes it is running in High Availability mode and keeps looking for a "partner node"&lt;/P&gt;&lt;P&gt;QUESTION:&lt;/P&gt;&lt;P&gt;- Is there a way to force a non-High-Availability state and start the HDFS service?&lt;/P&gt;&lt;P&gt;Thank you for any advice and/or insights&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="immagine (1).png" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42237iB30537E087ABB860/image-size/large?v=v2&amp;amp;px=999" role="button" title="immagine (1).png" alt="immagine (1).png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="immagine.png" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42238iD3BB45CE5D9A9E71/image-size/large?v=v2&amp;amp;px=999" role="button" title="immagine.png" alt="immagine.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 18 Oct 2024 16:58:34 GMT</pubDate>
    <dc:creator>FrozenWave</dc:creator>
    <dc:date>2024-10-18T16:58:34Z</dc:date>
    <item>
      <title>CDH Express 6.3.1 - Can't start HDFS after cluster crash and apparently not clean namenode failover</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CDH-Express-6-3-1-Can-t-start-HDFS-after-cluster-crash-and/m-p/395525#M248964</link>
      <description>&lt;P&gt;Hi, here is a summary of the blocking situation on our CDH Express 6.3.1 installation:&lt;/P&gt;&lt;P&gt;- A crash occurred on the storage backing all of the virtual nodes that make up our cluster&lt;/P&gt;&lt;P&gt;- After the storage issue was resolved, our Cloudera cluster started working again, but with a critical error on the HDFS NameNode service&lt;/P&gt;&lt;P&gt;- We have High Availability configured across 2 nodes, and on inspection we saw that NameNode 2 was active while NameNode 1 was DOWN&lt;/P&gt;&lt;P&gt;- The cluster nevertheless kept working&lt;/P&gt;&lt;P&gt;- To fix the issue, we tried several times to start NameNode 1 manually from "HDFS --&amp;gt; Instances --&amp;gt; federation --&amp;gt; Namenode1 --&amp;gt; Start", but it would not start&lt;/P&gt;&lt;P&gt;- We then decided to disable High Availability, but the new configuration never completed and is currently stuck in a pending state&lt;/P&gt;&lt;P&gt;- It is now impossible to start the HDFS service at all, because an error pops up saying "&lt;SPAN&gt;&lt;STRONG&gt;Nameservice nameservice1 has no SecondaryNameNode or High-Availability partner&lt;/STRONG&gt;"&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Interestingly (see the attached picture), High Availability no longer appears to be configured, yet the cluster still does not start&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;As the second screenshot shows, when we try to start the HDFS service manually a configuration issue is reported: the service still believes it is running in High Availability mode and keeps looking for a "partner node"&lt;/P&gt;&lt;P&gt;QUESTION:&lt;/P&gt;&lt;P&gt;- Is there a way to force a non-High-Availability state and start the HDFS service?&lt;/P&gt;&lt;P&gt;Thank you for any advice and/or insights&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="immagine (1).png" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42237iB30537E087ABB860/image-size/large?v=v2&amp;amp;px=999" role="button" title="immagine (1).png" alt="immagine (1).png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="immagine.png" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42238iD3BB45CE5D9A9E71/image-size/large?v=v2&amp;amp;px=999" role="button" title="immagine.png" alt="immagine.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Oct 2024 16:58:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CDH-Express-6-3-1-Can-t-start-HDFS-after-cluster-crash-and/m-p/395525#M248964</guid>
      <dc:creator>FrozenWave</dc:creator>
      <dc:date>2024-10-18T16:58:34Z</dc:date>
    </item>
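Before tearing down NameNode roles, one way to diagnose this kind of stuck HA state is from the command line on a NameNode host. The following is a minimal sketch, assuming the nameservice is `nameservice1` (taken from the error message above) and the NameNode IDs are `nn1`/`nn2` (hypothetical names; the real ones come from the `dfs.ha.namenodes.nameservice1` property in hdfs-site.xml):

```shell
# Ask each NameNode for its HA state (active / standby); run as the hdfs user.
# nn1/nn2 are assumed IDs -- substitute the values from hdfs-site.xml.
hdfs haadmin -ns nameservice1 -getServiceState nn1
hdfs haadmin -ns nameservice1 -getServiceState nn2

# If NameNode 1's metadata directory was damaged by the storage crash,
# re-seed it from the healthy active NameNode instead of reformatting.
# Run this on the NameNode 1 host while NameNode 2 is active:
hdfs namenode -bootstrapStandby
```

If `-bootstrapStandby` succeeds, the stopped NameNode can often be started again from Cloudera Manager without abandoning High Availability at all.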
    <item>
      <title>Re: CDH Express 6.3.1 - Can't start HDFS after cluster crash and apparently not clean namenode failover</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CDH-Express-6-3-1-Can-t-start-HDFS-after-cluster-crash-and/m-p/395534#M248970</link>
      <description>&lt;P&gt;Solved by:&lt;/P&gt;&lt;P&gt;- Deleting the NameNode role entirely from "HDFS --&amp;gt; Instances --&amp;gt; NameNode": ticking the checkbox next to the NameNode instance and selecting "Delete" from the "Actions" drop-down menu&lt;/P&gt;&lt;P&gt;- Redeploying a new NameNode role on the same host where the primary NameNode was previously running&lt;/P&gt;&lt;P&gt;- Re-enabling High Availability on the NameNode&lt;/P&gt;&lt;P&gt;- Rebuilding the NameNode metadata via the Hive service, under "Hive --&amp;gt; actions --&amp;gt; rebuild Namenode metadata"&lt;/P&gt;</description>
      <pubDate>Fri, 18 Oct 2024 21:15:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CDH-Express-6-3-1-Can-t-start-HDFS-after-cluster-crash-and/m-p/395534#M248970</guid>
      <dc:creator>FrozenWave</dc:creator>
      <dc:date>2024-10-18T21:15:05Z</dc:date>
    </item>
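After a recovery like the one described in the reply, it is worth verifying cluster health before declaring the incident closed. A hedged sketch using standard HDFS admin commands (the nameservice name is taken from the error message in the original question; adjust to your deployment):

```shell
# Confirm both NameNodes now report an HA state (one active, one standby)
hdfs haadmin -ns nameservice1 -getAllServiceState

# Make sure the NameNode has left safe mode
hdfs dfsadmin -safemode get

# Check for blocks corrupted or lost in the storage crash
hdfs fsck / -list-corruptfileblocks
```

A storage-level crash like the one that started this thread can leave corrupt blocks behind even after the NameNodes recover; `hdfs fsck` is the standard way to enumerate them.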
  </channel>
</rss>

