Data03 node is decommissioned; I want it back in working condition.


Hi Team,

For the last few days, the data03 node was not working, so we moved it to the decommissioned state and restarted all our services.
Now I need to recommission it and start the services there as well.
I selected the host and, under Actions, clicked End Maintenance, but it is failing.

Please help me with what I should do to resolve this.


clo.png
1 ACCEPTED SOLUTION

Super Collaborator

Hello,

Try to install the agent package on this host and let me know if that solves the issue.

  • Make a copy of your /etc/cloudera-scm-agent/config.ini.
  • Uninstall the cloudera-manager-agent package
  • Install the cloudera-manager-agent package
  • Copy the /etc/cloudera-scm-agent/config.ini back
  • Start the cloudera-scm-agent service


17 REPLIES

Super Collaborator

It is possible that the DataNode is still in the decommissioning state. Can you stop that DataNode from the UI, then try to exit maintenance mode and check again?


Hi @rki_,
Thanks for your response.
I tried stopping all roles on this host and then ending maintenance, but it still gives the same error:
"Role not started due to unhealthy host data-03."

Also, the log file it refers to is empty.

Any suggestions on what I should do next?

Super Collaborator

Hi, do you see any errors in the cloudera-scm-agent logs? If the agent loses its connection to the CM Server, it reports the host as unhealthy. Have you tried restarting the cloudera-scm-agent on this host to check whether that helps?
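
As a rough sketch of those two checks: the snippet below assumes a systemd host and the default agent log location, /var/log/cloudera-scm-agent/cloudera-scm-agent.log. The helper name show_agent_errors is made up for illustration; adjust the path if your install logs elsewhere.

```shell
#!/bin/sh
# Assumed default agent log path; override with AGENT_LOG=... if yours differs.
AGENT_LOG="${AGENT_LOG:-/var/log/cloudera-scm-agent/cloudera-scm-agent.log}"

# Hypothetical helper: print the most recent ERROR lines from the agent log,
# or say so if the log is missing or empty.
show_agent_errors() {
    if [ -s "$AGENT_LOG" ]; then
        grep 'ERROR' "$AGENT_LOG" | tail -n 20
    else
        echo "log missing or empty: $AGENT_LOG"
    fi
}

show_agent_errors

# Then restart the agent and re-check its state (run as root):
#   systemctl restart cloudera-scm-agent
#   systemctl status cloudera-scm-agent
```

If the restart fails because the unit does not exist, the agent package itself is probably not installed on the host.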


Hi,
The cloudera-scm-agent logs on this host stop last month. The last entries say that the data03 node has been out of contact with Cloudera Manager for too long.

Also, when I ran the command below to check the state:

    service cloudera-scm-agent status
it gave the following error:
     "Unit cloudera-scm-agent.service could not be found."

Please help me with a way forward.


Please find attached a screenshot of the cloudera-scm-agent log showing errors related to Impala.


datanode_error.jpg

Super Collaborator

Hi,

Can you check whether the agent package is actually present on this host? You can compare it with a working host.

# rpm -qa | grep cloudera
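
To make the comparison concrete, one option is to save the package names from each host to a file and list what the problem host lacks. This is only a sketch: pkgs_missing is a hypothetical helper and the file names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: list package names present in the first file but
# absent from the second. Each file holds one package name per line.
pkgs_missing() {
    good_list=$(mktemp)
    bad_list=$(mktemp)
    sort -u "$1" > "$good_list"
    sort -u "$2" > "$bad_list"
    comm -23 "$good_list" "$bad_list"   # lines unique to the first list
    rm -f "$good_list" "$bad_list"
}

# Usage (run the rpm query on each host, then compare the two files):
#   rpm -qa --qf '%{NAME}\n' | grep cloudera > working-host.txt   # on a healthy node
#   rpm -qa --qf '%{NAME}\n' | grep cloudera > data03.txt         # on data03
#   pkgs_missing working-host.txt data03.txt
```

Any name printed is installed on the working host but not on data03.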

 


Hi @rki_ ,
Yes, the cloudera-manager-agent package is missing on the data-03 node.

Super Collaborator

Hello,

Try to install the agent package on this host and let me know if that solves the issue.

  • Make a copy of your /etc/cloudera-scm-agent/config.ini.
  • Uninstall the cloudera-manager-agent package
  • Install the cloudera-manager-agent package
  • Copy the /etc/cloudera-scm-agent/config.ini back
  • Start the cloudera-scm-agent service
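
The steps above could look roughly like this on the command line. Treat it as a sketch under assumptions, not an exact procedure: it assumes an RPM-based host with yum and systemd, a Cloudera Manager repository already configured on the host, and a backup path (/root/config.ini.bak) chosen only for illustration. It defaults to a dry run that just prints the commands.

```shell
#!/bin/sh
# Run as root. DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
CONF=/etc/cloudera-scm-agent/config.ini
BACKUP="${BACKUP:-/root/config.ini.bak}"   # hypothetical backup location

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_agent() {
    run cp -p "$CONF" "$BACKUP" || return 1                 # 1. copy config.ini aside
    run yum -y remove cloudera-manager-agent                # 2. uninstall (may find nothing to remove)
    run yum -y install cloudera-manager-agent || return 1   # 3. install the agent package
    run cp -p "$BACKUP" "$CONF" || return 1                 # 4. restore config.ini
    run systemctl start cloudera-scm-agent                  # 5. start the agent
}

reinstall_agent
```

Once the agent is running, `systemctl status cloudera-scm-agent` should show it as active, and the host should start heartbeating to Cloudera Manager again.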


Hi @rki_ ,

I am doing this for the very first time.
Can you help me with the detailed steps?