
Cloudera agent fails to start on machines because the BIND server is not up


Sometimes, when all VMs with Cloudera-managed clusters are rebooted, or when a successful installation is restored from a snapshot on the VMs, the Cloudera agent fails to start.

I analyzed the cloudera-scm-agent.out log and found that the Cloudera agent was down on some of the machines because the boot process does not strictly order the startup of BIND and the Cloudera agent. The agent requires the local BIND server to be running, because it queries for the machine's hostname and the DNS server configured in resolv.conf is 127.0.0.1. On machines where the agent started before BIND was up, the agent failed to start. Is there a way to configure the Cloudera agent to retry a specific number of times, with a timeout, before it fails?
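
To make the behavior I am asking about concrete, here is a minimal sketch of a bounded retry around hostname resolution. It is not a Cloudera agent feature or setting. It is a hypothetical helper that could run before the agent is started, and the name `wait_for_dns` and its parameters (`retries`, `delay`, `resolve`, `sleep`) are my own assumptions for illustration.

```python
import socket
import time


def wait_for_dns(hostname, retries=10, delay=3.0,
                 resolve=socket.gethostbyname, sleep=time.sleep):
    """Try to resolve `hostname`, retrying a bounded number of times.

    Hypothetical helper: returns the resolved address on success, or
    raises TimeoutError once `retries` attempts have all failed.
    `resolve` and `sleep` are injectable so the logic can be tested
    without a real DNS server.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return resolve(hostname)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            last_error = exc
            if attempt < retries:
                sleep(delay)  # give the local BIND server time to come up
    raise TimeoutError(
        f"{hostname} did not resolve after {retries} attempts"
    ) from last_error
```

Calling `wait_for_dns(socket.gethostname())` in a small pre-start script and exiting non-zero on `TimeoutError` would give a fixed upper bound of roughly `retries * delay` seconds of waiting between attempts, plus however long each lookup itself takes, before giving up.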
