New Contributor
Posts: 4
Registered: ‎01-04-2018

Unable to start cloudera-scm-agent

I installed Cloudera Manager using Path A.

I am unable to start cloudera-scm-agent. It keeps going down with the error below:

 

ProtocolError: <ProtocolError for 127.0.0.1/RPC2: 401 Unauthorized>
[04/Jan/2018 19:21:48 +0000] 30365 MainThread agent ERROR Failed to connect to newly launched supervisor. Agent will exit
[04/Jan/2018 19:21:48 +0000] 30365 MainThread agent INFO Stopping agent...
[04/Jan/2018 19:21:48 +0000] 30365 MainThread agent INFO No extant cgroups; unmounting any cgroup roots
[04/Jan/2018 19:21:48 +0000] 30365 Dummy-1 daemonize WARNING Stopping daemon.

 

I tried restarting the database, the server, and the agent, but the problem is still the same.

I also followed the scripts to uninstall CM and installed it again using Path A, but the issue persists.

I would appreciate any assistance with this.


Re: Unable to start cloudera-scm-agent

The issue was resolved by rebooting the affected node. This was a reinstallation, and a reboot is required after uninstalling CM.

I ran init 6, which resolved the issue.

 

Thanks

Amir

Posts: 1,035
Topics: 1
Kudos: 258
Solutions: 128
Registered: ‎04-22-2014

Re: Unable to start cloudera-scm-agent

@Amir,

 

I install and uninstall Cloudera Manager and agents often without rebooting, so I think something else was amiss on the host. I agree rebooting was reasonable in this case, but it is not required when uninstalling Cloudera Manager or agent packages.

 

I see the following at the top of the log you provided:

 

ProtocolError: <ProtocolError for 127.0.0.1/RPC2: 401 Unauthorized>

 

It indicates that a supervisor process was already running with a different password from the one the agent was using to connect to it. This can happen if the agent failed to connect to the supervisor previously and then refreshed the supervisord.conf configuration with a new password.

 

Long story short, if this happens again, you might try using "ps" to detect any running supervisors and then use "kill" on the pid... escalate to "kill -9" if that does not work.
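
In case it helps, here is a rough Python sketch of that ps-then-kill sequence. It is only an illustration under a few assumptions: the supervisor's command line is assumed to contain the string "supervisord" (check your own "ps" output first), the function names are made up for this example, and it needs to run as a user allowed to signal the process (typically root).

```python
import os
import signal
import subprocess
import time


def list_processes():
    """Return the raw text of `ps -eo pid,args` (PID plus full command line)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return result.stdout


def find_supervisor_pids(ps_output, pattern="supervisord"):
    """Pick out PIDs whose command line contains `pattern`.

    The default pattern is an assumption; adjust it to match what
    `ps` actually shows on your host.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # The header row ("PID COMMAND") is skipped by the isdigit() check.
        if len(parts) == 2 and parts[0].isdigit() and pattern in parts[1]:
            pids.append(int(parts[0]))
    return pids


def stop_process(pid, wait_seconds=10):
    """Send SIGTERM first; escalate to SIGKILL (kill -9) if it is still alive."""
    os.kill(pid, signal.SIGTERM)
    deadline = time.time() + wait_seconds
    while time.time() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only checks that the process exists
        except ProcessLookupError:
            return "terminated"
        time.sleep(0.5)
    os.kill(pid, signal.SIGKILL)
    return "killed"
```

To use it, pass the result of list_processes() to find_supervisor_pids(), review the PIDs it returns, and only then call stop_process() on the stale supervisor. Looking at the list before killing anything is deliberate, since the pattern match could pick up unrelated processes.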

 

After the previous supervisor is gone, the agent should be able to create a supervisor process with the password in the agent's supervisord.conf so the agent can connect to it with that password.

 

The reboot of the system killed the old supervisor processes, so that's probably the part of rebooting that helped solve the issue.

 

Ben
