
Unable to start Ambari-Infra in an HDF cluster due to ZooKeeper auth_fail

Contributor

I have enabled Kerberos on the HDF cluster. When starting Ambari-Infra, it errors out due to a ZooKeeper failure. I have confirmed that the JAAS files are updated correctly, and I am able to kinit using both zk.service.keytab and ambari-infra-solr.service.keytab. When solrCloudCli.sh is invoked by Ambari, the following error is reported: "Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7)). org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /clusterprops.json".
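For reference, the kinit check was roughly along these lines; the keytab paths are the usual HDF defaults under /etc/security/keytabs, and the principal and realm are placeholders, so adjust to your cluster:

# List the principals stored in each keytab
klist -kt /etc/security/keytabs/zk.service.keytab
klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab

# Get a ticket with the Infra Solr service principal reported by klist above
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab infra-solr/node1.domain@EXAMPLE.COM
klist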

I have attached the Solr client logs: solor-error-log.txt

Thanks,

1 ACCEPTED SOLUTION

Contributor

It turned out to be a problem with file permissions. The umask was not set to 022, so Ambari-Infra could not access its logs and configuration files. The error message was misleading, as it pointed to a Kerberos error instead.
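For anyone hitting the same thing, a minimal sketch of the fix, assuming the default Ambari-Infra Solr locations (/var/log/ambari-infra-solr and /etc/ambari-infra-solr/conf); adjust the paths to your layout:

# Check the umask of the shell/agent that installed the packages
umask

# Set it to 022 for the session and persist it (e.g. in /etc/profile)
umask 0022

# Re-open the Infra Solr directories that were created with the restrictive umask
chmod -R o+rX /var/log/ambari-infra-solr /etc/ambari-infra-solr/conf
ls -ld /var/log/ambari-infra-solr /etc/ambari-infra-solr/conf

After fixing the permissions, restart Ambari-Infra from Ambari.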


9 REPLIES

@hello hadoop

What do the /etc/hosts files look like on your nodes? I had a similar issue; I had to put the FQDN of each node first in the /etc/hosts file on every node. For example, I had

12.34.56.78 node1 node1.domain

12.34.56.79 node2 node2.domain

12.34.56.77 node3 node3.domain

When I switched them to

12.34.56.78 node1.domain node1

12.34.56.79 node2.domain node2

12.34.56.77 node3.domain node3

Everything started just fine.
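A quick way to check which name wins, since "Server not found in Kerberos database" usually means the resolved hostname does not match the service principal (node names as in the example above):

# Should print the fully qualified name, e.g. node1.domain
hostname -f

# Shows which name /etc/hosts (or DNS) returns for each node and address
getent hosts node1.domain
getent hosts 12.34.56.78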

Contributor

Thank you @Wynner

I have the hosts files in the format you mention, with the FQDN followed by the short name. However, my hostname is set to the short name (node1) without the domain. Would this be an issue?

@hello hadoop

My configuration was the other way; try switching yours to the short name first.

Contributor

I tried it both ways, but I still get the same error. Even zkCli.sh fails with AuthFailed.
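In case it helps to reproduce this outside Ambari, zkCli.sh can be pointed explicitly at the ZooKeeper client JAAS file via CLIENT_JVMFLAGS; the paths below are the usual HDF/Ambari defaults and are only a sketch:

# Use the kerberized client JAAS config when zkCli.sh starts the JVM
export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/etc/zookeeper/conf/zookeeper_client_jaas.conf"

# Connect with the ZooKeeper server's FQDN, not its short name
/usr/hdf/current/zookeeper-client/bin/zkCli.sh -server node1.domain:2181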

@hello hadoop

What version of HDF are you using?

Contributor

@Wynner

I am using HDF 2.1.1.0

Contributor

It turned out to be a problem with file permissions. The umask was not set to 022, so Ambari-Infra could not access its logs and configuration files. The error message was misleading, as it pointed to a Kerberos error instead.

Hi @hello hadoop, in which directory must the file permissions be changed? I can't find clusterprops.json. Please help.
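Note that clusterprops.json is not a file on the local disk; it is a znode that Solr keeps in ZooKeeper (the AuthFailed error in the original question shows its ZooKeeper path). Once a kerberized zkCli.sh session connects, it can be inspected like this, assuming the Infra Solr chroot is /infra-solr:

# Inside zkCli.sh, after a successful connection
ls /infra-solr
get /infra-solr/clusterprops.json

The permissions that needed changing in the accepted answer were on the local Ambari-Infra log and configuration directories, not on this znode.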

New Contributor

Reversing the FQDN and short names in the hosts file worked for me.