Created 09-13-2021 01:59 PM
Hello,
We have one Oozie server in our lower environment, and I was asked to add a second one. I added the Oozie server through Ambari, but the installation got stuck because of keytab/Kerberos issues.
Error: Execution of '/usr/bin/kinit -l 5m20s -c /var/lib/ambari-agent/tmp/oozie_alert_cc_7579 -kt /etc/security/keytabs/oozie.service.keytab oozie/FQDN; ' returned 1. kinit: Client 'oozie/FQDN' not found in Kerberos database while getting initial credentials
Any help? How do I fix this so I can add the second Oozie server?
Thanks in advance.
Created 09-13-2021 08:51 PM
Hi @Mdali,
Based on the error message "'oozie/FQDN' not found in Kerberos database", it looks like creating the Oozie Kerberos principal failed. Could you check the Ambari server log from the time you tried to add the second Oozie server to identify the cause?
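One quick check on the new host is to list the entries in the Oozie keytab and look for the expected principal. The helper below is a hypothetical sketch (the name `keytab_has_principal` is made up for illustration). It reads `klist -kt` output on stdin and assumes the principal is the last column of each entry line, which holds for `-kt` but not for `-kte`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the principal given as $1 (without @REALM)
# appears in `klist -kt` output read from stdin, exit 1 otherwise.
# Assumes plain `klist -kt` output (no -e), where the principal is the last field.
keytab_has_principal() {
  awk -v want="$1" '
    # Take the last field of each line and strip any @REALM suffix.
    { p = $NF; sub(/@.*/, "", p); if (p == want) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

Usage, assuming `hostname -f` returns the same FQDN Ambari registered: `klist -kt /etc/security/keytabs/oozie.service.keytab | keytab_has_principal "oozie/$(hostname -f)" && echo present || echo missing`. Keep in mind the kinit error is about the KDC database, not the keytab. On an MIT KDC, running `kadmin.local -q "getprinc oozie/<FQDN>"` on the KDC host shows whether the principal exists there.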
Thanks,
Prashanth Vishnu
Created 09-14-2021 06:49 AM
Hi, thank you for the reply. Because I aborted the operation after 4 hours, no stderr output was generated for this issue. I checked the ambari-server log: during the installation, the messages below were generated continuously. It looks like Ambari kept retrying the installation, and it failed each time.
Any idea?
StackAdvisorHelper:255 - Clear stack advisor caches, hosts: [FQDN]
2021-09-10 14:43:17,110 INFO [ambari-client-thread-381065] HostComponentResourceProvider:973 - Received a updateHostComponent request, clusterName=XXX_DEV, serviceName=OOZIE, componentName=OOZIE_SERVER, hostname=FQDN, request={ clusterName=XXX_DEV, serviceName=OOZIE, componentName=OOZIE_SERVER, hostname=FQDN publicHostname=null, desiredState=INSTALLED, state=null, desiredStackId=null, staleConfig=null, adminState=null, maintenanceState=null}
2021-09-10 14:43:17,110 INFO [ambari-client-thread-381065] HostComponentResourceProvider:697 - Handling update to host component, clusterName=XXX_DEV, serviceName=OOZIE, componentName=OOZIE_SERVER, hostname=FQDN, currentState=INIT, newDesiredState=INSTALLED
2021-09-10 14:43:17,353 INFO [ambari-action-scheduler] ServiceComponentHostImpl:1054 - Host role transitioned to a new state, serviceComponentName=OOZIE_SERVER, hostName=FQDN, oldState=INIT, currentState=INSTALLING
2021-09-10 14:43:17,364 INFO [ambari-action-scheduler] AgentCommandsPublisher:124 - AgentCommandsPublisher.sendCommands: sending ExecutionCommand for host FQDN, role OOZIE_SERVER, roleCommand INSTALL, and command ID 13530-0, task ID 56448
2021-09-10 14:44:07,441 ERROR [agent-report-processor-3] HeartbeatProcessor:516 - Operation failed - may be retried. Service component host: OOZIE_SERVER, host: FQDN Action id 13530-0 and taskId 56448
2021-09-10 14:44:07,442 INFO [agent-report-processor-3] ServiceComponentHostImpl:1054 - Host role transitioned to a new state, serviceComponentName=OOZIE_SERVER, hostName=FQDN, oldState=INSTALLING, currentState=INSTALL_FAILED
2021-09-10 14:44:07,742 INFO [ambari-action-scheduler] ActionDBAccessorImpl:227 - Aborting command. Hostname FQDN role KERBEROS_CLIENT requestId 13530 taskId 56450 stageId 2
2021-09-10 14:44:07,742 INFO [ambari-action-scheduler] ActionDBAccessorImpl:227 - Aborting command. Hostname FQDN role KERBEROS_CLIENT requestId 13530 taskId 56453 stageId 5
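Because the same messages repeat, it can help to pull out only the lines for one request. A minimal sketch, assuming the `command ID <id>-`, `Action id <id>-`, and `requestId <id>` wordings seen in the excerpt above (the function name `lines_for_request` is hypothetical; other Ambari messages may word the ID differently):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print log lines that belong to one Ambari request ID.
# $1 is the request ID; any remaining arguments are log files (stdin if none).
lines_for_request() {
  req="$1"; shift
  # "requestId <id>" must not be followed by another digit, so 13530 does not
  # also match 135301.
  grep -E "(command ID|Action id) ${req}-|requestId ${req}([^0-9]|\$)" "$@"
}
```

For example, with the default log location: `lines_for_request 13530 /var/log/ambari-server/ambari-server.log | grep -E 'ERROR|Aborting'`.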
Created 09-15-2021 08:03 AM
Hi @Mdali,
Could you make sure the KDC server is reachable from the Ambari server? If it isn't, the tasks may time out.
# ping <KDC host>
# telnet <KDC host> 88
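If telnet is not installed, a similar TCP check with a timeout can be done from bash. This is a sketch using bash's `/dev/tcp` redirection and the coreutils `timeout` command; the function name `tcp_reachable` and the 5-second default are illustrative choices. Like telnet, it only tests TCP port 88. Kerberos clients often try UDP 88 first, which this check does not exercise.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if a TCP connection to host $1, port $2
# (default 88) opens within $3 seconds (default 5), non-zero otherwise.
tcp_reachable() {
  # Host and port are passed as positional arguments ($0, $1) to the inner
  # bash so they are not spliced into the script text.
  timeout "${3:-5}" bash -c ': < "/dev/tcp/$0/$1"' "$1" "${2:-88}" 2>/dev/null
}
```

Usage: `tcp_reachable <KDC host> 88 && echo reachable || echo unreachable`.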
Also, search ambari-server.log for the keyword "CreatePrincipalsServerAction". When you add an Oozie server to the cluster, you would normally expect messages like these:
-------------
14 Sep 2021 03:03:23,511 INFO [Server Action Executor Worker 2577] KerberosServerAction:359 - Processing identities...
14 Sep 2021 03:03:23,518 INFO [Server Action Executor Worker 2577] CreatePrincipalsServerAction:205 - Processing principal, oozie/<FQDN>@HADOOP.COM
14 Sep 2021 03:03:23,921 INFO [Server Action Executor Worker 2577] KerberosServerAction:463 - Processing identities completed.
-------------
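To look for that specific entry rather than scanning the log by eye, a check like the one below could work. It is a hypothetical helper that assumes the `Processing principal, oozie/<FQDN>@REALM` wording shown above; the realm is left open.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the log shows CreatePrincipalsServerAction
# processing oozie/<host>@<any realm>. $1 is the host FQDN; remaining
# arguments are log files (stdin if none).
principal_was_processed() {
  host="$1"; shift
  grep -F "CreatePrincipalsServerAction" "$@" \
    | grep -qF "Processing principal, oozie/${host}@"
}
```

Usage: `principal_was_processed "<FQDN>" /var/log/ambari-server/ambari-server.log && echo found || echo "not found"`. If nothing is found, that would suggest Ambari never reached the principal-creation step for the new host.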
Thanks,
Prashanth Vishnu
Created 09-16-2021 06:12 AM
Hello Prashanth,
I was able to install Oozie; the problem was an ambari-agent permission issue. However, the Oozie server does not start. When I try to start it, it fails on a tar command that I believe Ambari runs automatically. I also tried running the tar command manually, and it failed too.
Exception:
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Execute[('tar', '-xvf', u'/usr/hdp/current/oozie-server/oozie-sharelib.tar.gz', '-C', u'/usr/hdp/current/oozie-server')] {'not_if': "ambari-sudo.sh su oozie -l -s /bin/bash -c 'ls /hadoop/var/run/oozie/oozie.pid >/dev/null 2>&1 && ps -p `cat /hadoop/var/run/oozie/oozie.pid` >/dev/null 2>&1' || test -f /usr/hdp/current/oozie-server/.hashcode && test -d /usr/hdp/current/oozie-server/share", 'sudo': True} Command failed after 1 tries
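"unexpected end of file" from gzip usually means the archive itself is truncated, not that tar is misbehaving. The archive can be tested without extracting anything. A minimal sketch (the function name `check_targz` is hypothetical) using `gzip -t` to verify the compressed stream and `tar -tzf` to walk every entry:

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 only if $1 is a complete gzip stream and a
# readable tar archive. Nothing is extracted.
check_targz() {
  # gzip -t fails on a truncated or corrupt compressed stream.
  gzip -t "$1" 2>/dev/null &&
    # tar -tzf lists every entry, failing if the archive ends early.
    tar -tzf "$1" >/dev/null 2>&1
}
```

Usage: `check_targz /usr/hdp/current/oozie-server/oozie-sharelib.tar.gz && echo OK || echo "damaged or truncated"`. Comparing `ls -l` sizes of the file on both Oozie hosts is another quick hint.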
Any advice?
Thank you.
Created 09-16-2021 07:26 PM
Hi @Mdali,
Maybe the file "/usr/hdp/current/oozie-server/oozie-sharelib.tar.gz" is corrupted? If the other Oozie server is on the same version, could you try copying the file from it? Then try restarting again and let us know how it goes.
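When copying the file over, it is worth confirming that the copy really matches. On HDP, `/usr/hdp/current/oozie-server` is normally a symlink to a versioned directory, so `readlink -f /usr/hdp/current/oozie-server` on both hosts shows whether they run the same build. A sketch of a checksum comparison follows. The function name `same_checksum` is hypothetical; it assumes GNU coreutils `sha256sum` and that the remote copy has first been fetched to a local path (for example with `scp`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if files $1 and $2 have identical SHA-256
# digests. Fails if either file cannot be read.
same_checksum() {
  # Hashing stdin makes both outputs end in "  -", so only digests differ.
  a=$(sha256sum < "$1") && b=$(sha256sum < "$2") && [ "$a" = "$b" ]
}
```

Usage: `scp <other host>:/usr/hdp/current/oozie-server/oozie-sharelib.tar.gz /tmp/` and then `same_checksum /tmp/oozie-sharelib.tar.gz /usr/hdp/current/oozie-server/oozie-sharelib.tar.gz`. If the digests differ, the copy did not complete. If they match and extraction still fails, the file on the other server is probably damaged as well.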
Thanks,
Prashanth Vishnu
Created 09-17-2021 06:13 AM
Hi @pvishnu,
I already did what you suggested above, but I got the same error.
Any further advice?
Thank you.