I recently upgraded a cluster to CDH 6.1.0, and everything is working apart from my jobs scheduled in Oozie. I've tried writing a new workflow and submitting it, and it hits the same issue. Oozie gets an HDFS delegation token, but initially it complained that it couldn't access the file in my HDFS home directory:
2019-02-12 07:18:38,582 WARN org.apache.oozie.command.wf.ActionStartXCommand: SERVER[node-2.node.hadoop.svc.cluster.local] USER[jhoran] GROUP[-] TOKEN APP[My Workflow] JOB[0000001-190212060327045-oozie-oozi-W] ACTION[0000001-190212060327045-oozie-oozi-W@spark-b216] Error starting action [spark-b216]. ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: Permission denied: user=oozie, access=EXECUTE, inode="/user/xxx":xxx:xxx:drwxrwx---
I tried granting access explicitly to the oozie user to see if that helped, and the script then runs. But it goes on to complain that it can't find any Kerberos credentials:
2019-02-12 06:35:18,895 [Thread-13] ERROR org.apache.thrift.transport.TSaslTransport - SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
The oozie keytab appears to be correct: I can kinit against it and access HDFS. It's very possible that I made a serious mistake during the upgrade process. I had to make a few changes at once, such as moving from Debian to Ubuntu and changing the Kerberos server to a centralized FreeIPA one. Everything else seems to be working now, though, and I'm really not sure where to even start with this one, so any help is appreciated.
I noticed that Oozie could still create files in my /user directory using the filesystem actions, and that any files it created were still owned by my user. So it was clearly getting some permissions, but those permissions weren't enough to traverse (execute into) directories. So I just gave oozie execute permission on all files in my /user directory. I'm sure this isn't the best solution, but it works for now.
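For anyone hitting the same thing, here is a rough sketch of one way to grant that execute-only access, using an HDFS ACL entry via `hdfs dfs -setfacl`. This is an assumption about the mechanism rather than a record of the exact command I ran, and it assumes ACLs are enabled on the NameNode (`dfs.namenode.acls.enabled`). The helper names are made up for illustration, and `/user/xxx` is the redacted path from the log above, so substitute your own home directory.

```python
import subprocess


def traverse_acl_command(path, user="oozie", recursive=False):
    """Build the `hdfs dfs -setfacl` argument list that adds an
    execute-only (--x) ACL entry for `user` on `path`."""
    cmd = ["hdfs", "dfs", "-setfacl"]
    if recursive:
        # -R applies the ACL change to everything under `path` as well.
        cmd.append("-R")
    # -m modifies the ACL, adding or updating the given entry.
    cmd += ["-m", f"user:{user}:--x", path]
    return cmd


def grant_traverse(path, user="oozie", recursive=False, dry_run=True):
    """Print the command (dry run) or execute it against the cluster."""
    cmd = traverse_acl_command(path, user, recursive)
    if dry_run:
        print(" ".join(cmd))
    else:
        # Needs the hdfs client on PATH and a valid Kerberos ticket
        # for a user allowed to change the ACL (owner or superuser).
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Placeholder path taken from the redacted log line.
    grant_traverse("/user/xxx")
```

Running it with `dry_run=True` only prints the command, so you can check it before touching the cluster. An ACL entry on just the home directory is narrower than opening up every file, which is probably closer to what the JA009 error was actually asking for (EXECUTE on the `/user/xxx` inode).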
I'm not sure what I did to resolve the second issue, but somehow that got cleaned up along the way.