In a 6-node cluster managed by Cloudera Manager 5.3, I am trying to run an Oozie workflow.
The workflow is submitted by user hive.
The main action in the workflow is a shell action, which runs a remote query against another CDH
cluster through beeline, transforms the output, and writes it to HDFS.
It is followed by a couple of filesystem actions that delete and rename some files.
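For context, a workflow of the shape described above (one shell action followed by filesystem actions) might be defined roughly like this. It is only a minimal sketch, not the actual job: the workflow name, the script run_query.sh, the ${jobTracker}, ${nameNode} and ${appPath} parameters, and the HDFS paths are hypothetical placeholders.

```xml
<workflow-app name="remote-query-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="run-query"/>
    <!-- Shell action: runs a hypothetical script that calls beeline against
         the remote cluster, transforms the output and writes it to HDFS -->
    <action name="run-query">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>run_query.sh</exec>
            <file>${appPath}/run_query.sh#run_query.sh</file>
        </shell>
        <ok to="cleanup"/>
        <error to="fail"/>
    </action>
    <!-- Filesystem action: delete and rename output files (placeholder paths) -->
    <action name="cleanup">
        <fs>
            <delete path="${nameNode}/tmp/query-output-old"/>
            <move source="${nameNode}/tmp/query-output" target="/tmp/query-output-final"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```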
The workflow runs successfully, but the resulting file in HDFS is owned by user yarn
rather than by user hive, who submitted the job.
I overrode the default (empty) value of the Oozie Server Advanced Configuration Snippet for oozie-site.xml with:
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hive.groups</name>
  <value>*</value>
</property>
I can also verify that the following entries exist in the HDFS service-wide proxy configuration:
hadoop.proxyuser.hive.hosts *
hadoop.proxyuser.hive.groups *
What am I doing wrong, or what else do I need to configure so that Oozie acts as the job submitter
within the workflow?