As I understand it, Hive actions in Oozie workflows run as the user who submitted the workflow. Consequently, a statement such as CREATE EXTERNAL TABLE t1 ... LOCATION /foo/bar/t1 creates the necessary directory structure in HDFS owned by that user. This is what we normally observe. In a few cases, however, I have seen the directory created by the user "hive", which then results in all sorts of HDFS permission errors such as:
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=hive, access=WRITE, inode="/foo/bar":my_expected_user:my_expected_group:drwxr-xr-x
Has anyone come across this behaviour? Is it expected under some conditions, or is it a misconfiguration or a bug?
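For anyone triaging the same error, here is a minimal sketch that pulls the requesting user and the inode owner out of the "Permission denied" message, so the mismatch is easy to spot across many log lines. The function names and the regular expression are my own illustration, not part of Hive, Oozie or HDFS, and the pattern assumes the message has the same layout as the one quoted above.

```python
import re

# Hypothetical pattern, written against the message format quoted in this post.
ACE_PATTERN = re.compile(
    r'Permission denied: user=(?P<user>[^,]+), access=(?P<access>[^,]+), '
    r'inode="(?P<inode>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):'
    r'(?P<mode>[-drwxtsT]{10})'
)


def parse_access_denied(message):
    """Return the fields of a 'Permission denied' message as a dict, or None."""
    match = ACE_PATTERN.search(message)
    return match.groupdict() if match else None


def owner_mismatch(message):
    """True when the user that was denied is not the owner of the inode."""
    info = parse_access_denied(message)
    return info is not None and info["user"] != info["owner"]


# The error text from this post, used as sample input.
sample = (
    'org.apache.hadoop.security.AccessControlException Permission denied: '
    'user=hive, access=WRITE, '
    'inode="/foo/bar":my_expected_user:my_expected_group:drwxr-xr-x'
)

if __name__ == "__main__":
    print(parse_access_denied(sample))
    print(owner_mismatch(sample))
```

This only confirms which user the write was attempted as. It does not explain why the action ran as that user.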
Hi Harsh, thanks for the follow-up.
In my observation (on CDH 5.3.4), Hive actions are almost always run as the user who submitted the Oozie workflow in Hue, and so any directory created by the Hive table creation is also owned by that user. But clearly it is sometimes the "hive" service user instead. Is this deterministic? That is, given a Hive workflow action with a given script, can I tell which user will end up running it?