Created on 11-11-2015 04:20 PM - edited 09-16-2022 01:33 AM
Using Hive in Oozie can be challenging. There are two available actions
HiveAction and Hive2Action
Th Hive action uses the hive client and needs to set a lot of libraries and connections. I ran into a lot of issues especially related to security. Also the logs are not available in the Hive server and hive server settings are not honoured.
The Hive2 action is a solution for this. It runs a beeline command and connects through jdbc to the hive server. The below assumes that you have used LDAP or PAM security for your hive server.
<action name="myhiveaction"> <hive2 xmlns="uri:oozie:hive2-action:0.1"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <jdbc-url>jdbc:hive2://server:10000/default</jdbc-url> <password>${hivepassword}</password> <script>/data/sql/dayly.sql</script> <param>database=${database}</param> <param>day=${day}</param> </hive2> <ok to="end"/> <error to="kill"/> </action>
The problem with this is that everybody who has access to the oozie logs has access to the password in the hivepassword parameter. This can be less than desirable. Luckily beeline provides a new function to use a password file. A file containing the hive password.
beeline -u jdbc:hive2://sandbox:10000/default -n user -w passfile
passfile being a file containing your password without any new lines at the end. Just the password.
To use that in the Action you can give it as an argument. However you still need to upload the passfile to the oozie execution folder. This can be done in two ways ( create a lib folder under your workflow directory and put it there or use the file argument.
<action name="myhiveaction"> <hive2 xmlns="uri:oozie:hive2-action:0.1"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <jdbc-url>jdbc:hive2://server:10000/default</jdbc-url> <script>/data/sql/dayly.sql</script> <param>database=${database}</param> <param>day=${day}</param> <argument>-wpassfile</argument> <file>/user/myuser/passfile#passfile</file> </hive2> <ok to="end"/> <error to="kill"/> </action>
This will copy the password file to the temp directory and beeline will use it for authentication. Only the owner of the oozie workflow needs access to that file but other people can see the logs ( but not the password. )
Note: The Hive2Action seems to be weird about parameters. It is important to use
-wpassfile
not
-w passfile
The space will cause it to fail because it is adding the space to the filename. This is different for the command line beeline.
Created on 03-08-2016 08:12 AM
Hi Ben, nice article! A question: Can we run Hive2 action when HS2 is running in http transport mode?
Created on 12-19-2017 02:45 PM
Hi,
As we suggested i implemented hive2 action with password file but iam getting below exception.
ERROR:
Error: E0701 : E0701: XML schema error, cvc-complex-type.2.4.a: Invalid content was found starting with element 'argument'. One of '{"uri:oozie:hive2-action:0.1":file, "uri:oozie:hive2-action:0.1":archive}' is expected.
My worflow :
<workflow-app name="wf_Cred_Test"
xmlns="uri:oozie:workflow:0.4">
<start to="HivePartitionAction" />
<action name="HivePartitionAction">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<jdbc-url>${jdbcURL}</jdbc-url>
<script>script/addPartition.sql</script>
<param>runtimeEnvironment=${runtimeEnvironment}</param>
<file>script/addPartition.sql</file>
<argument>-wpassfile</argument>
<file>/tmp/dev/app/workflow/wf_Cred_Test/script/passfile#passfile</file>
</hive2>
<ok to="end" />
<error to="kill" />
</action>
<kill name="kill">
<message>Action failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />
</workflow-app>