Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
Labels (3)
avatar
Master Guru

Using Hive in Oozie can be challenging. There are two available actions

HiveAction and Hive2Action

Th Hive action uses the hive client and needs to set a lot of libraries and connections. I ran into a lot of issues especially related to security. Also the logs are not available in the Hive server and hive server settings are not honoured.

The Hive2 action is a solution for this. It runs a beeline command and connects through jdbc to the hive server. The below assumes that you have used LDAP or PAM security for your hive server.

<action name="myhiveaction">
  <hive2 xmlns="uri:oozie:hive2-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <jdbc-url>jdbc:hive2://server:10000/default</jdbc-url>
    <password>${hivepassword}</password>
    <script>/data/sql/dayly.sql</script>
    <param>database=${database}</param>
    <param>day=${day}</param>
  </hive2>
  <ok to="end"/>
  <error to="kill"/>
</action>

The problem with this is that everybody who has access to the oozie logs has access to the password in the hivepassword parameter. This can be less than desirable. Luckily beeline provides a new function to use a password file. A file containing the hive password.

beeline -u jdbc:hive2://sandbox:10000/default -n user -w passfile

passfile being a file containing your password without any new lines at the end. Just the password.

To use that in the Action you can give it as an argument. However you still need to upload the passfile to the oozie execution folder. This can be done in two ways ( create a lib folder under your workflow directory and put it there or use the file argument.

<action name="myhiveaction">
  <hive2 xmlns="uri:oozie:hive2-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <jdbc-url>jdbc:hive2://server:10000/default</jdbc-url>
    <script>/data/sql/dayly.sql</script>
    <param>database=${database}</param>
    <param>day=${day}</param>
    <argument>-wpassfile</argument>
    <file>/user/myuser/passfile#passfile</file>
  </hive2>
  <ok to="end"/>
  <error to="kill"/>
</action>

This will copy the password file to the temp directory and beeline will use it for authentication. Only the owner of the oozie workflow needs access to that file but other people can see the logs ( but not the password. )

Note: The Hive2Action seems to be weird about parameters. It is important to use

-wpassfile

not

-w passfile

The space will cause it to fail because it is adding the space to the filename. This is different for the command line beeline.

4,556 Views
Comments
avatar
Master Guru

Hi Ben, nice article! A question: Can we run Hive2 action when HS2 is running in http transport mode?

avatar
New Contributor

Hi,

As we suggested i implemented hive2 action with password file but iam getting below exception.

ERROR:

Error: E0701 : E0701: XML schema error, cvc-complex-type.2.4.a: Invalid content was found starting with element 'argument'. One of '{"uri:oozie:hive2-action:0.1":file, "uri:oozie:hive2-action:0.1":archive}' is expected.

My worflow :

<workflow-app name="wf_Cred_Test"
xmlns="uri:oozie:workflow:0.4">
<start to="HivePartitionAction" />
<action name="HivePartitionAction">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<jdbc-url>${jdbcURL}</jdbc-url>
<script>script/addPartition.sql</script>
<param>runtimeEnvironment=${runtimeEnvironment}</param>
<file>script/addPartition.sql</file>
<argument>-wpassfile</argument>
<file>/tmp/dev/app/workflow/wf_Cred_Test/script/passfile#passfile</file>
</hive2>
<ok to="end" />
<error to="kill" />
</action>
<kill name="kill">
<message>Action failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />
</workflow-app>