Created on 03-31-2016 04:07 AM - edited 09-16-2022 03:11 AM
Hi,
I am trying to call Hive commands inside a shell script using an Oozie shell action. The shell script executes fine standalone, but under the Oozie shell action it fails to connect to the Hive metastore.
We are using CDH 5.4.8
Workflow.xml
<global>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <property>
            <name>mapreduce.job.queue.name</name>
            <value>${queueName}</value>
        </property>
    </configuration>
</global>
<credentials>
    <credential name='hcat-creds' type='hcat'>
        <property>
            <name>hcat.metastore.uri</name>
            <value>thrift://hostname.net:9083</value>
        </property>
        <property>
            <name>hcat.metastore.principal</name>
            <value>hive/_HOST@myhostname.com</value>
        </property>
    </credential>
</credentials>
<start to="file-compaction-processing" />
<action name="file-compaction-processing" cred='hcat-creds'>
    <shell xmlns="uri:oozie:shell-action:0.3">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>${HiveOozieSiteXML}</job-xml>
        <exec>${wfPath}/hive_compaction.sh</exec>
        <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
        <file>${wfPath}/hive_compaction.sh#${wfPath}/hive_compaction.sh</file>
        <file>${HiveOozieSiteXML}#${HiveOozieSiteXML}</file>
        <capture-output />
    </shell>
    <ok to="decision-file-compacted" />
    <error to="fail" />
</action>
hive_compaction.sh
hive -e "show databases"
But Oozie always fails with the error below when we use the Hive CLI:
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true, username = MyUser. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: java.sql.SQLException: Failed to create database '/var/lib/hive/metastore/metastore_db'
When we use Beeline instead, we get a Kerberos authentication problem. Can anyone please advise whether accessing Hive through a shell action works at all?
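For what it's worth, my reading of the Derby stack trace is that the Hive CLI cannot see a hive-site.xml inside the launcher container, so it falls back to an embedded metastore. Below is a sketch of the variant I would expect to work, assuming the ${HiveOozieSiteXML} file shipped via <file> lands in the container's working directory under the name hive-site.xml (the file name and the HIVE_CONF_DIR trick are guesses on my part):

#!/bin/bash
# hive_compaction.sh (sketch; the shipped file name is an assumption)
mkdir -p hiveconf
cp hive-site.xml hiveconf/hive-site.xml
export HIVE_CONF_DIR="$(pwd)/hiveconf"
hive -e "show databases"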
Created 03-31-2016 02:49 PM
It took me a few moments, but it looks like you are running into something I also hit initially: executing a Hive command through a shell action. I never found a resolution for this, and came to see pursuing it as going against the purpose of the actions themselves (shell versus Hive). You could probably get it working with some hacks, but it may not be a preferred path.
The approach I took instead was to adhere to the practices Oozie lays out, rather than bending it to the way I wanted to work, which was to write a cool wrapper script that would dynamically execute whatever I passed in. I'm getting a bit off topic here, but if you want to do dynamic things like that, it might be better to dynamically generate the HQL into a file, then dynamically generate the workflow content of an already existing, statically-called workflow (again... I'm digressing, so I'll stop).
Try using a Hive action instead, and put the SQL you want to run in a separate HiveQL file. Otherwise, it looks like you were already including all the right pieces: the hive-site.xml, the credentials, etc.
<global>
    <job-xml>${wfJobSite}</job-xml>
</global>
[...]
<action name="hive-action" cred="hcat">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <script>/directory-path/example.hql</script>
    </hive>
    <ok to="End"/>
    <error to="Kill"/>
</action>
[...]
I don't know for sure what happens behind the scenes when Oozie executes a shell script and you try to run a Hive command, but my hypothesis is that when the script executes out on a data node, it loses the context of the credentials and details you pass in; unless you expressly leverage those in your script, Hive commands won't execute properly. A friend of mine got a script to execute Sqoop, but only after copying JARs/files out to every data node. I don't recommend trying to bend Hadoop to your will like that; instead, try to leverage the tools as they were designed, and perhaps commit to them enough to eventually bend them to your will. 🙂 [just my thoughts]
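If you want to test that hypothesis, a few diagnostic lines at the top of the shell script will show what security context the launcher container actually has. This is just a sketch; klist and the HADOOP_TOKEN_FILE_LOCATION variable are the things worth checking:

#!/bin/bash
# Diagnostic sketch: what does the launcher container actually see?
echo "Running as: $(whoami) on $(hostname)"
echo "Delegation token file: ${HADOOP_TOKEN_FILE_LOCATION:-not set}"
# klist exits non-zero when no Kerberos ticket cache is visible
if klist > /dev/null 2>&1; then
    echo "Kerberos ticket cache found:"
    klist
else
    echo "No Kerberos ticket cache visible to this script"
fi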
Created 04-08-2016 08:34 AM
Thanks for the response. I passed the keytab file in the shell action section of the workflow, and that solved the issue.
Workflow.xml
<file>${keytabaccount}#${keytabaccount}</file>
In the script:
kinit -k -t ${keytabaccount}.keytab ${keytabaccount}@xxx.xxx.com
The above approach resolved the problem; I can now access Hive tables from the shell script through Beeline (HiveServer2).
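For anyone hitting the same issue, here is a fuller sketch of the script. The HiveServer2 host, the realm, and the way ${keytabaccount} gets substituted are placeholders standing in for our real values:

#!/bin/bash
set -e
# The keytab is shipped next to the script via <file>${keytabaccount}#${keytabaccount}</file>
kinit -k -t ${keytabaccount}.keytab ${keytabaccount}@xxx.xxx.com
# Placeholder HiveServer2 host; the principal must match your Hive service principal
beeline -u "jdbc:hive2://hs2host.example.com:10000/default;principal=hive/_HOST@xxx.xxx.com" -e "show databases"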
Thanks