Our cluster is secured using Kerberos. I need to run Hive queries in a shell script that will be scheduled to run periodically.
In my shell script, I was thinking of using the commands below in sequence:
beeline -u "jdbc:hive2://$hive_server2:10000/$hive_db;principal=$user_principal"
beeline -e "SHOW DATABASES"
But then I realised that once I run the beeline -u command, it takes me into the beeline shell instead of staying in the bash shell, so I'm wondering how to sort this out. I need to use the beeline -e command, but I need to connect to the cluster first using the Kerberos principal. Any ideas on the best way to handle this?
FYI, we are not using Oozie, just a shell script with crontab scheduling.
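For an unattended cron job on a Kerberized cluster, the usual pattern is to authenticate with a keytab first (cron has no interactive ticket cache) and then run beeline non-interactively with both -u and -e on one command line. A minimal sketch; the host, database, principal, and keytab path are placeholders for your environment, and the kinit/beeline calls are guarded so the sketch is a no-op off-cluster:

```shell
#!/bin/bash
set -u

# Placeholders (hypothetical values for illustration):
hive_server2=hs2.example.com
hive_db=default
user_principal="hive/_HOST@EXAMPLE.COM"
keytab=/etc/security/keytabs/etl.keytab

# Build the JDBC URL once; -u and -e work together on one command line,
# so the script never drops into the interactive beeline shell.
jdbc_url="jdbc:hive2://${hive_server2}:10000/${hive_db};principal=${user_principal}"
echo "connecting with: $jdbc_url"

# cron has no Kerberos ticket, so obtain one from a keytab first.
if command -v kinit >/dev/null 2>&1 && [ -r "$keytab" ]; then
  kinit -kt "$keytab" "etl@EXAMPLE.COM"
fi

if command -v beeline >/dev/null 2>&1; then
  beeline -u "$jdbc_url" -e "SHOW DATABASES"
fi
```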
[hdfs@ip-172-31-40-160 ~]$ cat b.sh
beeline << EOF
!connect jdbc:hive2://localhost:10000 hive hive
show tables;
EOF
[hdfs@ip-172-31-40-160 ~]$ sh -x b.sh
Beeline version 1.2.1000.2.5.0.0-1245 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000 hive hive
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 1.2.1000.2.5.0.0-1245)
Driver: Hive JDBC (version 1.2.1000.2.5.0.0-1245)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show tables;
+-------------------+--+
|     tab_name      |
+-------------------+--+
| edl_good_data     |
| td_edl_good_data  |
| test              |
+-------------------+--+
3 rows selected (0.152 seconds)
0: jdbc:hive2://localhost:10000> Closing: 0: jdbc:hive2://localhost:10000
Thanks @Neeraj Sabharwal. Just wondering if there is any standard approach to this. Without connecting first, how does someone use the beeline -e command? It looks like the beeline -u and beeline -e commands are mutually exclusive to me.
Although this is an old thread, I thought this would be useful for someone: we can use both in the same line. I'm not sure whether older versions of beeline didn't support this, considering this is an old thread. Anyway, the syntax below works for me:
beeline -u "jdbc:hive2://$hive_server2:10000/$hive_db;principal=$user_principal" -e "SHOW DATABASES"
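With the combined -u/-e form, the query output can also be captured back into the calling script. A sketch with hypothetical host/database/principal values; the beeline call is guarded so the example degrades to stand-in data off-cluster:

```shell
#!/bin/bash
hive_server2=hs2.example.com
hive_db=default
user_principal="hive/_HOST@EXAMPLE.COM"

# --silent and csv2 strip beeline's ASCII borders, so each line of
# $databases is just a database name.
if command -v beeline >/dev/null 2>&1; then
  databases=$(beeline --silent=true --showHeader=false --outputformat=csv2 \
    -u "jdbc:hive2://${hive_server2}:10000/${hive_db};principal=${user_principal}" \
    -e "SHOW DATABASES")
else
  databases=$'default\nsales'   # stand-in output for illustration
fi

for db in $databases; do
  echo "found database: $db"
done
```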
I needed to generate a file showing all of the partitions for several tables. I ended up using the following with hive:
rm -f tables.txt
rm -f partitions.txt
hive -e "show tables" >> tables.txt   # reconstructed: the query on this line was truncated in the original post
cat tables.txt | while read LINE
do
  echo -e "$LINE\n" >> partitions.txt
  hive -e "show partitions $LINE" >> partitions.txt
  echo -e "\n" >> partitions.txt
done
rm -f tables.txt
echo "Show Partitions generated"
It seemed like hive didn't like something, because it threw a couple of errors at the beginning and end, but it ended up generating the desired output.
I'd prefer to use beeline, but I couldn't figure out how to integrate the beeline -u connection with looping through the list of tables.
Can anyone suggest a similar approach using beeline?
The Hive shell has security issues and is deprecated in newer versions of HDP, so please avoid it. Opening a hive/beeline shell for every table will be slow, since it has to spawn a JVM for each table, so avoid the looping approach. Prepare a file with the table list like below:
show partitions table1;
show partitions table2;
show partitions table3;
Use the -f flag of beeline to pass the above file, e.g.:
beeline --silent=true --showHeader=false --outputformat=csv2 -u "jdbc:hive2://$hive_server2:10000/$hive_db;principal=$user_principal" -n '<username>' -p '<password>' -f show_partitions_tables.hql > show_partitions_tables.txt
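Instead of writing the .hql by hand (e.g. in vi), the "show partitions" file can be generated from a plain table list, one name per line. A sketch; the file names and table names are examples:

```shell
#!/bin/bash
# table_list.txt holds one table name per line (example contents):
printf 'table1\ntable2\ntable3\n' > table_list.txt

# Turn each table name into a "show partitions" statement.
while read -r t; do
  echo "show partitions $t;"
done < table_list.txt > show_partitions_tables.hql

cat show_partitions_tables.hql
```

The resulting show_partitions_tables.hql is then passed once to beeline with -f, so only a single JVM is spawned for the whole batch.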
Note: Please accept the answer if it solved your issue.
Your suggestion worked. At first, I wasn't sure about how to create the hql file. I ended up just using vi to create it with the "show partitions" statements.
Also, in my environment, I don't need to provide credentials; I'm not sure why. I just provided the JDBC string for the ZooKeeper cluster, like the following:
beeline --silent=true --showHeader=false --outputformat=csv2 -u "jdbc:hive2://zk1:zkport,zk2:zkport,zk3:zkport/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -f show_partitions_tables.hql > show_partitions_tables.txt
Thanks for your help!